Welcome to the fourteenth post in the rationally rambling R rants series, or R4 for short. The last two posts were concerned with faster installation. First, we showed how ccache can speed up (re-)installation. This was followed by a second post on faster installation via binaries.
This last post immediately sparked some follow-up. Replying to my tweet about it, David Smith wondered how to combine binary and source installation (tl;dr: it is hard as you need to combine two package managers). Just this week, Max Ogden wondered how to install CRAN packages as binaries on Linux, and Daniel Nuest poked me on GitHub as part of his excellent containerit project as installation of binaries would of course also make Docker container builds much faster. (tl;dr: Oh yes, see below!)
So can one? Sure. We have a tool. But first the basics.
Packages for a particular distribution are indexed by a packages file for that distribution. This is not unlike CRAN using top-level PACKAGES* files. So in principle you could just fetch those packages files, parse and index them, and then search them. In practice that is a lot of work as Debian and Ubuntu now have several tens of thousands of packages.
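To make the "fetch, parse, search" idea concrete, here is a minimal sketch of the parsing step. It assumes the usual stanza layout of a Debian-style packages index (blank-line separated records with `Package:` and `Version:` fields). The three entries below are made-up sample data, and the helper names `sample_index` and `list_r_cran` are hypothetical.

```shell
#!/bin/sh
# Hypothetical miniature index in the stanza format of a Debian-style
# packages file. The entries are illustrative sample data, not a real
# archive snapshot.
sample_index() {
cat <<'EOF'
Package: r-cran-rcpp
Version: 0.12.12-1
Architecture: amd64

Package: vim
Version: 2:8.0-1
Architecture: amd64

Package: r-cran-ggplot2
Version: 2.2.1-1
Architecture: amd64
EOF
}

# Print "name version" for every stanza whose Package field starts
# with r-cran-. Reads the index on standard input.
list_r_cran() {
    awk '
        /^Package: /  { pkg = $2 }
        /^Version: /  { if (pkg ~ /^r-cran-/) print pkg, $2 }
        /^$/          { pkg = "" }
    '
}

sample_index | list_r_cran
```

Doing this against a real index with tens of thousands of stanzas, across several archive components, is where the work piles up.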
So it is better to use the distro tool. In my use case on .deb-based distros, this is apt-cache. Here is a quick example for the (Ubuntu 17.04) server on which I type this:
$ sudo apt-get update -qq ## suppress stdout display
$ apt-cache search r-cran- | wc -l
419
$
So a very vanilla Ubuntu installation has "merely" 400+ binary CRAN packages. Nothing to write home about (yet) -- but read on.
A decade ago, I was involved in two projects to turn all of CRAN into .deb binaries. We had a first ad-hoc predecessor project, and then (much better) a 'version 2' thanks to the excellent Google Summer of Code work by Charles Blundell (mentored by me). I ran with that for a while and carried at the peak about 2500 binaries or so. And then my controlling db died, just as I visited CRAN to show it off. Very sad. Don Armstrong ran with the code and rebuilt it on better foundations and had for quite some time all of CRAN and BioC built (peaking at maybe 7k packages). Then his RAID died. The surviving effort is the one by Michael Rutter who always leaned on the Launchpad PPA system to build his packages. And those still exist and provide a core of over 10k packages (but across different Ubuntu flavours, see below).
In order to access c2d4u (short for cran2deb4ubuntu, Michael's PPA) you need an Ubuntu system. For example, my Travis runner script does
# Add marutter's c2d4u repository, (and rrutter for CRAN builds too)
sudo add-apt-repository -y "ppa:marutter/rrutter"
sudo add-apt-repository -y "ppa:marutter/c2d4u"
After that one can query apt-cache as above, but take advantage of a much larger pool with over 3500 packages (see below). The add-apt-repository command does the Right Thing (TM) in terms of both getting the archive key, and adding the apt source entry to the config directory.
Now, all this command-line business is nice. But can we do all this programmatically from R? Sort of.
The RcppAPT package interfaces the libapt library, and provides access to a few functions. I used this feature when I argued (unsuccessfully, as it turned out) for a particular issue concerning Debian and R upgrades. But that is water under the bridge now, and the main point is that "yes we can".
Building on RcppAPT, within the Rocker Project we provide a particular class of containers for different Ubuntu releases which all contain i) RcppAPT and ii) the required apt source entry for Michael's repos.
So now we can do this
$ docker run --rm -ti rocker/r-apt:xenial /bin/bash -c 'apt-get update -qq; apt-cache search r-cran- | wc -l'
3525
$
This fires up the corresponding Docker container for the xenial (i.e. 16.04 LTS) release, updates the apt indices and then searches for r-cran-* packages. And it seems we have a little over 3500 packages. Not bad at all (especially once you realize that this skews strongly towards the more popular packages).
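The search relies on the naming convention for these binaries: the prefix r-cran- followed by the CRAN package name in lower case. Here is a small sketch of helpers built on that convention. The function names `cran_to_deb` and `has_binary` are hypothetical, and `has_binary` assumes that `apt-cache show` exits non-zero for a name it does not know.

```shell
#!/bin/sh
# Map a CRAN package name to the corresponding r-cran-* binary name
# (prefix plus lower-cased package name).
cran_to_deb() {
    printf 'r-cran-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Succeed if apt knows a binary package for the given CRAN package.
# Assumption: apt-cache show fails for unknown package names.
has_binary() {
    apt-cache show "$(cran_to_deb "$1")" >/dev/null 2>&1
}

cran_to_deb Rcpp         # prints r-cran-rcpp
cran_to_deb data.table   # prints r-cran-data.table
```

Inside one of the containers, something like `has_binary rstan && apt-get install -y r-cran-rstan` would then install a binary only when one is on offer.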
A little while ago a seemingly very frustrated user came to Carl and myself and claimed that our Rocker Project sucketh because building rstan was all but impossible. I don't have the time, space or inclination to go into details, but he was just plain wrong. You do need to know a little about C++, package building, and more to do this from scratch. Plus, there was a long-standing issue with rstan and newer Boost (which also included several workarounds).
Be that as it may, it serves as a nice example here. So the first question: is rstan packaged?
$ docker run --rm -ti rocker/r-apt:xenial /bin/bash -c 'apt-get update -qq; apt-cache show r-cran-rstan'
Package: r-cran-rstan
Source: rstan
Priority: optional
Section: gnu-r
Installed-Size: 5110
Maintainer: cran2deb4ubuntu <cran2deb4ubuntu@gmail.com>
Architecture: amd64
Version: 2.16.2-1cran1ppa0
Depends: pandoc, r-base-core, r-cran-ggplot2, r-cran-stanheaders, r-cran-inline, r-cran-gridextra, r-cran-rcpp,\
r-cran-rcppeigen, r-cran-bh, libc6 (>= 2.14), libgcc1 (>= 1:4.0), libstdc++6 (>= 5.2)
Filename: pool/main/r/rstan/r-cran-rstan_2.16.2-1cran1ppa0_amd64.deb
Size: 1481562
MD5sum: 60fe7cfc3e8813a822e477df24b37ccf
SHA1: 75bbab1a4193a5731ed105842725768587b4ec22
SHA256: 08816ea0e62b93511a43850c315880628419f2b817a83f92d8a28f5beb871fe2
Description: GNU R package "R Interface to Stan"
Description-md5: c9fc74a96bfde57f97f9d7c16a218fe5
$
It would seem so. With that, the following very minimal Dockerfile is all we need:
## Emacs, make this -*- mode: sh; -*-
## Start from xenial
FROM rocker/r-apt:xenial
## This handle reaches Carl and Dirk
MAINTAINER "Carl Boettiger and Dirk Eddelbuettel" rocker-maintainers@eddelbuettel.com
## Update and install rstan
RUN apt-get update && apt-get install -y --no-install-recommends r-cran-rstan
## Make R the default
CMD ["R"]
In essence, it executes one command: install rstan, but from binary, taking care of all dependencies. And lo and behold, it works as advertised:
$ docker run --rm -ti rocker/rstan:local Rscript -e 'library(rstan)'
Loading required package: ggplot2
Loading required package: StanHeaders
rstan (Version 2.16.2, packaged: 2017-07-03 09:24:58 UTC, GitRev: 2e1f913d3ca3)
For execution on a local, multicore CPU with excess RAM we recommend calling
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
$
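The docker run call above presupposes that an image tagged rocker/rstan:local already exists. Here is a minimal sketch of the build step, assuming the Dockerfile shown earlier is saved in the directory passed as the first argument. The function name `build_rstan_image` and the `DOCKER` override variable are conventions of this sketch only, not something Docker itself reads.

```shell
#!/bin/sh
# DOCKER can be overridden (e.g. DOCKER=echo) for a dry run. This variable
# is a convenience of this sketch and is not read by docker itself.
DOCKER=${DOCKER:-docker}

# Build the image from the Dockerfile in directory $1 (default: current
# directory) and tag it with the name used in the docker run call.
build_rstan_image() {
    "$DOCKER" build -t rocker/rstan:local "${1:-.}"
}
```

Calling `build_rstan_image` in the directory holding the Dockerfile should leave a local image under that tag.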
So there: installing from binary works, takes care of dependencies, is easy and as an added bonus even faster. What's not to like?
(And yes, a few of us are working on a system to have more packages available as binaries, but it may take another moment...)
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.