Welcome to the fourteenth post in the rationally rambling R rants series, or R4 for short. The last two posts were concerned with faster installation. First, we showed how ccache can speed up (re-)installation. This was followed by a second post on faster installation via binaries.
This last post immediately sparked some follow-up. Replying to my tweet about it, David Smith wondered how to combine binary and source installation (tl;dr: it is hard as you need to combine two package managers). Just this week, Max Ogden wondered how to install CRAN packages as binaries on Linux, and Daniel Nuest poked me on GitHub as part of his excellent containerit project as installation of binaries would of course also make Docker container builds much faster. (tl;dr: Oh yes, see below!)
So can one? Sure. We have a tool. But first the basics.
Packages for a particular distribution are indexed by a packages file for that distribution. This is not unlike CRAN using top-level PACKAGES* files. So in principle you could just fetch those packages files, parse and index them, and then search them. In practice that is a lot of work as Debian and Ubuntu now have several tens of thousands of packages.
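To give a feel for what "parse and index" would mean, here is a minimal sketch that pulls matching names out of a Debian-style package index. It assumes the usual layout of blank-line-separated stanzas, each with a "Package:" field. The function names list_packages and sample_index are made up for this illustration, and the sample stanzas (including their version numbers) are hand-written stand-ins, not real index data.

```shell
#!/bin/sh
# List binary package names starting with a given prefix, reading a
# Debian-style package index on stdin. Each stanza is assumed to carry
# a "Package: <name>" field.
list_packages() {
    prefix="$1"
    awk -v prefix="$prefix" '
        $1 == "Package:" && index($2, prefix) == 1 { print $2 }
    ' | sort -u
}

# A tiny hand-written sample standing in for a real index file
# (names and versions are illustrative only).
sample_index() {
    cat <<'EOF'
Package: r-cran-rcpp
Version: 0.12.12-1
Architecture: amd64

Package: r-base-core
Version: 3.4.1-2
Architecture: amd64

Package: r-cran-ggplot2
Version: 2.2.1-3
Architecture: all
EOF
}

sample_index | list_packages r-cran-
```

A real index has tens of thousands of stanzas plus dependency and version information, so a sketch like this only scratches the surface of what the distro tool already does.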
So it is better to use the distro tool. In my use case on .deb-based distros, this is apt-cache. Here is a quick example for the (Ubuntu 17.04) server on which I type this:
    $ sudo apt-get update -qq            ## suppress stdout display
    $ apt-cache search r-cran- | wc -l
    419
    $
So a very vanilla Ubuntu installation has "merely" 400+ binary CRAN packages. Nothing to write home about (yet) -- but read on.
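To look up one particular CRAN package rather than count them all, it helps to know the naming scheme: the binaries carry the r-cran- prefix followed by the lower-cased package name. The helper below is a small sketch of that mapping. The function name cran_to_deb is hypothetical, and the rule is an assumption that holds for the packages shown in this post but is not guaranteed for every package.

```shell
#!/bin/sh
# Map a CRAN package name to its assumed .deb binary name:
# lower-case the name and prefix it with "r-cran-".
cran_to_deb() {
    printf 'r-cran-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Print the assumed binary names for a few CRAN packages.
for pkg in rstan RcppEigen StanHeaders; do
    cran_to_deb "$pkg"
done
```

The result can then be handed to the distro tool, for example by passing the output of cran_to_deb rstan to apt-cache show.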
A decade ago, I was involved in two projects to turn all of CRAN into .deb binaries. We had a first ad-hoc predecessor project, and then (much better) a 'version 2' thanks to the excellent Google Summer of Code work by Charles Blundell (mentored by me). I ran with that for a while and carried at the peak about 2500 binaries or so. And then my controlling db died, just as I visited CRAN to show it off. Very sad. Don Armstrong ran with the code and rebuilt it on better foundations and had for quite some time all of CRAN and BioC built (peaking at maybe 7k packages). Then his RAID died. The surviving effort is the one by Michael Rutter who always leaned on the Launchpad PPA system to build his packages. And those still exist and provide a core of over 10k packages (but across different Ubuntu flavours, see below).
    # Add marutter's c2d4u repository (and rrutter for CRAN builds too)
    sudo add-apt-repository -y "ppa:marutter/rrutter"
    sudo add-apt-repository -y "ppa:marutter/c2d4u"
After that one can query apt-cache as above, but take advantage of a much larger pool with over 3500 packages (see below). The add-apt-repository command does the Right Thing (TM) in terms of both getting the archive key, and adding the apt source entry to the config directory.
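For the curious, the source entry in question is a one-line "deb" record. The sketch below prints what such a line typically looks like for a Launchpad PPA. The function name ppa_source_line is made up, and the host name and URL layout are assumptions that may differ between Ubuntu releases, so treat the file that add-apt-repository actually writes under /etc/apt/sources.list.d/ as the authority.

```shell
#!/bin/sh
# Print the apt source entry a Launchpad PPA is assumed to map to.
# Arguments: PPA owner, PPA name, Ubuntu release codename.
# The URL layout is an assumption; verify against the file that
# add-apt-repository creates on your system.
ppa_source_line() {
    printf 'deb http://ppa.launchpad.net/%s/%s/ubuntu %s main\n' "$1" "$2" "$3"
}

# The two PPAs used in this post, for the xenial release.
for ppa in rrutter c2d4u; do
    ppa_source_line marutter "$ppa" xenial
done
```

Writing such a line by hand would still leave the archive key to deal with, which is one reason to let add-apt-repository handle both steps.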
Now, all this command-line business is nice. But can we do all this programmatically from R? Sort of.
The RcppAPT package interfaces the libapt library, and provides access to a few functions. I used this feature when I argued (unsuccessfully, as it turned out) for a particular issue concerning Debian and R upgrades. But that is water under the bridge now, and the main point is that "yes we can".
Within the Rocker Project, we built on top of RcppAPT by providing a particular class of containers for different Ubuntu releases which all contain i) RcppAPT and ii) the required apt source entry for Michael's repos.
So now we can do this:
    $ docker run --rm -ti rocker/r-apt:xenial /bin/bash -c 'apt-get update -qq; apt-cache search r-cran- | wc -l'
    3525
    $
This fires up the corresponding Docker container for the xenial (i.e. 16.04 LTS) release, updates the apt indices and then searches for r-cran-* packages. And it seems we have a little over 3500 packages. Not bad at all (especially once you realize that this skews strongly towards the more popular packages).
A little while ago a seemingly very frustrated user came to Carl and myself and claimed that our Rocker Project sucketh because building rstan was all but impossible. I don't have the time, space or inclination to go into details, but he was just plain wrong. You do need to know a little about C++, package building, and more to do this from scratch. Plus, there was a long-standing issue with rstan and newer Boost (which also included several workarounds).
Be that as it may, it serves as a nice example here. So the first question: is rstan packaged?
    $ docker run --rm -ti rocker/r-apt:xenial /bin/bash -c 'apt-get update -qq; apt-cache show r-cran-rstan'
    Package: r-cran-rstan
    Source: rstan
    Priority: optional
    Section: gnu-r
    Installed-Size: 5110
    Maintainer: cran2deb4ubuntu <email@example.com>
    Architecture: amd64
    Version: 2.16.2-1cran1ppa0
    Depends: pandoc, r-base-core, r-cran-ggplot2, r-cran-stanheaders, r-cran-inline, r-cran-gridextra, r-cran-rcpp,\
       r-cran-rcppeigen, r-cran-bh, libc6 (>= 2.14), libgcc1 (>= 1:4.0), libstdc++6 (>= 5.2)
    Filename: pool/main/r/rstan/r-cran-rstan_2.16.2-1cran1ppa0_amd64.deb
    Size: 1481562
    MD5sum: 60fe7cfc3e8813a822e477df24b37ccf
    SHA1: 75bbab1a4193a5731ed105842725768587b4ec22
    SHA256: 08816ea0e62b93511a43850c315880628419f2b817a83f92d8a28f5beb871fe2
    Description: GNU R package "R Interface to Stan"
    Description-md5: c9fc74a96bfde57f97f9d7c16a218fe5
    $
It would seem so. With that, the following very minimal Dockerfile is all we need:
    ## Emacs, make this -*- mode: sh; -*-
    ## Start from xenial
    FROM rocker/r-apt:xenial
    ## This handle reaches Carl and Dirk
    MAINTAINER "Carl Boettiger and Dirk Eddelbuettel" firstname.lastname@example.org
    ## Update and install rstan
    RUN apt-get update && apt-get install -y --no-install-recommends r-cran-rstan
    ## Make R the default
    CMD ["R"]
In essence, it executes one command: install rstan, but from binary, taking care of all dependencies. And lo and behold, it works as advertised:
    $ docker run --rm -ti rocker/rstan:local Rscript -e 'library(rstan)'
    Loading required package: ggplot2
    Loading required package: StanHeaders
    rstan (Version 2.16.2, packaged: 2017-07-03 09:24:58 UTC, GitRev: 2e1f913d3ca3)
    For execution on a local, multicore CPU with excess RAM we recommend calling
    rstan_options(auto_write = TRUE)
    options(mc.cores = parallel::detectCores())
    $
So there: installing from binary works, takes care of dependencies, is easy and as an added bonus even faster. What's not to like?
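The same recipe should carry over to other packages available as r-cran-* binaries. As a rough sketch, the helper below prints a Dockerfile along the lines of the one above for a given release and list of CRAN packages. The function name make_dockerfile is made up, the r-cran- plus lower-case naming is an assumption, and it does not check that a binary (or the requested rocker/r-apt tag) actually exists.

```shell
#!/bin/sh
# Emit a minimal Dockerfile that installs the given CRAN packages as
# binaries. Usage: make_dockerfile <ubuntu-codename> <CRAN package>...
# Assumes each package is available as r-cran-<lower-cased name>.
make_dockerfile() {
    release="$1"
    shift
    debs=""
    for pkg in "$@"; do
        deb="r-cran-$(printf '%s' "$pkg" | tr '[:upper:]' '[:lower:]')"
        debs="$debs $deb"
    done
    cat <<EOF
FROM rocker/r-apt:$release
RUN apt-get update && apt-get install -y --no-install-recommends$debs
CMD ["R"]
EOF
}

# Reproduce the essentials of the rstan Dockerfile shown above.
make_dockerfile xenial rstan
```

Redirecting the output into a file named Dockerfile and running docker build -t rocker/rstan:local . in that directory would produce an image with the tag used in the docker run call above.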
(And yes, a few of us are working on a system to have more packages available as binaries, but it may take another moment...)