Thinking inside the box

Thu, 17 Aug 2023

#43: r2u Faster Than the Alternatives

Welcome to the 43th post in the $R^4 series.

And with that, a good laugh. When I set up Sunday’s post, I was excited enough about the (indeed exciting !!) topic of r2u via browser or vscode that I mistakenly labeled it as the 41th post. And overlooked the existing 41th post from July! So it really is as if Douglas Adams, Arthur Dent, and, for good measure, Dirk Gently, looked over my shoulder and declared there shall not be a 42th post!! So now we have two 41th post: Sunday’s and July’s.

Back the current topic, which is of course r2u. Earlier this week we had a failure in (an R based) CI run (using a default action which I had not set up). A package was newer in source than binary, so a build from source was attempted. And of course failed as it was a package needing a system dependency to build. Which the default action did not install.

I am familiar with the problem via my general use of r2u (or my r-ci which uses it under the hood). And there we use a bspm variable to prefer binary over possibly newer source. So I was curious how one would address this with the default actions. It so happens that the same morning I spotted a StackOverflow question on the same topic, where the original poster had suffered the exact same issue!

I offered my approach (via r2u) as a comment and was later notified of a follow-up answer by the OP. Turns our there is a new, more powerful action that does all this, potentially flipping to a newer version and building it, all while using a cache.

Now I was curious, and in the evening cloned the repo to study the new approach and compare the new action to what r2u offers. In particular, I was curious if a use of caches would be benficial on repeated runs. A screenshot of the resulting Actions and their times follows.

Turns out maybe not so much (yet ?). As the actions page of my cloned ‘comparison repo’ shows in this screenshot, r2u is consistently faster at always below one minute compared to new entrant at always over two minutes. (I should clarify that the original actions sets up dependencies, then scrapes, and commits. I am timing only the setup of dependencies here.)

We can also extract the six datapoints and quickly visualize them.

Now, this is of course entirely possibly that not all possible venues for speedups were exploited in how the action setup was setup. If so, please file an issue at the repo and I will try to update accordingly. But for now it seems that a default of setup r2u is easily more than twice as fast as an otherwise very compelling alternative (with arguably much broader scope). However, where r2u choses to play, on the increasingly common, popular and powerful Ubuntu LTS setup, it clearly continues to run circles around alternate approaches. So the saying remains:

r2u: fast, easy, reliable.

If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Originally posted 2023-08-13, minimally edited 2023-08-15 which changed the timestamo and URL.

/code/r4 | permanent link

Tue, 15 Aug 2023

#41: Using r2u in Codespaces

Welcome to the 41th post in the $R^4 series. This post draws on joint experiments first started by Grant building on the lovely work done by Eitsupi as part of our Rocker Project. In short, r2u is an ideal match for Codespaces, a Microsoft/GitHub service to run code ‘locally but in the cloud’ via browser or Visual Studio Code. This posts co-serves as the README.md in the .devcontainer directory as well as a vignette for r2u.

So let us get into it. Starting from the r2u repository, the .devcontainer directory provides a small self-containted file devcontainer.json to launch an executable environment R using r2u. It is based on the example in Grant McDermott’s codespaces-r2u repo and reuses its documentation. It is driven by the Rocker Project’s Devcontainer Features repo creating a fully functioning R environment for cloud use in a few minutes. And thanks to r2u you can add easily to this environment by installing new R packages in a fast and failsafe way.

Try it out

To get started, simply click on the green “Code” button at the top right. Then select the “Codespaces” tab and click the “+” symbol to start a new Codespace.

The first time you do this, it will open up a new browser tab where your Codespace is being instantiated. This first-time instantiation will take a few minutes (feel free to click “View logs” to see how things are progressing) so please be patient. Once built, your Codespace will deploy almost immediately when you use it again in the future.

After the VS Code editor opens up in your browser, feel free to open up the examples/sfExample.R file. It demonstrates how r2u enables us install packages and their system-dependencies with ease, here installing packages sf (including all its geospatial dependencies) and ggplot2 (including all its dependencies). You can run the code easily in the browser environment: Highlight or hover over line(s) and execute them by hitting Cmd+Return (Mac) / Ctrl+Return (Linux / Windows).

(Both example screenshots reflect the initial codespaces-r2u repo as well as personal scratchspace one which we started with, both of course work here too.)

Do not forget to close your Codespace once you have finished using it. Click the “Codespaces” tab at the very bottom left of your code editor / browser and select “Close Current Codespace” in the resulting pop-up box. You can restart it at any time, for example by going to https://github.com/codespaces and clicking on your instance.

Extend r2u with r-universe

r2u offers “fast, easy, reliable” access to all of CRAN via binaries for Ubuntu focal and jammy. When using the latter (as is the default), it can be combined with r-universe and its Ubuntu jammy binaries. We demontrates this in a second example file examples/censusExample.R which install both the cellxgene-census and tiledbsoma R packages as binaries from r-universe (along with about 100 dependencies), downloads single-cell data from Census and uses Seurat to create PCA and UMAP decomposition plots. Note that in order run this you have to change the Codespaces default instance from ‘small’ (4gb ram) to ‘large’ (16gb ram).

Local DevContainer build

Codespaces are DevContainers running in the cloud (where DevContainers are themselves just Docker images running with some VS Code sugar on top). This gives you the very powerful ability to ‘edit locally’ but ‘run remotely’ in the hosted codespace. To test this setup locally, simply clone the repo and open it up in VS Code. You will need to have Docker installed and running on your system (see here). You will also need the Remote Development extension (you will probably be prompted to install it automatically if you do not have it yet). Select “Reopen in Container” when prompted. Otherwise, click the >< tab at the very bottom left of your VS Code editor and select this option. To shut down the container, simply click the same button and choose “Reopen Folder Locally”. You can always search for these commands via the command palette too (Cmd+Shift+p / Ctrl+Shift+p).

Use in Your Repo

To add this ability of launching Codespaces in the browser (or editor) to a repo of yours, create a directory .devcontainers in your selected repo, and add the file .devcontainers/devcontainer.json. You can customize it by enabling other feature, or use the postCreateCommand field to install packages (while taking full advantage of r2u).

Acknowledgments

There are a few key “plumbing” pieces that make everything work here. Thanks to:

My Rocker Project colleague @eitsupi for maintaining the R DevContainer Features.
@renkun-ken and the rest of the VS Code R extension team.
@Enchufa2 for bspm making package installation to the sysstem so seamless.
@grantmcdermott for the initial codespaces-r2u setup from which we derived this.
Last but not least everybody who helped me make r2u possible, tested it, or sent hints for improvement.

Colophon

More information about r2u is at its site, and we answered some question in issues, and at stackoverflow. More questions are always welcome!

If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Originally posted 2023-08-13, minimally edited 2023-08-15 which changed the timestamo and URL.

/code/r4 | permanent link

Sun, 23 Jul 2023

#41: Another r2u Example – Really Simple CI

Welcome to the 41th post in the $R^4 series. Just as the previous post illustrated r2u use to empower interactive Google Colab sessions, today we want to look at continuous integration via GitHub Actions.

Actions are very powerful, yet also intimidating and complex. How does one know what to run? How does ensure requirements are installed? What does these other actions do?

Here we offer a much simpler yet fully automatic solution. It takes advantage of the fact that r2u integrates fully and automatically with the system, here apt, without us having to worry about the setup. One way to make this very easy is the use of the Rocker containers for r2u. They already include the few lines of simple (and scriptable) setup, and have bspm setup so that R commands to install packages dispatch to apt and will bring all required dependencies automatically and easily.

With that the required yaml file for an action can be as simple as this:

name: r2u

on:
  push:
  pull_request:
  release:

jobs:
  ci:
    runs-on: ubuntu-latest
    container:
      image: rocker/r2u:latest
    steps:
      - uses: actions/checkout@v3
      - name: SessionInfo
        run: R -q -e 'sessionInfo()'
      #- name: System Dependencies
      #  # can be used to install e.g. cmake or other build dependencies
      #  run: apt update -qq && apt install --yes --no-install-recommends cmake git
      - name: Package Dependencies
        run: R -q -e 'remotes::install_deps(".", dependencies=TRUE)'
      - name: Build Package
        run: R CMD build --no-build-vignettes --no-manual .
      - name: Check Package
        run: R CMD check --no-vignettes --no-manual $(ls -1tr *.tar.gz | tail -1)

There are only a few key components here.

First, we have the on block where for simplicity we select pushes, pull requests and releases. One could reduce this to just pushes by removing or commenting out the next two lines. Many further refinements are possible and documented but not reqired.

Second, the jobs section and its sole field ci saythat we are running this CI on Ubuntu in its latest release. Importantly we then also select the rocker container for r2 meaning that we explicitly select running in this container (which happens to be an extension and refinement of ubuntu-latest). The latest tag points to the most recent LTS release, currently jammy aka 22.04. This choice also means that our runs are limited to Ubuntu and exclude macOS and Windows. That is a choice: not every CI task needs to burn extra (and more expensive) cpu cycles on the alternative OS, yet those can always be added via other yaml files possibly conditioned on fewer runs (say: only pull requests) if needed.

Third, we have the basic sequence of steps. We check out the repo this file is part of (very standard). After that we ask R show the session info in case we need to troubleshoot. (These two lines could be commented out.) Next we show a commented-out segment we needed in another repo where we needed to add cmake and git as the package in question required local compilation during build. Such a need is fairly rare, but as shown can be be accomodated easily while taking advantage of the rich development infrastructure provided by Ubuntu. But the step should be optional for most R packages so it is commented out here. The next step uses the remotes package to look at the DESCRIPTION file and install all dependencies which, thanks to r2u and bspm, will use all Ubuntu binaries making it both very fast, very easy, and generally failsafe. Finally we do the two standard steps of building the source package and checking it (while omitting vignettes and the (pdf) manual as the container does not bother with a full texlive installation—this could be altered if desired in a derived container).

And that’s it! The startup cost is a few seconds to pull the container, plus a few more seconds for dependencies – and let us recall that e.g. the entire tidyverse installs all one hundred plus packages in about twenty seconds as shown in earlier post. After that the next cost is generally just what it takes to build and check your package once all requirements are in.

To use such a file for continuous integration, we can install it in the .github/workflows/ directory of a repository. One filename I have used is .github/workflows/r2u.yaml making it clear what this does and how.

More information about r2u is at its site, and we answered some question in issues, and at stackoverflow. More questions are always welcome!

If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Sat, 08 Jul 2023

#40: Another r2u Example – Making Colab Easier

Welcome to the 40th post in the $R^4 series. This one will just be a very brief illustration of r2u use in what might be an unexpected place: Google Colab. Colab has a strong bent towards Jupyter and Python but has been supporting R compute kernels for some time (by changing what they call the ‘runtime’). And with a little exploration one can identify these are (currently, as of July 2023) running Ubuntu 20.04 aka ‘focal’.

Which is of course one of two system supported by our lovely r2u project (with the other being the newer 22.04 aka ‘jammy’). And I mostly tweeted / tooted about r2u since the its introduction in #37. And gave basically just a mention in passing in ‘faster feedback’ post #38 as well as the ‘faster feedback in ci’ post #39). So a brief recap may be in order. In essence, r2u makes all of CRAN available as full-fledged Ubuntu binaries with complete and full dependencies which are then installed directly and quickly via apt. Which, to top it of, are accessed directly from R via install.packages() so no special knowledge or sauce needed. We often summarize it as “fast, easy, reliable: what is not to like”.

And, as we established in a few minutes of probing, it also works in the ‘focal’-based Colab session. The screen shot shows the basic step of fetching the setup script (for ‘plain’ Ubuntu focal system) from r2u, making it executable and running it. Total time: 34 seconds. And after that we see the pure magic of install.packages("tidyverse") installing all of it in nine seconds. Additionally, we add the brms package in thirty-one seconds cia install.packages("brms"). Both load just fine and echo their current values.

The commands that are executed in that R session are just

download.file("https://github.com/eddelbuettel/r2u/raw/master/inst/scripts/add_cranapt_focal.sh",
              "add_cranapt_focal.sh")
Sys.chmod("add_cranapt_focal.sh", "0755")
system("./add_cranapt_focal.sh")
install.packages("tidyverse")
library(tidyverse)
install.packages("brms")
library(brms)

The timings are the Colab notebook are visible in the left margin. The lack of output makes debugging a little trickier so I still recommend to use r2u for first expploration via a Docker container as e.g. rocker/r2u:jammy.

More information about r2u is at its site, and we answered some question in issues, and at stackoverflow. More questions are always welcome!

If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Mon, 30 Jan 2023

#39: Faster Feedback Systems – A Continuous Integration Example

Welcome to the 39th post in the relatively randomly recurring rants, or R⁴ for short. Today’s post picks up where the previous post #38: Faster Feedback Systems started. As we argued in #38, the need for fast feedback loops is fairly universal and widespread. Fairly shortly after we posted #38, useR! 2022 happened and one presentation had the key line

Waiting 1-24 minutes for a build to finish can be a massive time suck.

which we would like to come back to today. Furthermore, the unimitable @b0rk had a fabulous tweet just weeks later stating the same point debugging strategy: shorten your feedback loop as a key in a successful debugging strategy.

So in sum: shorter is better. Nobody likes to wait. And shorter i.e. faster is a key and recurrent theme in the R⁴ series. Today we have a fairly nice illustration of two aspects we have stressed before:

Fewer dependencies makes for faster installation time (apart from other desirable robustness aspects); and
Using binaries makes for faster installation time as it removes the need for compilations.

The combined effects can be staggering as we show below. The example is motivated by a truly “surprising” (we are being generous here) comment we received as an aside when discussing the eternal topic of whether R users do, or do not, have a choice when picking packages, or approaches, or verses. To our surprise, we were told that “packages are not substitutable”. Which is both demonstrably false (see below) and astonishing as it came from an academic. I.e. someone trained and paid to abstract superfluous detail away and recognise and compare ‘core’ features of items under investigation. Truly stunning. But I digress.

CRAN by now has many packages, slowly moving in on 20,000 in total, and is a unique success we commented-on time and time before. By now many packages shadow or duplicate each other, reinvent one another, or revise/review. One example is the pair of packages accessing PostgreSQL databases. There are several, but two are key. The older one is RPostgreSQL which has been around since Sameer Kumar Prayaga wrote it as a Google Summer of Code student in 2008 under my mentorship. The package has now been maintained well (if quietly) for many years by Tomoaki Nishiyama. The other entrant is more recent, though not new, and is RPostgres by Kirill Müller and others. We will for the remainder of this post refer to these two as the tiny and the tidy version as either can been as being a representative of a certain ‘verse’.

The aforementioned comment on non-substitutability struck us as eminently silly, so we set out to prove just how absurd it really is. So about a year ago we set up pair of GitHub repos with minimal code in a pair we called lim-tiny and lim-tidy. Our conjecture was (and is!) that less is more – just as post #34 titled Less Is More argued with respect to package dependencies. Each of the repos just does one thing: a query to a (freely accessible but remote) PostgreSQL database. The tiny version just retrieves a data frame using only the dependencies needed for RPostgreSQL namely DBI and nothing else. The tidy version retrieves a tibble and has access to everything else that comes when installing RPostgres: DBI, dplyr, and magrittr – plus of course their respective dependencies. We were able to let the code run in (very default) GitHub Actions on a weekly schedule without intervention apart from one change to the SQL query when the remote server (providing public bioinformatics data) changed its schema slighly, plus one update to the action yaml code version. No other changes.

We measure the time a standard continuous integration run takes in total using the default GitHub Action setup in the tidy case (which relies on RSPM/PPM binaries, caching, …, and does not rebuild from source each time), and our own r-ci script (introduced in #32 for CI with R. It switched to using r2u during the course of this experiment but already had access to binaries via c2d4u – so it is always binaries-based (see e.g. #37 or #13 for more). The chart shows the evolution of these run-times over the course of the year with one weekly run each.

This study reveals a few things:

“Yes, Veronica, we have a choice”: one year of identical results retrieving a data set via SQL from a PostgreSQL server. We can often choose between alternative packages to get (or process) our data.
There is a Free Lunch (TM) here: RPostgreSQL consistently dominates RPostgres when measured in CI run-time for the same task.
While there is (considerable) variability (likely stemming from heterogenous setups at GitHub Action) the tiny approach is on average about twice as fast as the tidy approch.
You do have a choice. Would you rather wait x minutes for CI, or 2x?

The key point there is that while the net-time to fire off a single PostgreSQL is likely (near-)identical, the net cost of continuous integration is not. In this setup, it is about twice the run-time simply because ‘Less Is More’ which (in this setup) comes out to about being twice as fast. And that is a valid (and concrete, and verifiable) illustration of the overall implicit cost of adding dependencies to creating and using development, test, or run-time environments.

Faster feedback loops make for faster builds and faster debugging. Consider using fewer dependencies, and/or using binaries as provided by r2u.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Sun, 19 Jun 2022

#38: Faster Feedback Systems

Engineers build systems. Good engineers always stress and focus efficiency of these systems.

Two recent examples of engineering thinking follow. One was in a video / podcast interview with Martin Thompson (who is a noted high-performance code expert) I came across recently. The overall focus of the hour-long interview is on ‘managing software complexity’. Around minute twenty-two, the conversation turns to feedback loops and systems, and a strong preference for simple and fast systems for more immediate feedback. An important topic indeed.

The second example connects to this and permeates many tweets and other writings by Erik Bernhardsson. He had an earlier 2017 post on ‘Optimizing for iteration speed’, as well as a 17 May 2022 tweet on minimizing feedback loop size, another 28 Mar 2022 tweet reply on shorter feedback loops, then a 14 Feb 2022 post on problems with slow feedback loops, as well as a 13 Jan 2022 post on a priority for tighter feedback loops, and lastly a 23 Jul 2021 post on fast feedback cycles. You get the idea: Erik really digs faster feedback loops. Nobody likes to wait: immediatecy wins each time.

A few years ago, I had touched on this topic with two posts on how to make (R) package compilation (and hence installation) faster. One idea (which I still use whenever I must compile) was in post #11 on caching compilation. Another idea was in post #13: make it faster by not doing it, in this case via binary installation which skip the need for compilation (and which is what I aim for with, say, CI dependencies). Several subsequent posts can be found by scrolling down the r^4 blog section: we stressed the use of the amazing Rutter PPA ‘c2d4u’ for CRAN binaries (often via Rocker containers, the (post #28) promise of RSPM, and the (post #29) awesomeness of bspm. And then in the more recent post #34 from last December we got back to a topic which ties all these things together: Dependencies. We quoted Mies van der Rohe: Less is more. Especially when it comes to dependencies as these elongate the feedback loop and thereby delay feedback.

Our most recent post #37 on r2u connects these dots. Access to a complete set of CRAN binaries with full-dependency resolution accelerates use and installation. This of course also covers testing and continuous integration. Why wait minutes to recompile the same packages over and over when you can install the full Tidyverse in 18 seconds or the brms package and all it needs in 13 seconds as shown in the two gifs also on the r2u documentation site.

You can even power up the example setup of the second gif via this gitpod link giving you a full Ubuntu 22.04 session in your browser to try this: so go forth and install something from CRAN with ease! The benefit of a system such our r2u CRAN binaries is clear: faster feedback loops. This holds whether you work with few or many dependencies, tiny or tidy. Faster matters, and feedback can be had sooner.

And with the title of this post we now get a rallying cry to advocate for faster feedback systems: “FFS”.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Sat, 21 May 2022

#37: Introducing r2u with 2 x 19k CRAN binaries for Ubuntu 22.04 and 20.04

One month ago I started work on a new side project which is now up and running, and deserving on an introductory blog post: r2u. It was announced in two earlier tweets (first, second) which contained the two (wicked) demos below also found at the documentation site.

So what is this about? It brings full and complete CRAN installability to Ubuntu LTS, both the ‘focal’ release 20.04 and the recent ‘jammy’ release 22.04. It is unique in resolving all R and CRAN packages with the system package manager. So whenever you install something it is guaranteed to run as its dependencies are resolved and co-installed as needed. Equally important, no shared library will be updated or removed by the system as the possible dependency of the R package is known and declared. No other package management system for R does that as only apt on Debian or Ubuntu can — and this project integrates all CRAN packages (plus 200+ BioConductor packages). It will work with any Ubuntu installation on laptop, desktop, server, cloud, container, or in WSL2 (but is limited to Intel/AMD chips, sorry Raspberry Pi or M1 laptop). It covers all of CRAN (or nearly 19k packages), all the BioConductor packages depended-upon (currently over 200), and only excludes less than a handful of CRAN packages that cannot be built.

Usage

Setup instructions approaches described concisely in the repo README.md and documentation site. It consists of just five (or fewer) simple steps, and scripts are provided too for ‘focal’ (20.04) and ‘jammy’ (22.04).

Demos

Check out these two demos (also at the r2u site):

Installing the full tidyverse in one command and 18 seconds

Installing brms and its depends in one command and 13 seconds (and show gitpod.io)

Integration via `bspm`

The r2u setup can be used directly with apt (or dpkg or any other frontend to the package management system). Once installed apt update; apt upgrade will take care of new packages. For this to work, all CRAN packages (and all BioConductor packages depended upon) are mapped to names like r-cran-rcpp and r-bioc-s4vectors: an r prefix, the repo, and the package name, all lower-cased. That works—but thanks to the wonderful bspm package by Iñaki Úcar we can do much better. It connects R’s own install.packages() and update.packages() to apt. So we can just say (as the demos above show) install.packages("tidyverse") or install.packages("brms") and binaries are installed via apt which is fantastic and it connects R to the system package manager. The setup is really only two lines and described at the r2u site as part of the setup.

History and Motivation

Turning CRAN packages into .deb binaries is not a new idea. Albrecht Gebhardt was the first to realize this about twenty years ago (!!) and implemented it with a single Perl script. Next, Albrecht, Stefan Moeller, David Vernazobres and I built on top of this which is described in this useR! 2007 paper. A most excellent generalization and rewrite was provided by Charles Blundell in an superb Google Summer of Code contribution in 2008 which I mentored. Charles and I described it in this talk at useR! 2009. I ran that setup for a while afterwards, but it died via an internal database corruption in 2010 right when I tried to demo it at CRAN headquarters in Vienna. This peaked at, if memory serves, about 5k packages: all of CRAN at the time. Don Armstrong took it one step further in a full reimplemenation which, if I recall correctly, coverd all of CRAN and BioConductor for what may have been 8k or 9k packages. Don had a stronger system (with full RAID-5) but it also died in a crash and was never rebuilt even though he and I could have relied on Debian resources (as all these approaches focused on Debian). During that time, Michael Rutter created a variant that cleverly used an Ubuntu-only setup utilizing Launchpad. This repo is still going strong, used and relied-upon by many, and about 5k packages (per distribution) strong. At one point, a group consisting of Don, Michael, Gábor Csárdi and myself (as lead/PI) had financial support from the RConsortium ISC for a more general re-implementation , but that support was withdrawn when we did not have time to deliver.

We should also note other long-standing approaches. Detlef Steuer has been using the openSUSE Build Service to provide nearly all of CRAN for openSUSE for many years. Iñaki Úcar built a similar system for Fedora described in this blog post. Iñaki and I also have a arXiv paper describing all this.

Details

Please see the the r2u site for all details on using r2u.

Acknowledgements

The help of everybody who has worked on this is greatly appreciated. So a huge Thank you! to Albrecht, David, Stefan, Charles, Don, Michael, Detlef, Gábor, Iñaki—and whoever I may have omitted. Similarly, thanks to everybody working on R, CRAN, Debian, or Ubuntu—it all makes for a superb system. And another big Thank you! goes to my GitHub sponsors whose continued support is greatly appreciated.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Wed, 23 Feb 2022

#36: pub/sub for live market monitoring with R and Redis

Welcome to the 36th post of the really randomly reverberating R, or R⁴ for short, write-ups. Today’s post is about using Redis, and especially RcppRedis, for live or (near) real-time monitoring with R.

market monitor

There is an saying that “you can take the boy out of the valley, but you cannot the valley out of the boy” so for those of us who spent a decade or two in finance and on trading floors, having “some” market price information available becomes second nature. And/or sometimes it is just good fun to program this.

A good while back Josh posted a gist on a simple-yet-robust while loop. It (very cleverly) uses his quantmod package to access the SP500 in “real-time”. (I use quotes here because at the end of retail broadband one is not at the same market action as someone co-located in a New Jersey data center. It is however not delayed: as an index, it is not immediately tradeable as a stock, etf, or derivative may be all of which are only disseminated as delayed price information, usually by ten minutes.) I quite enjoyed the gist and used it and started tinkering with it. For example, it collects data but only saves (i.e. “persists”) it after market close. If for whatever reason one needs to restart recent history is gone. In any event, I used his code and generalized it a little and published this about a year ago as function intradayMarketMonitor() in my dang package. (See this blog post announcing it.) The chart of the left shows this in action, the chart is a snapshot from a couple of days ago when the vignettes (more on them below) were written.

As lovely as intradayMarketMonitor() is, it also limits itself to market hours. And sometimes you want to see, say, how the market opens on Sunday (futures usually restart at 17h Chicago time), or how news dissipates during the night, or where markets are pre-open, or …. So I both wanted to complement this with futures, and also ‘cache’ it locally so that, say, one machine might collect data and one (or several others) can visualize. For such tasks, Redis is unparalleled.

(Yet I also always felt Redis could do with another, simple, short and sweet introduction stressing the key features of i) being multi-lingual: write in one language, consume in another and ii) loose coupling: no linking as one talks to Redis via standard tcp/ip networking. So I wrote a new intro vignette that is now in RcppRedis. I hope this comes in handy. Comments welcome!)

Our RcppRedis package had long been used for such tasks, and it was easy to set it up. “Standard use” is to loop, fetch some data, push it to Redis, sleep, and start over. Clients do the same: fetch most recent data, plot or report it, sleep, start over. That works, but it has a dual delay as the client sleeping may miss the data update!

The standard answer to this is called publish/pubscribe, or pub/sub. Libraries such as 0mq or zeromq specialise in this. But it turns out Redis already has it. I had some initial difficulty adding it to RcppRedis so for a trial I tested the marvellous rredis package by Bryan and simply instantiated two Redis clients. Now the data getter simply ‘publishes’ a new data point in a given channel, by convention named after the security it tracks. Clients register with the Redis server which does all the actual work of keeping track of who listens to what. The clients now simply ‘listen’ (which is a blocking operation) and as soon as data comes in receive it.

market monitor

This is quite mesmerizing when you just run two command-line clients (in a byobu session, say). As sone as the data is written (as shown on console log) it is consumed. No measruable overhead. Just lovely.

Bryan and I then talked a litte as he may or may not retire rredis. Having implemented the pub/sub logic for both sides once, he took a good hard look at RcppRedis and “just like that” added it there. With some really clever wrinkles for (optional) per-symbol callback as closure attached to the instance. Truly amazeballs And once we had it in there, generalizing from publishing or subscribing to just one symbol easily generalizes to having one listener collect and publish for multiple symbols, and having one or more clients subscribe and listen one, more or even all symbol. All with ease thanks tp Redis. The second chart, also from a few days ago, shows four symbols for four (front-contract) futures for Bitcoin, Crude Oil, SP500, and Gold.

As all this can get a little technical, I wrote a second vignette for RcppRedis on just this: market monitoring. Give this a read, if interested, feedback on this one is most welcome too! But all the code you need is included in the package—just run a local Redis instance.

Before closing, one sour note. I uploaded all this in a new and much improved updated RcppRedis 0.2.0 to CRAN on March 13 – ten days ago. Not only is it still not “there”, but CRAN in their most delightful way also refuses to answer any emails of mine. Just lovely. The package exhibited just one compiler warning: a C++ compiler objected to the (embedded) C library hiredis (included as a fallback) for using a C language construct. Yes. A C++ compiler complaining about C. It’s a non-issue. Yet it’s been ten days and we still have nothing. So irritating and demotivating. Anyway, you can get the package off its GitHub repo.

If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Mon, 31 Jan 2022

#35: apt install rstudio quarto

Welcome to the 35th post in the ravishingly rabiant R recommendations, or R⁴. Today’s post is about apt and R tools.

Many of us have been running RStudio off our local machines for as long as binaries have been provided. Which is by now probably a bit over a decade. Time flies.

And as nice it is to have matching binaries, in my case in the .deb format used on Debian or Ubuntu, it is wee bit a painful to manually download a file and then install it. Twice the pain if you are lucky enough to be on a system where you can also run RStudio Server. And now three times as painful as you may need a matching quarto-cli binary for the nice quarto service.

So wouldn’t it be nice to have an apt-getable repo? And to autoMAGICall get updated versions when they are available? Oh yes. And I had been bugging JJ from day one. And JJ would almost listen intendly, nod briefly and firmly, and issue an assured we will look into it. Well, they are still looking…

Luckily, years ago, Carl wrote a helper script for our use in Rocker. I promptly adopted these and kept them in the littler examples directory as a pair of script getRStudioDesktop.r and getRStudioServer.r, later complemented by getQuartoCli.r. And I used these for years, somewhere between weekly and monthly.

But it is still very manual: three script calls, one sudo dpkg -i call. And as our good friends at RStudio don’t seem to be coming forward with a repo, I created one at GitHub thinking I could serve the files via GitHub Pages. Which … of course I cannot as the .deb file for rstudio is well above the 100mb limit. So that seemed to be a bit of a setback. But after a bit of pondering, and recognizing that I am now in the fortunate position to have symmetric broadband access at home, I reckoned that until the bandwidth use gets excessive I will serve this as ‘truly personal package archive’ (or tPPA) from here. Note that this is calibrated for my use so Ubuntu amd64 it is. Nothing else. And that it installs ‘dailies’. Which may cause issues for some people. You have warned. Reading tis paragraphs signifies agreement with the terms and limitations. Just kidding.

A quick screenshot from an update earlier is here. Note that I use the fabulous wajig wrapper by Graham Williams here as my frontend to apt, dpkg and more as I have for even longer than I have use RStudio. Its use is tangential here; sudo apt upgrade would have done the same (and is essentially being called). And it demonstrates the main benefit: we are now automated as the cron scheduler launches an update of the PPA at which ever frequency you chose (currently twice a week for me) and after that it becomes part of the normal apt updates we do anyway (and which I do about daily). So that’s main gist: automated apt upgrades of rstudio, rstudio-server, and quarto-cli.

And you can find the underlying code in the GitHub repo ppa-rstudio which I put together a good week ago. I am currently updating the ‘tPPA’ twice a week from crontab and have had two full upgrades already.

And who knows, maybe with a bid of prodding RStudio may come around. One can always hope.

If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Wed, 08 Dec 2021

#34: Less Is More

Less Is More.

– Ludwig Mies van der Rohe

Welcome to the 34th post in the rambunctiously refreshing R recitations, or R⁴. Today’s post is about architecture.

Mies defined modernism. When still in Europe, I had been to the Neue Nationalgalerie in Berlin which provides a gorgeous space for the arts. Twenty-five years ago, I worked next to his Toronto-Dominion Center in Toronto. Here in Chicago we have numerous buildings: the Federal Center (the Dirksen, the Kluczynski and the US Post Office rounding out the square in the Loop), multiple buildings on the Illinois Tech (aka IIT) Campus where he taught in the architecture department he created and lead, the (formerly called) IBM Plaza building at the river and more.

Structure and minimalism, often based on the same core elements of black steel beams and glass, are a landmark of these buildings. One immediately senses that there is nothing left to take away.

Code and programming can be similar. We too compose based on parts we assemble and combine to create something hopefully larger than the parts. The difficulty arising from too many dependencies is something we discussed before–both here in this 2018 post but also via the tinyverse site.

Over the last seven days, and via uploads to new versions to CRAN, I have switched the vignettes of seven packages from using minidown (which in turn requires rmarkdown and knitr, plus their aggregate dependencies) to using simplermarkdown with its sole dependency. That is, of course, a personal choice. I tend to not even “knit” much in my vignettes (and simplermarkdown supports what I do) but to rely mostly on pandoc for code rendering. So I only need a small subset of the functionality provided, but I can not access ‘just that’ as the default comes with numerous bells, whistles as well as enough other instruments to form a small marching band.

A picture may express this better:

(courtesy of the deepdep package for the figures).

Which of these two setups is less likely to surprise you with random breaks, say in continuous integration? Which takes less time to install, and burns fewer cpu cycles just to be set up, each time we run a new test? Which is taxing your students, colleagues, collaborators, users, … less on setup for use or replication? The first, comprises a total of 29 dependencies, or the second with just one?

My money is on the second choice. Less is more.

/code/r4 | permanent link

Wed, 09 Jun 2021

#33: Collaborative Editing and Execution in Shared Byoby Sessions

Welcome to the 33th post in the rigorously raconteuring R recommendations series, or R⁴ for short. This post is also a post in the T⁴ series of tips, tricks, tools, and toys as it picks up and extends earlier posts on byobu. And it fits nicely in the more recent ESS-Intro series as we show some Emacs. You can find earlier R⁴ posts here, and the T⁴ posts here; the ESS-Intro series is here.

The focus of this short video (and slides) is on collaboration using files, but also entire sessions, execution and all aspects of joint exploration, development or debugging. Anything you can do in a terminal you can also do shared in a terminal. The video contains a brief lightning talk, and a shared session jointly with Grant McDermott and Vicent Arel-Bundock. My big big thanks to both of them for prodding and encouragement, as well as fearless participation in the joint section of the video:

The corresponding pdf slides are here.

If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Thu, 07 Jan 2021

#32: Portable Continuous Integration using r-ci

Welcome to the 32th post in the rarely raucous R recommendations series, or R⁴ for short. This post covers continuous integration, a topic near and dear to many of us who have to recognise its added value.

The popular and widely-used service at Travis is undergoing changes driven by a hard-to-argue with need for monetization. A fate that, if we’re honest, lies ahead for most “free” services so who know, maybe one day we have to turn away from other currently ubiquitous service. Because one never knows, it can pay off to not get to tied to any one service. Which brings us to today’s post and my r-ci service which allows me to run CI at Travis, at GitHub, at Azure, and on local Docker containers as the video demonstrates. It will likely also work at GitLab and other services, I simply haven’t tried any others.

The slides are here. The r-ci website introduces r-ci at a high-level. This repo at GitHub contains run.sh, and can be used to raise issues, ask questions, or provide feedback.

If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Thu, 03 Dec 2020

#31: Test your R package against bleeding-edge gcc

Welcome to the 31th post in the rapturously rampant R recommendations series, or R⁴ for short. This post will once again feature Docker for use with R.

Earlier this week, I received a note from CRAN about how my RcppTOML package was no longer building with the (as of right now of course unreleased) version 11 of the GNU C++ compiler, i.e. g++-11. And very kindly even included a hint about the likely fix (which was of course correct). CRAN, and one of its maintainers in particular, is extremely forward-looking in terms of toolchain changes. A year ago we were asked to updated possible use of global variables in C code as gcc-10 tightened the rules. This changes is a C++ one, and a fairly simple one of simply being more explicit with include headers. Previous g++ release had done the same.

The question now was about the least painful way to get g++-11 onto my machine, with the least amount of side-effects. Regular readers of this blog will know where this is headed, but even use of Docker requires binaries. A look at g++-11 within packages.debian.org comes up empty. No Debian means no Ubuntu. But … there is a PPA for Ubuntu with toolchain builds we have used before. And voilà there we have it: within the PPA for Ubuntu Toolchain repository is the volatile packages PPA with both g++-10 and g++-11. Here Ubuntu 20.10 works with g++-10, but g++-11 requires Ubuntu 21.04. Docker containers are there for either. So with the preliminaries sorted out, the key steps are fairly straightforward:

start from ubuntu:21.04 to be able to install g++-11 later
install the software-properties-common package to be able to add a PPA
(plus a few more packages to deal with the repository signing key)
run the sudo add-apt-repository ppa:ubuntu-toolchain-r/volatile command to add the volatile packages PPA
install g++-11 (along with, for good measure) gcc-11 and gfortran-11
use update-alternative (a clever Debian/Ubuntu command) to make version ‘11’ the default
install R itself (via r-base-core) which we simply take from the distro as 21.04 is by construction very recent
install Rcpp via the r-cran-rcpp binary which covers all dependencies for the package in question

And that is it! RcppTOML is fairly minimal and could be a member of the tinyverse so no other dependencies are needed—if your package has any you could just use the standard steps to install from source, or binary (including using RSPM or bspm). You can see the resulting Dockerfile which contains a minimal amount of extra stuff to deal with some environment variables and related settings. Nothing critical, but it smoothes the experience somewhat.

This container is now built (under label rocker/r-edge with tags latest and gcc-11), and you can download it from Docker Hub. With that the ‘proof’ of the (now fixed and uploaded) package building becomes as easy as

edd@rob:~/git/rcpptoml(master)$ docker run --rm -ti -v $PWD:/mnt -w /mnt rocker/r-edge:gcc-11 g++ --version
g++ (Ubuntu 11-20201128-0ubuntu2) 11.0.0 20201128 (experimental) [master revision fb6b29c85c4:a331ca6194a:e87559d202d90e614315203f38f9aa2f5881d36e]
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

edd@rob:~/git/rcpptoml(master)$ 
edd@rob:~/git/rcpptoml(master)$ docker run --rm -ti -v $PWD:/mnt -w /mnt rocker/r-edge:gcc-11 R CMD INSTALL RcppTOML_0.1.7.tar.gz
* installing to library ‘/usr/local/lib/R/site-library’
* installing *source* package ‘RcppTOML’ ...
** using staged installation
** libs
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I../inst/include/ -DCPPTOML_USE_MAP -I'/usr/lib/R/site-library/Rcpp/include'    -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-Fuvi9C/r-base-4.0.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c RcppExports.cpp -o RcppExports.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I../inst/include/ -DCPPTOML_USE_MAP -I'/usr/lib/R/site-library/Rcpp/include'    -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-Fuvi9C/r-base-4.0.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c parse.cpp -o parse.o
g++ -std=gnu++11 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o RcppTOML.so RcppExports.o parse.o -L/usr/lib/R/lib -lR
installing to /usr/local/lib/R/site-library/00LOCK-RcppTOML/00new/RcppTOML/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (RcppTOML)
edd@rob:~/git/rcpptoml(master)$

I hope both the availability of such a base container with gcc-11 (and g++-11 and gfortran-11) as well as a “recipe” for building similar containers with newer clang version will help other developers.

If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Sat, 26 Sep 2020

#30: Easy, Reliable, Fast and Portable Linux and macOS Continuous Integration

Welcome to the 30th post in the rarified R recommendation resources series or R⁴ for short. The last post introduced BSPM. In the four weeks since, we have worked some more on BSPM to bring it to the point where it is ready for use with continuous integration. Building on this, it is now used inside the run.sh script that driven our CI use for many years (via the r-travis repo).

Which we actually use right now on three different platforms:

Travis CI as for example seen in this .travis.yml file (as well as a half dozen more);
Azure Pipelines for example seen in this azure-pipelines.yml;
GitHub Actions as for example seen in this R-CMD-check.yaml file.

All three use the exact same script facilitating this, and run a ‘matrix’ over Linux and macOS. You read this right: one CI setup that is portable and which you can take to your CI provider of choice. No lock-in or tie-in. Use what works, change at will. Or run on all three if you like burning extra cycles.

This is already used by handful of my repos as well as by at least two repos of friends also deploying r-travis. How does it work? In a nutshell we are

downloading run.sh via curl and changing its mode;
running run.sh bootstrap which sets the operating system default:
- on Linux we use Ubuntu,
  - add two PPAs repos for R itself and over 4600 r-cran-* binaries,
  - and enable BSPM to use these from install.packages()
- on macOS we use the standard setup also used on Travis, GitHub Actions and elsewhere;
- this provides us with fast, reliable, easy, and portable access to binaries on two OSs under dependency resolution;
running run.sh install_deps to install just the requireded Depends:, Imports: and LinkingTo:
running run.sh tests to build the tarball and test it via R CMD check --as-cran.

There are several customizations that are possible via environment variables

additional PPAs or drat repos can be added to offer even more package choice;
alternatively one could run run.sh install_all to also install Suggests:;
optionally one could run run.sh install_r pkgA pkgB ... to install packages explicitly listed;
optionally one could also run run.sh install_aptget r-cran-pkga r-cran-pkgb otherpackage to add more Ubuntu binaries.

We find this setup compelling. The scheme is simple: there really is just one shell script behind it which can also be downloaded and altered. The scheme is also portable as we can (as shown) rotate between CI provides. The scheme is also more flexible: in case of debugging needs one can simply run the script on a local Docker or VM instance. Lastly, the scheme moves away from single points of failure or breakage.

Currently the script uses only BSPM as I had the hunch that it would a) work and b) be compelling. Adding support for RSPM would be equally easy, but I have no immediate need to do so. Adding BioConductor installation may be next. That is easy when BioConductor uses r-release; it may be little more challenging under r-devel to but it should work too. Stay tuned.

In the meantime, if the above sounds compelling, give run.sh from r-travis a go!

If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Wed, 26 Aug 2020

#29: Easy, Reliable, Fast Linux CRAN Binaries via BSPM

Welcome to the 29th post in the randomly repeating R recommendations series or R⁴ for short. Our last post #28 introduced RSPM, and just before that we also talked in #27 about binary installations on Ubuntu (which was also a T⁴ video). This post is joined with Iñaki Ucar and mainly about work we have done with his bspm package.

Background

CRAN has been a cornerstone of the success of R in recent years. As a well-maintained repository with stringent quality control, it ensures users have access to highest-quality statistical / analytical software that “just works”. Users on Windows and macOS also benefit from faster installation via pre-compiled binary packages.

Linux users generally install from source, which can be more tedious and, often, much slower. Those who know where to look have had access to (at least some) binaries for years as well (and one of us blogged and vlogged about this at length). Debian users get close to 1000 CRAN and BioConductor packages (and, true to Debian form, for well over a dozen hardware platforms). Michael Rutter maintains a PPA with 4600 binaries for three different Ubuntu flavors (see c2d4u4.0+). More recently, Fedora joined the party with 16000 (!!) binaries, essentially all of CRAN, via a Copr repository (see iucar/cran).

The buzz currently is however with RSPM, a new package manager by RStudio. An audacious project, it provides binaries for several Linux distributions and releases. It has already been tested in many RStudio Cloud sessions (including with some of our students) as well as some CI integrations.

RSPM cuts “across” and takes the breadth of CRAN across several Linux distributions, bringing installation of pre-built CRAN packages a binaries under their normal CRAN package names. Another nice touch is the integration with install.packages(): these binaries are installed in a way that is natural for R users—but as binaries. It is however entirely disconnected from the system package management. This means that the installation of a package requiring an external library may “succeed” and still fail, as a required library simply cannot be pulled in directly by RSPM.

So what is needed is a combination. We want binaries that are aware of their system dependencies but accessible directly from R just like RSPM offers it. Enter BSPM—the Bridge to System Package Manager package (also on CRAN).

The first illustration (using Ubuntu 18.04) shows RSPM on the left, and BSPM on the right, both installing the graphics package Cairo (and both using custom Rocker containers).

This fails for RSPM as no binary is present and a source build fails for the familiar lack of a -dev package. It proceeds just fine on the right under BSPM.

A second illustration shows once again RSPM on the left, and BSPM on the right (this time on Fedora), both installing the units package without a required system dependency.

The installation of units works for BSPM as the dependency libudunits is brought in, but fails under RSPM. The binary installation succeeds in both cases, but the missing dependency (the UDUNITS2 library) is brought in only by BSPM. Consequently, the package fails to load under RSPM.

Summary

To conclude, highlights of BSPM are:

direct installation of binary packages from R via R commands under their normal CRAN names (just like RSPM);
full integration with the system package manager, delivering system installations (improving upon RSPM);
full dependency resolution for R packages, including system requirements (improving upon RSPM).

This offers easy, reliable, fast installation of R packages, and we invite you to pick all three. We recommend usage with either Ubuntu with the 4.6k packages via the Rutter PPA, or Fedora via the even more complete Copr repository (which already includes a specially-tailored version of BSPM called CoprManager).

We hope this short note wets your appetite to learn more about bspm (which is itself on CRAN) and the two sets of Rocker containers shown. The rocker/r-rspm container comes in two two flavours for Ubuntu 18.04 and 20.04. Similarly, the rocker/r-bspm container comes in the same two two flavours for Ubuntu 18.04 and 20.04, as well as in a Debian testing variant.

Feedback is appreciated at the bspm or rocker issue trackers.

If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Fri, 03 Jul 2020

#28: Welcome RSPM and test-drive with Bionic and Focal

Welcome to the 28th post in the relatively random R recommendations series, or R⁴ for short. Our last post was a “double entry” in this R⁴ series and the newer T⁴ video series and covered a topic touched upon in this R⁴ series multiple times: easy binary install, especially on Ubuntu.

That post already previewed the newest kid on the block: RStudio’s RSPM, now formally announced. In the post we were only able to show Ubuntu 18.04 aka bionic. With the formal release of RSPM support has been added for Ubuntu 20.04 aka focal—and we are happy to announce that of course we added a corresponding Rocker r-rspm container. So you can now take full advantage of RSPM either via docker pull rocker/r-rspm:18.04 or via docker pull rocker/r-rspm:20.04 covering the two most recent LTS releases.

RSPM is a nice accomplishment. Covering multiple Linux distributions is an excellent achievement. Allowing users to reason in terms of the CRAN packages (i.e. installing xml2, not r-cran-xml2) eases use. Doing it from via the standard R command install.packages() (or wrapper around it like our install.r from littler package) is very good too and an excellent technical achievement.

There is, as best as I can tell, only one shortcoming, along with one small bit of false advertising. The shortcoming is technical. By bringing the package installation into the user application domain, it is separated from the system and lacks integration with system libraries. What do I mean here? If you were to add R to a plain Ubuntu container, say 18.04 or 20.04, then added the few lines to support RSPM and install xml2 it would install. And fail. Why? Because the system library libxml2 does not get installed with the RSPM package—whereas the .deb from the distribution or PPAs does. So to help with some popular packages I added libxml2, libunits and a few more for geospatial work to the rocker/r-rspm containers. Being already present ensures packages xml2 and units can run immediately. Please file issue tickets at the Rocker repo if you come across other missing libraries we could preload. (A related minor nag is incomplete coverage. At least one of my CRAN packages does not (yet?) come as a RSPM binary. Then again, CRAN has 16k packages, and the RSPM coverage is much wider than the PPA one. But completeness would be neat. The final nag is lack of Debian support which seems, well, odd.)

So what about the small bit of false advertising? Well it is claimed that RSPM makes installation “so much faster on Linux”. True, faster than the slowest possible installation from source. Also easier. But we had numerous posts on this blog showing other speed gains: Using ccache. And, of course, using binaries. And as the initial video mentioned above showed, installing from the PPAs is also faster than via RSPM. That is easy to replicate. Just set up the rocker/r-ubuntu:20.04 (or 18.04) container alongside the rocker/r-rspm:20.04 (or also 18.04) container. And then time install.r rstan (or install.r tinyverse) in the RSPM one against apt -y update; apt install -y r-cran-rstan (or ... r-cran-tinyverse). In every case I tried, the installation using binaries from the PPA was still faster by a few seconds. Not that it matters greatly: both are very, very quick compared to source installation (as e.g. shown here in 2017 (!!)) but the standard Ubuntu .deb installation is simply faster than using RSPM. (Likely due to better CDN usage so this may change over time. Neither method appears to do downloads in parallel so there is scope for both for doing better.)

So in sum: Welcome to RSPM, and nice new tool—and feel free to “drive” it using rocker/r-rspm:18.04 or rocker/r-rspm:20.04.

If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Mon, 22 Jun 2020

#27: R and CRAN Binaries for Ubuntu

Welcome to the 27th post in the rationally regularized R revelations series, or R⁴ for short. This is a edited / updated version of yesterday’s T^4 post #7 as it really fits the R⁴ series as well as it fits the T⁴ series.

A new video in both our T^4 series of video lightning talks with tips, tricks, tools, and toys is also a video in the R^4 series as it revisits a topic previously covered in the latter: how to (more easily) get (binary) packages onto your Ubuntu system. In fact, we show it in three different ways.

The slides are here.

This repo at GitHub support the series: use it to open issues for comments, criticism, suggestions, or feedback.

Thanks to Iñaki Ucar who followed up on twitter with a Fedora version. We exchanged some more message, and concluded that complete comparison (from an empty Ubuntu or Fedora container) to a full system takes about the same time on either system.

If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Sun, 26 Apr 2020

#26: Upgrading to R 4.0.0

Welcome to the 26th post in the rationally regularized R revelations series, or R⁴ for short.

R 4.0.0 was released two days ago, and a casual glance at some social media conversations appears to suggest quite some confusion, almost certainly some misunderstandings, and possibly also a fair amount of fear, uncertainty, and doubt about the process. So I thought I could show how I upgrade my own main workstation, live and in colour without a safety net. (Almost: I did upgrade my laptop yesterday which went swimmingly, if more slowly.) So here is a fresh video about upgrading to R 4.0.0, with some support slides as usual:

The slides used in the video are at this link.

A few quick follow-ups to the ‘live’ nature of this. The pbdZMQ package did in fact install smoothly once the (Ubuntu) -dev packages for Zero MQ were (re-)installed; then IRkernel also followed. BioConductor completed once I realized that GOSemSim needed the annotation package GO.db to be updated, that allowed MNF to install. So the only bug, really, was the circular depdency between pkgload and testthat. Overall, not bad at all for a quick afternoon session!

And as mentioned, if you are interested and have questions concerning use of R on a .deb based system like Debain or Ubuntu (or Mint or …), the r-sig-debian list is a very good and friendly place to ask them.

If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Sun, 12 Apr 2020

#25: Test, test, test, … those R 4.0.0 binaries with Ubuntu 20.04 and Rocker

Welcome to the 25nd post in the randomly recurring R recitations series, or R⁴ for short.

Just yesterday, we posted a short post along with a video and supporting slides. It covered how to test the soon-to-be-released R 4.0.0 on a custom Ubuntu 18.04 Rocker container.

A container for Ubuntu 20.04, which is itself in final beta stages, was being built while the video was made. As it is available now, we created a quick follow-up video showing the use under Ubuntu 20.04:

The updated supporting slides from the video are still at this link.

What we showed in both videos does of course also work directly on Ubuntu (or Debian, using those source repos) installations; the commands shown in the Rocker use case generally apply equally to a normal installation.

If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Sat, 11 Apr 2020

#24: Test, test, test, … those R 4.0.0 binaries with Ubuntu and Rocker

Welcome to the 24nd post in the relentlessly regular R ravings series, or R⁴ for short.

R 4.0.0 will be released in less than two weeks, and testing is very important. I had uploaded two alpha release builds (at the end of March and a good week ago) as well as a first beta release yesterday, all to the Debian ‘experimental’ distribution (as you can see here) tracking the release schedule set by Peter Dalgaard. Because R 4.0.0 will require reinstallation of all packages, it makes some sense to use a spare machine. Or a Docker container. So to support that latter mode, I have now complemented the binaries created from the r-base source package with all base and recommended packages, providing a starting point for actually running simple tests. Which is what we do in the video, using again the ‘R on Ubuntu (18.04)’ Rocker container:

Slides from the video are at this link.

This container based on 18.04 is described here on the Docker Hub; a new 20.04 container with the pre-release of the next Ubuntu LTS should be there shortly once it leaves the build queue.

What we showed does of course also work on direct Ubuntu (or Debian, using those source repos) installations; the commands shown in the Rocker use case generally apply equally to a normal installation.

If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Mon, 05 Aug 2019

#23: Debugging with Docker and Rocker – A Concrete Example helping on macOS

Welcome to the 23nd post in the rationally reasonable R rants series, or R⁴ for short. Today’s post was motivated by an exchange on the r-devel list earlier in the day, and a few subsequent off-list emails.

Roger Koenker posted a question: how to best debug an issue arising only with gfortran-9 which is difficult to get hold off on his macOS development platform. Some people followed up, and I mentioned that I had good success using Docker, and particularly our Rocker containers—and outlined a quick mini-tutorial (which had one mini-typo lacking the imporant slash in -w /work). Roger and I followed up over a few more off-list emails, and by and large this worked for him.

So what follows below is a jointly written / edited ‘mini HOWTO’ of how to deploy Docker on macOS for debugging under particular toolchains more easily available on Linux. Windows and Linux use should be very similar, albeit differ in the initial install. In fact, I frequently debug or test in Docker sessions when I do not want to install on my Linux host system. Roger sent one version (I had also edited) back to the list. What follows is my final version.

Debugging with Docker: Getting Hold of Particular Compilers

Context: The quantreg package was seen exhibiting errors when compiled with gfortran-9. The following shows how to use gfortran-9 on macOS by virtue of Docker. It is written in Roger Koenker’s voice, but authored by Roger and myself.

With extensive help from Dirk Eddelbuettel I have installed docker on my mac mini from

https://hub.docker.com/editions/community/docker-ce-desktop-mac

which installs from a dmg in quite standard fashion. This has allowed me to simulate running R in a Debian environment with gfortran-9 and begin the process of debugging my ancient rqbr.f code.

Some further details:

Step 0: Install Docker and Test

Install Docker for macOS following this Docker guide. Do some initial testing, e.g.

docker --version 
docker run hello-world

Step 1: Download `r-base` and test OS

We use the plainest Rocker container rocker/r-base, in the aliased form of the official Docker container for, i.e. r-base. We first ‘pull’, then test the version and drop into bash as second test.

docker pull r-base                       # downloads r-base for us 
docker run --rm -ti r-base R --version   # to check we have the R we want
docker run --rm -ti r-base bash          # now in shell, Ctrl-d to exit

Step 2: Setup the working directory

We tell Docker to run from the current directory and access the files therein. For the work on quantreg package this is projects/rq for RogerL

cd projects/rq
docker run --rm -ti -v ${PWD}:/work -w /work r-base bash

This put the contents of projects/rq into the /work directory, and starts the session in /work (as can be seen from the prompt).

Next, we update the package information inside the container:

root@90521904fa86:/work# apt-get update
Get:1 http://cdn-fastly.deb.debian.org/debian sid InRelease [149 kB]
Get:2 http://cdn-fastly.deb.debian.org/debian testing InRelease [117 kB]
Get:3 http://cdn-fastly.deb.debian.org/debian sid/main amd64 Packages [8,385 kB]
Get:4 http://cdn-fastly.deb.debian.org/debian testing/main amd64 Packages [7,916 kB]
Fetched 16.6 MB in 4s (4,411 kB/s)                           
Reading package lists... Done

Step 3: Install gcc-9 and gfortran-9

root@90521904fa86:/work# apt-get install gcc-9 gfortran-9
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
cpp-9 gcc-9-base libasan5 libatomic1 libcc1-0 libgcc-9-dev libgcc1 libgfortran-9-dev
libgfortran5 libgomp1 libitm1 liblsan0 libquadmath0 libstdc++6 libtsan0 libubsan1
Suggested packages:
gcc-9-locales gcc-9-multilib gcc-9-doc libgcc1-dbg libgomp1-dbg libitm1-dbg libatomic1-dbg
libasan5-dbg liblsan0-dbg libtsan0-dbg libubsan1-dbg libquadmath0-dbg gfortran-9-multilib
gfortran-9-doc libgfortran5-dbg libcoarrays-dev
The following NEW packages will be installed:
cpp-9 gcc-9 gfortran-9 libgcc-9-dev libgfortran-9-dev
The following packages will be upgraded:
gcc-9-base libasan5 libatomic1 libcc1-0 libgcc1 libgfortran5 libgomp1 libitm1 liblsan0
libquadmath0 libstdc++6 libtsan0 libubsan1
13 upgraded, 5 newly installed, 0 to remove and 71 not upgraded.
Need to get 35.6 MB of archives.
After this operation, 107 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://cdn-fastly.deb.debian.org/debian testing/main amd64 libasan5 amd64 9.1.0-10 [390 kB]
Get:2 http://cdn-fastly.deb.debian.org/debian testing/main amd64 libubsan1 amd64 9.1.0-10 [128 kB]
Get:3 http://cdn-fastly.deb.debian.org/debian testing/main amd64 libtsan0 amd64 9.1.0-10 [295 kB]
Get:4 http://cdn-fastly.deb.debian.org/debian testing/main amd64 gcc-9-base amd64 9.1.0-10 [190 kB]
Get:5 http://cdn-fastly.deb.debian.org/debian testing/main amd64 libstdc++6 amd64 9.1.0-10 [500 kB]
Get:6 http://cdn-fastly.deb.debian.org/debian testing/main amd64 libquadmath0 amd64 9.1.0-10 [145 kB]
Get:7 http://cdn-fastly.deb.debian.org/debian testing/main amd64 liblsan0 amd64 9.1.0-10 [137 kB]
Get:8 http://cdn-fastly.deb.debian.org/debian testing/main amd64 libitm1 amd64 9.1.0-10 [27.6 kB]
Get:9 http://cdn-fastly.deb.debian.org/debian testing/main amd64 libgomp1 amd64 9.1.0-10 [88.1 kB]
Get:10 http://cdn-fastly.deb.debian.org/debian testing/main amd64 libgfortran5 amd64 9.1.0-10 [633 kB]
Get:11 http://cdn-fastly.deb.debian.org/debian testing/main amd64 libcc1-0 amd64 9.1.0-10 [47.7 kB]
Get:12 http://cdn-fastly.deb.debian.org/debian testing/main amd64 libatomic1 amd64 9.1.0-10 [9,012 B]
Get:13 http://cdn-fastly.deb.debian.org/debian testing/main amd64 libgcc1 amd64 1:9.1.0-10 [40.5 kB]
Get:14 http://cdn-fastly.deb.debian.org/debian testing/main amd64 cpp-9 amd64 9.1.0-10 [9,667 kB]
Get:15 http://cdn-fastly.deb.debian.org/debian testing/main amd64 libgcc-9-dev amd64 9.1.0-10 [2,346 kB]
Get:16 http://cdn-fastly.deb.debian.org/debian testing/main amd64 gcc-9 amd64 9.1.0-10 [9,945 kB]
Get:17 http://cdn-fastly.deb.debian.org/debian testing/main amd64 libgfortran-9-dev amd64 9.1.0-10 [676 kB]
Get:18 http://cdn-fastly.deb.debian.org/debian testing/main amd64 gfortran-9 amd64 9.1.0-10 [10.4 MB]
Fetched 35.6 MB in 6s (6,216 kB/s)      
debconf: delaying package configuration, since apt-utils is not installed
(Reading database ... 17787 files and directories currently installed.)
Preparing to unpack .../libasan5_9.1.0-10_amd64.deb ...
Unpacking libasan5:amd64 (9.1.0-10) over (9.1.0-8) ...
Preparing to unpack .../libubsan1_9.1.0-10_amd64.deb ...
Unpacking libubsan1:amd64 (9.1.0-10) over (9.1.0-8) ...
Preparing to unpack .../libtsan0_9.1.0-10_amd64.deb ...
Unpacking libtsan0:amd64 (9.1.0-10) over (9.1.0-8) ...
Preparing to unpack .../gcc-9-base_9.1.0-10_amd64.deb ...
Unpacking gcc-9-base:amd64 (9.1.0-10) over (9.1.0-8) ...
Setting up gcc-9-base:amd64 (9.1.0-10) ...
(Reading database ... 17787 files and directories currently installed.)
Preparing to unpack .../libstdc++6_9.1.0-10_amd64.deb ...
Unpacking libstdc++6:amd64 (9.1.0-10) over (9.1.0-8) ...
Setting up libstdc++6:amd64 (9.1.0-10) ...
(Reading database ... 17787 files and directories currently installed.)
Preparing to unpack .../0-libquadmath0_9.1.0-10_amd64.deb ...
Unpacking libquadmath0:amd64 (9.1.0-10) over (9.1.0-8) ...
Preparing to unpack .../1-liblsan0_9.1.0-10_amd64.deb ...
Unpacking liblsan0:amd64 (9.1.0-10) over (9.1.0-8) ...
Preparing to unpack .../2-libitm1_9.1.0-10_amd64.deb ...
Unpacking libitm1:amd64 (9.1.0-10) over (9.1.0-8) ...
Preparing to unpack .../3-libgomp1_9.1.0-10_amd64.deb ...
Unpacking libgomp1:amd64 (9.1.0-10) over (9.1.0-8) ...
Preparing to unpack .../4-libgfortran5_9.1.0-10_amd64.deb ...
Unpacking libgfortran5:amd64 (9.1.0-10) over (9.1.0-8) ...
Preparing to unpack .../5-libcc1-0_9.1.0-10_amd64.deb ...
Unpacking libcc1-0:amd64 (9.1.0-10) over (9.1.0-8) ...
Preparing to unpack .../6-libatomic1_9.1.0-10_amd64.deb ...
Unpacking libatomic1:amd64 (9.1.0-10) over (9.1.0-8) ...
Preparing to unpack .../7-libgcc1_1%3a9.1.0-10_amd64.deb ...
Unpacking libgcc1:amd64 (1:9.1.0-10) over (1:9.1.0-8) ...
Setting up libgcc1:amd64 (1:9.1.0-10) ...
Selecting previously unselected package cpp-9.
(Reading database ... 17787 files and directories currently installed.)
Preparing to unpack .../cpp-9_9.1.0-10_amd64.deb ...
Unpacking cpp-9 (9.1.0-10) ...
Selecting previously unselected package libgcc-9-dev:amd64.
Preparing to unpack .../libgcc-9-dev_9.1.0-10_amd64.deb ...
Unpacking libgcc-9-dev:amd64 (9.1.0-10) ...
Selecting previously unselected package gcc-9.
Preparing to unpack .../gcc-9_9.1.0-10_amd64.deb ...
Unpacking gcc-9 (9.1.0-10) ...
Selecting previously unselected package libgfortran-9-dev:amd64.
Preparing to unpack .../libgfortran-9-dev_9.1.0-10_amd64.deb ...
Unpacking libgfortran-9-dev:amd64 (9.1.0-10) ...
Selecting previously unselected package gfortran-9.
Preparing to unpack .../gfortran-9_9.1.0-10_amd64.deb ...
Unpacking gfortran-9 (9.1.0-10) ...
Setting up libgomp1:amd64 (9.1.0-10) ...
Setting up libasan5:amd64 (9.1.0-10) ...
Setting up libquadmath0:amd64 (9.1.0-10) ...
Setting up libatomic1:amd64 (9.1.0-10) ...
Setting up libgfortran5:amd64 (9.1.0-10) ...
Setting up libubsan1:amd64 (9.1.0-10) ...
Setting up cpp-9 (9.1.0-10) ...
Setting up libcc1-0:amd64 (9.1.0-10) ...
Setting up liblsan0:amd64 (9.1.0-10) ...
Setting up libitm1:amd64 (9.1.0-10) ...
Setting up libtsan0:amd64 (9.1.0-10) ...
Setting up libgcc-9-dev:amd64 (9.1.0-10) ...
Setting up gcc-9 (9.1.0-10) ...
Setting up libgfortran-9-dev:amd64 (9.1.0-10) ...
Setting up gfortran-9 (9.1.0-10) ...
Processing triggers for libc-bin (2.28-10) ...
root@90521904fa86:/work# pwd

Here filenames and versions reflect the Debian repositories as of today, August 5, 2019. While minor details may change at a future point in time, the key fact is that we get the components we desire via a single call as the Debian system has a well-honed package system

Step 4: Prepare Package

At this point Roger removed some dependencies from the package quantreg that he knew were not relevant to the debugging problem at hand.

Step 5: Set Compiler Flags

Next, set compiler flags as follows:

root@90521904fa86:/work# mkdir ~/.R; vi ~/.R/Makevars

adding the values

CC=gcc-9
FC=gfortran-9
F77=gfortran-9

to the file. Alternatively, one can find the settings of CC, FC, CXX, … in /etc/R/Makeconf (which for the Debian package is a softlink to R’s actual Makeconf) and alter them there.

Step 6: Install the Source Package

Now run

R CMD INSTALL quantreg_5.43.tar.gz

which uses the gfortran-9 compiler, and this version did reproduce the error initially reported by the CRAN maintainers.

Step 7: Debug!

With the tools in place, and the bug reproduces, it is (just!) a matter of finding the bug and fixing it.

And that concludes the tutorial.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Sun, 09 Jun 2019

#22: Using Rocker and PPAs for Fun and Profit

Welcome to the 22nd post in the reasonably rational R recommendations series, or R⁴ for short.

This post premieres something new: a matching video in lightning talk style:

The topic is something we had mentioned a few times before in this r^4 blog series, for example in this post on finding deb packages as well as in this post on binary installations. Binaries rocks, where available, and Michael Rutter’s PPAs should really be known and used more widely. Hence the video and supporting slides.

/code/r4 | permanent link

Wed, 27 Mar 2019

#21: A Third and Final (?) Post on Stripping R Libraries

Welcome to the 21th post in the reasonably relevant R ramblings series, or R⁴ for short.

Back in August of 2017, we wrote two posts #9: Compating your Share Libraries and #10: Compacting your Shared Libraries, After The Build about “stripping” shared libraries. This involves removing auxiliary information (such as debug symbols and more) from the shared libraries which can greatly reduce the installed size (on suitable platforms – it mostly matters where I work, i.e. on Linux). As an illustration we included this chart:

Two items this week made me think of these posts. First was that a few days ago I noticed the following src/Makefile of the precrec package I was starting to use more:

# copied from https://github.com/vinecopulib/rvinecopulib
# strip debug symbols for smaller Linux binaries
strippedLib: $(SHLIB)
    if test -e "/usr/bin/strip" & test -e "/bin/uname" & [[ `uname` == "Linux" ]] ; \
        then /usr/bin/strip --strip-debug $(SHLIB); fi
.phony: strippedLib

And lo and behold, the quoted package rvinecopulib

has the same

CXX_STD      = CXX11
PKG_CPPFLAGS = -I../inst/include -pthread

# strip debug symbols for smaller Linux binaries
strippedLib: $(SHLIB)
    if test -e "/usr/bin/strip" & test -e "/bin/uname" & [[ `uname` == "Linux" ]] ; \
        then /usr/bin/strip --strip-debug $(SHLIB); fi
.phony: strippedLib

I was intrigued and googled a little. To my surprise I found one related reference … in a stone-old src/Makevars of mine in RcppClassic and probably written in 2007 or 2008. But more astonishing, the actual reference to the “phony target” trick is in … the #9 post from August 2017 referenced above. Doh. Younger me knew this, current me did not, and as those two packages didn’t reference my earlier use I had to re-find it. Oh well.

But the topic is still a very important one. The two blog posts show how to deal with this locally as a user and “consumer” of packages (as well as via the “phony trick” as a producer of packages) as well as an admin of a system with such packages. Personally I had been using this trick since August 2017 via my ~/.R/Makevars.

And we were still missing such a tool for the more general deployment. Well, until today, or rather, until R 3.6.0 comes out offically on April 26. The (excellent) R-devel Daily ‘NEWS’ feed – which itself was the topic of post #3: Follow R-devel – will likely show tomorrow something about this commit I spotted by following Winston’s mirror of the R-devel sources:

And indeed, we now can now do this with R-devel (rebuilt from today’s sources):

edd@rob:~$ RD CMD INSTALL --help | grep strip
      --strip           strip shared object(s)
edd@rob:~$

As a quick check, installing the (small, C-only) digest package without / with the --strip options gets us, respectively, 425kb and 123kb. So the ratios from the chart above should now be achievable directly from R CMD INSTALL --strip with R 3.6.0. (And for what it is worth, it still works with the older tricks mentioned above.)

And as occupying disk space with unused debugging symbols is wasteful, the new extension to R CMD INSTALL is most welcome.

Last but not least: It is this type of relentless small improvements to R, its innards, its installations and support by R Core that make this system for Programming with Data such an excellent tool and joy to use and follow. A big Thank You! to R Core for all they do, and do quietly yet relentlessly. It is immensely appreciated.

/code/r4 | permanent link

Thu, 14 Mar 2019

#20: Dependencies. Now with badges!

Welcome to post number twenty in the randomly redundant R rant series of posts, or R⁴ for short. It has been a little quiet since the previous post last June as we’ve been busy with other things but a few posts (or ideas at least) are queued.

Dependencies. We wrote about this a good year ago in post #17 which was (in part) tickled by the experience of installing one package … and getting a boatload of others pulled in. The topic and question of dependencies has seen a few posts over the year, and I won’t be able to do them all justice. Josh and I have been added a few links to the tinyverse.org page. The (currently) last one by Russ Cox titled Our Software Dependency Problem is particularly trenchant.

And just this week the topic came up in two different, and unrelated posts. First, in What I don’t like in you repo, Oleg Kovalov lists a brief but decent number of items by which a repository can be evaluated. And one is about [b]loated dependencies where he nails it with a quick When I see dozens of deps in the lock file, the first question which comes to my mind is: so, am I ready to fix any failures inside any of them? This is pretty close to what we have been saying around the tinyverse.

Second, in Beware the data science pin factory, Eric Colson brings an equation. Quoting from footnote 2: […] the number of relationships (r) grows as a function number of members (n) per this equation: r = (n^2-n) / 2. Granted, this was about human coordination and ideal team size. But let’s just run with it: For n=10, we get r=9 which is not so bad. For n=20, it is r=38. And for n=30 we are at r=87. You get the idea. “Big-Oh-N-squared”.

More dependencies means more edges between more nodes. Which eventually means more breakage.

Which gets us to announcement embedded in this post. A few months ago, in what still seems like a genuinely extra-clever weekend hack in an initial 100 or so lines, Edwin de Jonge put together a remarkable repo on GitLab. It combines Docker / Rocker via hourly cron jobs with deployment at netlify … giving us badges which visualize the direct as well as recursive dependencies of a package. All in about 100 lines, fully automated, autonomously running and deployed via CDN. Amazing work, for which we really need to praise him! So a big thanks to Edwin.

With these CRAN Dependency Badges being available, I have been adding them to my repos at GitHub over the last few months. As two quick examples you can see

Rcpp
RcppArmadillo

to get the idea. RcppArmadillo (or RcppEigen or many other packages) will always have one: Rcpp. But many widely-used packages such as data.table also get by with a count of zero. It is worth showing this – and the badge does just that! And I even sent a PR to the badger package: if you’re into this, you can have a badge made for your via badger::badge_depdencies(pkgname).

Otherwise, more details at Edwin’s repo and of course his actual tinyverse.netlify.com site hosting the badges. It’s easy as all other badges: reference the CRAN package, get a badge.

So if you buy into the idea that lightweight is the right weight then join us and show it via the dependency badges!

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Sun, 24 Jun 2018

#19: Intel MKL in Debian / Ubuntu follow-up

Welcome to the (very brief) nineteenth post in the ruefully recalcitrant R reflections series of posts, or R⁴ for short.

About two months ago, in the most recent post in the series, #18, we provided a short tutorial about how to add the Intel Math Kernel Library to a Debian or Ubuntu system thanks to the wonderful apt tool -- and the prepackaged binaries by Intel. This made for a simple, reproducible, scriptable, and even reversible (!!) solution---which a few people seem to have appreciated. Good.

In the meantime, more good things happened. Debian maintainer Mo Zhou had posted this 'intent-to-package' bug report leading to this git repo on salsa and this set of packages currently in the 'NEW' package queue.

So stay tuned, "soon" (for various definitions of "soon") we should be able to directly get the MKL onto Debian systems via apt without needing Intel's repo. And in a release or two, Ubuntu should catch up. The fastest multithreaded BLAS and LAPACK for everybody, well-integrated and package. That said, it is still a monstrously large package so I mostly stick with the (truly open source rather than just 'gratis') OpenBLAS but hey, choice is good. And yes, technically these packages are 'outside' of Debian in the non-free section but they will be visible by almost all default configurations.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Sun, 15 Apr 2018

#18: Adding Intel MKL easily via a simple script

Welcome to the eighteenth post in the rarely riveting R ramblings series of posts, or R⁴ for short.

The Intel Math Kernel Library (MKL) is well-know high(er) performance math library tailored for Intel CPUs offering best-in-class numerical performance on a number of low-level operations (BLAS, LAPACK, ...). They are not open source, used to be under commerical or research-only licenses --- but can now be had (still subject to license terms you should study) via apt-get (and even yum). This page describe the installation of the MKL (and other components) in detail (but stops short of the system integration aspect we show here).

Here we present one short script, discussed in detail below, to add the MKL to your Debian or Ubuntu system. Its main advantages are

clean standard code using package management tools;
additional steps to make it the the system default; and
with an option for clean removal leaning again on the package management system.

We put the script and a README.md largely identical to this writeup into this GitHub repo where issues, comments, questions, ... should be filed.

MKL for .deb-based systems: An easy recipe

This post describes how to easily install the Intel Math Kernel Library (MKL) on a Debian or Ubuntu system. Very good basic documentation is provided by Intel at their site. The discussion here is more narrow as it focusses just on the Math Kernel Library (MKL).

The tl;dr version: Use this script which contains the commands described here.

First Step: Set up apt

We download the GnuPG key first and add it to the keyring:

cd /tmp
wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB

To add all Intel products we would run first command, but here we focus just on the MKL. The website above lists other suboptions (TBB, DAAL, MPI, ...)

## all products:
#wget https://apt.repos.intel.com/setup/intelproducts.list -O /etc/apt/sources.list.d/intelproducts.list

## just MKL
sh -c 'echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list'

We then update our lists of what is available in the repositories.

apt-get update

As a personal aside, I still use the awesome wajig frontend to dpkg, apt and more by Graham Williams (of rattle fame). Among other tricks, wajig keeps state and therefore "knows" what packages are new. Here, we see a lot:

edd@rob:/tmp$ wajig update
Hit:1 http://us.archive.ubuntu.com/ubuntu artful InRelease
Ign:2 http://dl.google.com/linux/chrome/deb stable InRelease
Hit:3 http://us.archive.ubuntu.com/ubuntu artful-updates InRelease
Hit:4 https://download.docker.com/linux/ubuntu artful InRelease
Hit:5 http://us.archive.ubuntu.com/ubuntu artful-backports InRelease
Ign:6 https://cloud.r-project.org/bin/linux/ubuntu artful/ InRelease
Hit:7 https://cloud.r-project.org/bin/linux/ubuntu artful/ Release
Hit:8 http://security.ubuntu.com/ubuntu artful-security InRelease
Hit:9 https://apt.repos.intel.com/mkl all InRelease
Hit:10 http://dl.google.com/linux/chrome/deb stable Release
Hit:12 https://packagecloud.io/slacktechnologies/slack/debian jessie InRelease
Reading package lists... Done
This is 367 up on the previous count with 367 new packages.
edd@rob:/tmp$ wajig new
Package                  Description
========================-===================================================
intel-mkl-gnu-f-32bit-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-ps-gnu-f-174   Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-cluster-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-gnu-c-196      Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-cluster-c-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-core-f-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-cluster-rt-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-gnu-239        Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-openmp-l-ps-libs-32bit-jp-174 OpenMP for Intel(R) Compilers 17.0 Update 2 for Linux*
intel-mkl-doc-ps-2018    Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-pgi-rt-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-ps-ss-tbb-32bit-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-mic-cluster-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-openmp-l-ps-libs-jp-196 OpenMP for Intel(R) Compilers 17.0 Update 4 for Linux*
intel-mkl-ps-mic-rt-174  Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-openmp-l-ps-libs-jp-239 OpenMP for Intel(R) Compilers 17.0 Update 5 for Linux*
intel-mkl-common-f-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-ps-f95-mic-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-ps-common-64bit-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-f95-common-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-psxe-common-2018.2-046 Intel(R) Parallel Studio XE 2018 Update 2 for Linux*
intel-mkl-ps-mic-cluster-rt-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-cluster-64bit-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-f95-32bit-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-cluster-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-gnu-rt-196     Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-tbb-libs-2018.0-128 Intel(R) Threading Building Blocks 2018 for Linux*
intel-comp-l-all-vars-196 Intel(R) Compilers 17.0 Update 4 for Linux*
intel-mkl-common-ps-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-pgi-f-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-f95-32bit-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-gnu-rt-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-openmp-18.0.0-128  OpenMP for Intel(R) Compilers 18.0 for Linux*
intel-mkl-common-c-32bit-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-core-ps-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-f95-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-ps-f95-mic-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-common-c-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-cluster-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-ps-doc-f-jp    Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-common-f-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-32bit-2018.1-038 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-ps-common-32bit-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-openmp-l-all-196   OpenMP for Intel(R) Compilers 17.0 Update 4 for Linux*
intel-mkl-pgi-rt-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-common-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-comp-nomcu-vars-18.0.0-128 Intel(R) Compilers 18.0 for Linux*
intel-mkl-common-c-32bit-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-common-32bit-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-gnu-f-32bit-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-common-c-64bit-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-gnu-32bit-174  Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-common-ps-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-cluster-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-ps-gnu-f-rt-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-cluster-f-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-common-c-174   Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-ps-ss-tbb-196  Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-tbb-libs-32bit-2018.0-128 Intel(R) Threading Building Blocks 2018 for Linux*
intel-mkl-gnu-c-32bit-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-tbb-32bit-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-tbb-libs-2018.1-163 Intel(R) Threading Building Blocks 2018 Update 1 for Linux*
intel-mkl-ps-common-f-32bit-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-ps-ss-tbb-rt-32bit-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-196            Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-pgi-196     Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-psxe-2018.2-046 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-doc-c          Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-f95-196     Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-cluster-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-tbb-libs-174       Intel(R) Threading Building Blocks 2017 Update 4 for Linux*
intel-comp-l-all-vars-174 Intel(R) Compilers 17.0 Update 2 for Linux*
intel-mkl-gnu-f-rt-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-gnu-rt-174     Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-openmp-l-ps-libs-32bit-jp-196 OpenMP for Intel(R) Compilers 17.0 Update 4 for Linux*
intel-mkl-gnu-f-rt-32bit-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-openmp-18.0.1-163  OpenMP for Intel(R) Compilers 18.0 Update 1 for Linux*
intel-mkl-ps-cluster-64bit-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-core-c-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-pgi-rt-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-2018.2-046     Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-gnu-rt-239     Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-gnu-rt-32bit-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-comp-l-all-vars-18.0.0-128 Intel(R) Compilers 18.0 for Linux*
intel-mkl-ps-common-jp-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-openmp-32bit-18.0.0-128 OpenMP for Intel(R) Compilers 18.0 for Linux*
intel-mkl-f95-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-core-ps-32bit-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-ps-gnu-f-32bit-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-tbb-mic-rt-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-core-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-psxe-2018.1-038 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-gnu-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-ps-mic-174     Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-64bit-2017.4-061 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-f95-32bit-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-mic-rt-jp-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-gnu-196        Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-psxe-common-doc-2018 Intel(R) Parallel Studio XE 2018 Update 2 for Linux*
intel-mkl-ps-tbb-mic-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-core-c-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-core-c-32bit-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-ps-cluster-rt-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-rt-jp-32bit-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-core-c-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-ps-ss-tbb-rt-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-core-f-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-psxe-050       Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-64bit-2018.2-046 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-tbb-rt-32bit-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-ps-ss-tbb-rt-32bit-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-doc-f       Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-gnu-c-174      Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-ps-f95-common-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-ps-rt-jp-196   Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-gnu-f-rt-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-ps-pgi-f-196   Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-tbb-libs-32bit-2018.1-163 Intel(R) Threading Building Blocks 2018 Update 1 for Linux*
intel-mkl-common-c-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-gnu-rt-32bit-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-64bit-2018.0-033 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-ps-mic-c-239   Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-ss-tbb-239  Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-common-64bit-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-openmp-32bit-18.0.2-199 OpenMP for Intel(R) Compilers 18.0 Update 2 for Linux*
intel-mkl-ps-rt-jp-32bit-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-gnu-f-rt-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-ps-common-jp-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-tbb-mic-rt-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-psxe-common-061    Intel(R) Parallel Studio XE 2017 Update 5 for Linux*
intel-mkl-gnu-rt-32bit-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-ps-common-f-32bit-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-mic-cluster-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-common-f-32bit-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-tbb-libs-196       Intel(R) Threading Building Blocks 2017 Update 6 for Linux*
intel-mkl-cluster-rt-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-ps-cluster-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-pgi-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-ps-ss-tbb-rt-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-openmp-l-all-174   OpenMP for Intel(R) Compilers 17.0 Update 2 for Linux*
intel-mkl-tbb-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-pgi-c-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-64bit-2018.1-038 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-f95-32bit-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-gnu-c-32bit-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-239            Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-pgi-rt-174  Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-cluster-f-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-ps-f95-common-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-common-f-64bit-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-cluster-common-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-ps-cluster-f-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-common-jp-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-openmp-l-all-32bit-196 OpenMP for Intel(R) Compilers 17.0 Update 4 for Linux*
intel-mkl-tbb-rt-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-psxe-common-056    Intel(R) Parallel Studio XE 2017 Update 4 for Linux*
intel-mkl-32bit-2018.0-033 Intel(R) Math Kernel Library 2018 for Linux*
intel-comp-l-all-vars-18.0.2-199 Intel(R) Compilers 18.0 Update 2 for Linux*
intel-mkl-common-ps-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-core-rt-32bit-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-pgi-c-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-common-c-ps-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-ps-gnu-f-rt-32bit-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-f95-common-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-openmp-l-all-239   OpenMP for Intel(R) Compilers 17.0 Update 5 for Linux*
intel-mkl-174            Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-gnu-f-rt-32bit-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-tbb-libs-239       Intel(R) Threading Building Blocks 2017 Update 8 for Linux*
intel-mkl-common-f-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-ps-common-f-64bit-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-cluster-common-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-comp-nomcu-vars-18.0.2-199 Intel(R) Compilers 18.0 Update 2 for Linux*
intel-mkl-32bit-196      Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-gnu-f-196   Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-core-ps-32bit-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-common-196     Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-rt-174         Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-core-rt-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-ps-common-f-64bit-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-cluster-rt-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-cluster-c-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-gnu-rt-32bit-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-ps-doc         Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-rt-32bit-239   Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-openmp-l-ps-libs-174 OpenMP for Intel(R) Compilers 17.0 Update 2 for Linux*
intel-mkl-ps-cluster-rt-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-ps-common-64bit-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-gnu-c-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-ps-pgi-c-196   Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-core-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-ps-gnu-f-239   Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-pgi-f-239   Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-gnu-f-32bit-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-core-32bit-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-common-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-pgi-c-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-ps-rt-jp-32bit-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-common-c-64bit-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-gnu-f-rt-32bit-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-pgi-c-174   Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-core-f-32bit-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-32bit-239      Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-rt-jp-174   Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-ps-mic-239     Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-tbb-libs-2018.2-199 Intel(R) Threading Building Blocks 2018 Update 2 for Linux*
intel-mkl-f95-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-openmp-l-ps-libs-239 OpenMP for Intel(R) Compilers 17.0 Update 5 for Linux*
intel-mkl-rt-239         Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-core-c-32bit-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-openmp-l-all-32bit-174 OpenMP for Intel(R) Compilers 17.0 Update 2 for Linux*
intel-mkl-ps-pgi-239     Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-gnu-f-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-ps-gnu-f-rt-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-tbb-32bit-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-rt-32bit-196   Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-mic-196     Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-gnu-f-rt-32bit-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-gnu-32bit-239  Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-common-174  Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-ps-common-c-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-gnu-f-rt-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-ps-f95-common-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-mic-f-239   Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-common-196  Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-core-ps-32bit-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-ps-cluster-64bit-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-64bit-2017.3-056 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-ss-tbb-32bit-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-32bit-2017.4-061 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-tbb-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-64bit-2017.2-050 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-ps-mic-rt-196  Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-cluster-c-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-gnu-c-32bit-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-gnu-f-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-gnu-c-32bit-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-gnu-rt-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-core-rt-32bit-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-common-239     Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-2017.3-056     Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-tbb-32bit-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-pgi-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-pgi-f-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-common-c-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-tbb-rt-32bit-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-openmp-l-all-32bit-239 OpenMP for Intel(R) Compilers 17.0 Update 5 for Linux*
intel-mkl-rt-196         Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-cluster-f-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-2017.4-061     Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-common-c-239   Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-gnu-c-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-psxe-common-doc    Intel(R) Parallel Studio XE 2017 Update 5 for Linux*
intel-tbb-libs-32bit-2018.2-199 Intel(R) Threading Building Blocks 2018 Update 2 for Linux*
intel-mkl-2017.2-050     Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-ps-tbb-mic-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-2018.1-038     Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-gnu-c-32bit-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-core-ps-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-ps-tbb-mic-rt-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-tbb-rt-32bit-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-ps-cluster-f-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-psxe-061       Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-ss-tbb-32bit-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-mic-rt-jp-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-common-c-ps-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-ps-doc-jp      Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-gnu-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-core-rt-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-ps-common-f-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-gnu-c-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-cluster-rt-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-ps-mic-rt-239  Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-openmp-l-ps-libs-196 OpenMP for Intel(R) Compilers 17.0 Update 4 for Linux*
intel-mkl-common-c-196   Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-gnu-32bit-196  Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-openmp-18.0.2-199  OpenMP for Intel(R) Compilers 18.0 Update 2 for Linux*
intel-mkl-ps-common-32bit-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-gnu-rt-32bit-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-openmp-32bit-18.0.1-163 OpenMP for Intel(R) Compilers 18.0 Update 1 for Linux*
intel-mkl-ps-pgi-174     Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-gnu-32bit-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-comp-l-all-vars-239 Intel(R) Compilers 17.0 Update 5 for Linux*
intel-mkl-ps-mic-c-196   Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-core-f-32bit-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-ps-ss-tbb-174  Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-gnu-f-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-ps-mic-c-174   Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-ps-mic-rt-jp-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-psxe-common-2018.0-033 Intel(R) Parallel Studio XE 2018 for Linux*
intel-mkl-ps-f95-mic-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-common-c-64bit-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-psxe-056       Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-gnu-rt-32bit-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-core-c-32bit-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-common-c-32bit-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-comp-l-all-vars-18.0.1-163 Intel(R) Compilers 18.0 Update 1 for Linux*
intel-mkl-psxe-2018.0-033 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-f95-32bit-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-openmp-l-ps-libs-jp-174 OpenMP for Intel(R) Compilers 17.0 Update 2 for Linux*
intel-mkl-tbb-rt-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-ps-mic-f-174   Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-2018.0-033     Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-ps-f95-239     Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-doc            Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-common-c-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-ps-common-f-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-gnu-f-32bit-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-ps-cluster-c-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-common-c-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-ps-tbb-mic-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-core-32bit-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-sta-common-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-core-32bit-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-core-f-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-32bit-2018.2-046 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-ps-mic-f-196   Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-gnu-c-239      Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-doc-c-jp    Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-gnu-rt-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-doc-2018       Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-ps-pgi-c-239   Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-core-rt-32bit-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-ps-common-239  Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-f95-common-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-gnu-32bit-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-cluster-c-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-ps-mic-cluster-rt-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-ps-cluster-f-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-pgi-f-174   Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-common-c-ps-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-cluster-common-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-32bit-2017.3-056 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-core-rt-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-gnu-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-eula-174       Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-ps-ss-tbb-rt-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-pgi-rt-196  Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-gnu-32bit-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-psxe-common-2018.1-038 Intel(R) Parallel Studio XE 2018 Update 1 for Linux*
intel-mkl-pgi-f-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-core-ps-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-common-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-ps-rt-jp-239   Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-ps-pgi-rt-239  Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-32bit-2017.2-050 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-core-f-32bit-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-pgi-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-tbb-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-comp-nomcu-vars-18.0.1-163 Intel(R) Compilers 18.0 Update 1 for Linux*
intel-mkl-common-f-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-tbb-rt-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-common-174     Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-gnu-c-32bit-2018.2-199 Intel(R) Math Kernel Library 2018 Update 2 for Linux*
intel-mkl-ps-mic-cluster-rt-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-f95-174     Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-core-2018.1-163 Intel(R) Math Kernel Library 2018 Update 1 for Linux*
intel-mkl-ps-gnu-f-32bit-196 Intel(R) Math Kernel Library 2017 Update 3 for Linux*
intel-mkl-ps-f95-32bit-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-ps-mic-cluster-174 Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-psxe-common-050    Intel(R) Parallel Studio XE 2017 Update 2 for Linux*
intel-mkl-cluster-c-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-rt-32bit-174   Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-mkl-32bit-174      Intel(R) Math Kernel Library 2017 Update 2 for Linux*
intel-openmp-l-ps-libs-32bit-jp-239 OpenMP for Intel(R) Compilers 17.0 Update 5 for Linux*
intel-mkl-ps-ss-tbb-rt-32bit-239 Intel(R) Math Kernel Library 2017 Update 4 for Linux*
intel-mkl-gnu-f-rt-32bit-2018.0-128 Intel(R) Math Kernel Library 2018 for Linux*
intel-mkl-gnu-174        Intel(R) Math Kernel Library 2017 Update 2 for Linux*
edd@rob:/tmp$

Install MKL

Now that we have everything set up, installing the MKL is as simple as:

apt-get install intel-mkl-64bit-2018.2-046

This picks the 64-bit only variant of the (currently) most recent builds.

There is a slight cost: a 500mb download of 39 packages which install to 1.9 gb! Other than that it is easy: one command! Compare that with the days of yore when we fetched shar archives of NETLIB...

Integrate MKL

One the key advantages of a Debian or Ubuntu system is the overall integration providing a raft of useful features. One of these is the seamless and automatic selection of alternatives. By declaring a particular set of BLAS and LAPACK libraries the default, all application linked against this interface will use the default. Better still, users can switch between these as well.

So here we can make the MKL default for BLAS and LAPACK:

## update alternatives
update-alternatives --install /usr/lib/x86_64-linux-gnu/libblas.so     \
                    libblas.so-x86_64-linux-gnu      /opt/intel/mkl/lib/intel64/libmkl_rt.so 50
update-alternatives --install /usr/lib/x86_64-linux-gnu/libblas.so.3   \
                    libblas.so.3-x86_64-linux-gnu    /opt/intel/mkl/lib/intel64/libmkl_rt.so 50
update-alternatives --install /usr/lib/x86_64-linux-gnu/liblapack.so   \
                    liblapack.so-x86_64-linux-gnu    /opt/intel/mkl/lib/intel64/libmkl_rt.so 50
update-alternatives --install /usr/lib/x86_64-linux-gnu/liblapack.so.3 \
                    liblapack.so.3-x86_64-linux-gnu  /opt/intel/mkl/lib/intel64/libmkl_rt.so 50

Next, we have to tell the dyanmic linker about two directories use by the MKL, and have it update its cache:

echo "/opt/intel/lib/intel64"     >  /etc/ld.so.conf.d/mkl.conf
echo "/opt/intel/mkl/lib/intel64" >> /etc/ld.so.conf.d/mkl.conf
ldconfig

Use the MKL

Now the MKL is 'known' and the default. If we start R, its sessionInfo() shows the MKL:

# Matrix products: default                            
# BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so

Benchmarks

# Vanilla r-base Rocker with default reference BLAS 
> n <- 1e3 ; X <- matrix(rnorm(n*n),n,n);  system.time(svd(X)) 
   user  system elapsed 
  2.239   0.004   2.266 
> 

# OpenBlas added to r-base Rocker
>  n <- 1e3 ; X <- matrix(rnorm(n*n),n,n);  system.time(svd(X)) 
   user  system elapsed 
  1.367   2.297   0.353 
> 

# MKL added to r-base Rocker
> n <- 1e3 ; X <- matrix(rnorm(n*n),n,n)  
> system.time(svd(X))                               
   user  system elapsed                             
  1.772   0.056   0.350                             
>

So just R (with reference BLAS) is slow. (Using Docker is done here to have clean comparisons while not altering the outer host system; impact of running Docker on Linux should be minimal.) Adding OpenBLAS helps quite a bit already by offering multi-core processing -- the, and MKL does not yet improve materially over OpenBLAS. Now, this of course was not any serious benchmarking---we just ran one SVD. More to do as time permits...

Removal, if needed

Another rather nice benefit of the package management is that clean removal is also possible:

root@c9f8062fbd93:/tmp# apt-get autoremove intel-mkl-64bit-2018.2-046
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages will be REMOVED:
  intel-comp-l-all-vars-18.0.2-199 intel-comp-nomcu-vars-18.0.2-199 intel-mkl-64bit-2018.2-046 
  intel-mkl-cluster-2018.2-199 intel-mkl-cluster-c-2018.2-199 intel-mkl-cluster-common-2018.2-199 
  intel-mkl-cluster-f-2018.2-199 intel-mkl-cluster-rt-2018.2-199 intel-mkl-common-2018.2-199 
  intel-mkl-common-c-2018.2-199 intel-mkl-common-c-ps-2018.2-199 intel-mkl-common-f-2018.2-199 
  intel-mkl-common-ps-2018.2-199 intel-mkl-core-2018.2-199 intel-mkl-core-c-2018.2-199 
  intel-mkl-core-f-2018.2-199 intel-mkl-core-ps-2018.2-199 intel-mkl-core-rt-2018.2-199 
  intel-mkl-doc-2018 intel-mkl-doc-ps-2018 intel-mkl-f95-2018.2-199 intel-mkl-f95-common-2018.2-199 
  intel-mkl-gnu-2018.2-199 intel-mkl-gnu-c-2018.2-199 intel-mkl-gnu-f-2018.2-199 intel-mkl-gnu-f-rt-2018.2-199 
  intel-mkl-gnu-rt-2018.2-199 intel-mkl-pgi-2018.2-199 intel-mkl-pgi-c-2018.2-199 intel-mkl-pgi-f-2018.2-199 
  intel-mkl-pgi-rt-2018.2-199 intel-mkl-psxe-2018.2-046 intel-mkl-tbb-2018.2-199 intel-mkl-tbb-rt-2018.2-199 
  intel-openmp-18.0.2-199 intel-psxe-common-2018.2-046 intel-psxe-common-doc-2018 intel-tbb-libs-2018.2-199 
  intel-tbb-libs-32bit-2018.2-199 libisl15
0 upgraded, 0 newly installed, 40 to remove and 0 not upgraded.
After this operation, 1,904 kB disk space will be freed.
Do you want to continue? [Y/n] n                    
Abort.                                              
root@c9f8062fbd93:/tmp#

where we said 'no' just to illustrate the option.

Summary

Package management systems are fabulous. Kudos to Intel for supporting apt (and also yum in case you are on an rpm-based system). We can install the MKL with just a few commands (which we regrouped in this script).

The MKL has a serious footprint with an installed size of just under 2gb. But for those doing extended amounts of numerical analysis, installing this library may well be worth it.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Wed, 28 Feb 2018

#17: Dependencies.

Dependencies are invitations for other people to break your package.
-- Josh Ulrich, private communication

Welcome to the seventeenth post in the relentlessly random R ravings series of posts, or R⁴ for short.

Dependencies. A truly loaded topic.

As R users, we are spoiled. Early in the history of R, Kurt Hornik and Friedrich Leisch built support for packages right into R, and started the Comprehensive R Archive Network (CRAN). And R and CRAN had a fantastic run with. Roughly twenty years later, we are looking at over 12,000 packages which can (generally) be installed with absolute ease and no suprises. No other (relevant) open source language has anything of comparable rigour and quality. This is a big deal.

And coding practices evolved and changed to play to this advantage. Packages are a near-unanimous recommendation, use of the install.packages() and update.packages() tooling is nearly universal, and most R users learned to their advantage to group code into interdependent packages. Obvious advantages are versioning and snap-shotting, attached documentation in the form of help pages and vignettes, unit testing, and of course continuous integration as a side effect of the package build system.

But the notion of 'oh, let me just build another package and add it to the pool of packages' can get carried away. A recent example I had was the work on the prrd package for parallel recursive dependency testing --- coincidentally, created entirely to allow for easier voluntary tests I do on reverse dependencies for the packages I maintain. It uses a job queue for which I relied on the liteq package by Gabor which does the job: enqueue jobs, and reliably dequeue them (also in a parallel fashion) and more. It looks light enough:

R> tools::package_dependencies(package="liteq", recursive=FALSE, db=AP)$liteq
[1] "assertthat" "DBI"        "rappdirs"   "RSQLite"   
R>

Two dependencies because it uses an internal SQLite database, one for internal tooling and one for configuration.

All good then? Not so fast. The devil here is the very innocuous and versatile RSQLite package because when we look at fully recursive dependencies all hell breaks loose:

R> tools::package_dependencies(package="liteq", recursive=TRUE, db=AP)$liteq
 [1] "assertthat" "DBI"        "rappdirs"   "RSQLite"    "tools"     
 [6] "methods"    "bit64"      "blob"       "memoise"    "pkgconfig" 
[11] "Rcpp"       "BH"         "plogr"      "bit"        "utils"     
[16] "stats"      "tibble"     "digest"     "cli"        "crayon"    
[21] "pillar"     "rlang"      "grDevices"  "utf8"      
R>
R> tools::package_dependencies(package="RSQLite", recursive=TRUE, db=AP)$RSQLite
 [1] "bit64"      "blob"       "DBI"        "memoise"    "methods"   
 [6] "pkgconfig"  "Rcpp"       "BH"         "plogr"      "bit"       
[11] "utils"      "stats"      "tibble"     "digest"     "cli"       
[16] "crayon"     "pillar"     "rlang"      "assertthat" "grDevices" 
[21] "utf8"       "tools"     
R>

Now we went from four to twenty-four, due to the twenty-two dependencies pulled in by RSQLite.

There, my dear friend, lies madness. The moment one of these packages breaks we get potential side effects. And this is no laughing matter. Here is a tweet from Kieran posted days before a book deadline of his when he was forced to roll a CRAN package back because it broke his entire setup. (The original tweet has by now been deleted; why people do that to their entire tweet histories is somewhat I fail to comprehened too; in any case the screenshot is from a private discussion I had with a few like-minded folks over slack.)

That illustrates the quote by Josh at the top. As I too have "production code" (well, CRANberries for one relies on it), I was interested to see if we could easily amend RSQLite. And yes, we can. A quick fork and few commits later, we have something we could call 'RSQLighter' as it reduces the dependencies quite a bit:

R> IP <- installed.packages()   # using my installed mod'ed version
R> tools::package_dependencies(package="RSQLite", recursive=TRUE, db=IP)$RSQLite
 [1] "bit64"     "DBI"       "methods"   "Rcpp"      "BH"        "bit"      
 [7] "utils"     "stats"     "grDevices" "graphics" 
R>

That is less than half. I have not proceeded with the fork because I do not believe in needlessly splitting codebases. But this could be a viable candidate for an alternate or shadow repository with more minimal and hence more robust dependencies. Or, as Josh calls, the tinyverse.

Another maddening aspect of dependencies is the ruthless application of what we could jokingly call Metcalf's Law: the likelihood of breakage does of course increase with the number edges in the dependency graph. A nice illustration is this post by Jenny trying to rationalize why one of the 87 (as of today) tidyverse packages has now state "ORPHANED" at CRAN:

An invitation for other people to break your code. Well put indeed. Or to put rocks up your path.

But things are not all that dire. Most folks appear to understand the issue, some even do something about it. The DBI and RMySQL packages have saner strict dependencies, maybe one day things will improve for RMariaDB and RSQLite too:

R> tools::package_dependencies(package=c("DBI", "RMySQL", "RMariaDB"), recursive=TRUE, db=AP)
$DBI
[1] "methods"

$RMySQL
[1] "DBI"     "methods"

$RMariaDB
 [1] "bit64"     "DBI"       "hms"       "methods"   "Rcpp"      "BH"       
 [7] "plogr"     "bit"       "utils"     "stats"     "pkgconfig" "rlang"    

R>

And to be clear, I do not believe in giving up and using everything via docker, or virtualenvs, or packrat, or ... A well-honed dependency system is wonderful and the right resource to get code deployed and updated. But it required buy-in from everyone involved, and an understanding of the possible trade-offs. I think we can, and will, do better going forward.

Or else, there will always be the tinyverse ...

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Tue, 06 Feb 2018

#16: Complaining Works.

Welcome to the sixteenth post in the relatively random R related series of posts, or R⁴ for short.

This one will likely be brief. But it is one post I have been meaning to get out for a little while---yet did not get around to. The meta point I am trying to make today is that despite overwhelming odds that may indicate otherwise, it can actually pay off to have voice or platform. Or, as I jokingly called it via slack to Josh: complaining works. Hence the title.

There are two things I have harped about over the years (with respect to R), and very recently (and almost against all odds) both changed for the better.

First, Rscript. There was a of course little bit of pride and personal ownership involved as Jeff Horner and I got the first command-line interface to R out: littler with its r tool. As I recall, we beat the release of Rscript (which comes with R) narrowly by a few months. In any event, the bigger issue remained that Rscript would always fail on code requiring the methods package (also in base R). The given reason was load time and performance were slower due to the nature of S4 classes. But as I once blogged in jest, littler is still faster at doing nothing even though it always loaded methods---as one wants code to behave as it does under an interactive R session. And over the years I must have answered this question of "Why does Rscript fail" half a dozen times on the mailing lists and on StackOverflow. But now, thanks to Luke Tierney who put the change in in early January, R 3.5.0 we will have an Rscript that behaves like R and includes methods by default (unless instructed otherwise, of course). Nice.

Second, an issue I bemoaned repeatedly concerned the (at least to my reading) inconsistent treatment of Suggests: and Depends: in the Writing R Extensions manual on the one hand, and how CRAN did (or, rather, did not) enforce this. In particular, and as I argued not quite a year ago, Suggests != Depends. In particular, tests should not just fail if a suggested package was not present (in the clean-room setup used for tests). Rather, one should make the tests conditional on this optional package being present. And this too now seems to be tested as I was among the recipients of one of those emails from CRAN requiring a change. This one was clear in its title and mission: CRAN packages not using microbenchmark conditionally. Which we fixed. But the bigger issue is that CRAN now seems to agree that Suggests != Depends.

So the main message now is clear: It seems to pay to comment on the mailing lists, or to write a blog post, or do something else to get one's reasoning out. Change may not be immediate, but it may still come. So to paraphrase Michael Pollan's beautiful dictum about food: "Eat food. Not too much. Mostly plants.", we could now say: "Report inconsistencies. Present evidence. Be patient."

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Sun, 21 Jan 2018

#15: Tidyverse and data.table, sitting side by side ... (Part 1)

Welcome to the fifteenth post in the rarely rational R rambling series, or R⁴ for short. There are two posts I have been meaning to get out for a bit, and hope to get to shortly---but in the meantime we are going start something else.

Another longer-running idea I had was to present some simple application cases with (one or more) side-by-side code comparisons. Why? Well at times it feels like R, and the R community, are being split. You're either with one (increasingly "religious" in their defense of their deemed-superior approach) side, or the other. And that is of course utter nonsense. It's all R after all.

Programming, just like other fields using engineering methods and thinking, is about making choices, and trading off between certain aspects. A simple example is the fairly well-known trade-off between memory use and speed: think e.g. of a hash map allowing for faster lookup at the cost of some more memory. Generally speaking, solutions are rarely limited to just one way, or just one approach. So if pays off to know your tools, and choose wisely among all available options. Having choices is having options, and those tend to have non-negative premiums to take advantage off. Locking yourself into one and just one paradigm can never be better.

In that spirit, I want to (eventually) show a few simple comparisons of code being done two distinct ways.

One obvious first candidate for this is the gunsales repository with some R code which backs an earlier NY Times article. I got involved for a similar reason, and updated the code from its initial form. Then again, this project also helped motivate what we did later with the x13binary package which permits automated installation of the X13-ARIMA-SEATS binary to support Christoph's excellent seasonal CRAN package (and website) for which we now have a forthcoming JSS paper. But the actual code example is not that interesting / a bit further off the mainstream because of the more specialised seasonal ARIMA modeling.

But then this week I found a much simpler and shorter example, and quickly converted its code. The code comes from the inaugural datascience 1 lesson at the Crosstab, a fabulous site by G. Elliot Morris (who may be the highest-energy undergrad I have come across lately) focusssed on political polling, forecasts, and election outcomes. Lesson 1 is a simple introduction, and averages some polls of the 2016 US Presidential Election.

Complete Code using Approach "TV"

Elliot does a fine job walking the reader through his code so I will be brief and simply quote it in one piece:


## Getting the polls

library(readr)
polls_2016 <- read_tsv(url("http://elections.huffingtonpost.com/pollster/api/v2/questions/16-US-Pres-GE%20TrumpvClinton/poll-responses-clean.tsv"))

## Wrangling the polls

library(dplyr)
polls_2016 <- polls_2016 %>%
    filter(sample_subpopulation %in% c("Adults","Likely Voters","Registered Voters"))
library(lubridate)
polls_2016 <- polls_2016 %>%
    mutate(end_date = ymd(end_date))
polls_2016 <- polls_2016 %>%
    right_join(data.frame(end_date = seq.Date(min(polls_2016$end_date),
                                              max(polls_2016$end_date), by="days")))

## Average the polls

polls_2016 <- polls_2016 %>%
    group_by(end_date) %>%
    summarise(Clinton = mean(Clinton),
              Trump = mean(Trump))

library(zoo)
rolling_average <- polls_2016 %>%
    mutate(Clinton.Margin = Clinton-Trump,
           Clinton.Avg =  rollapply(Clinton.Margin,width=14,
                                    FUN=function(x){mean(x, na.rm=TRUE)},
                                    by=1, partial=TRUE, fill=NA, align="right"))

library(ggplot2)
ggplot(rolling_average)+
  geom_line(aes(x=end_date,y=Clinton.Avg),col="blue") +
  geom_point(aes(x=end_date,y=Clinton.Margin))

It uses five packages to i) read some data off them interwebs, ii) then filters / subsets / modifies it leading to a right (outer) join with itself before iv) averaging per-day polls first and then creates rolling averages over 14 days before v) plotting. Several standard verbs are used: filter(), mutate(), right_join(), group_by(), and summarise(). One non-verse function is rollapply() which comes from zoo, a popular package for time-series data.

Complete Code using Approach "DT"

As I will show below, we can do the same with fewer packages as data.table covers the reading, slicing/dicing and time conversion. We still need zoo for its rollapply() and of course the same plotting code:


## Getting the polls

library(data.table)
pollsDT <- fread("http://elections.huffingtonpost.com/pollster/api/v2/questions/16-US-Pres-GE%20TrumpvClinton/poll-responses-clean.tsv")

## Wrangling the polls

pollsDT <- pollsDT[sample_subpopulation %in% c("Adults","Likely Voters","Registered Voters"), ]
pollsDT[, end_date := as.IDate(end_date)]
pollsDT <- pollsDT[ data.table(end_date = seq(min(pollsDT[,end_date]),
                                              max(pollsDT[,end_date]), by="days")), on="end_date"]

## Average the polls

library(zoo)
pollsDT <- pollsDT[, .(Clinton=mean(Clinton), Trump=mean(Trump)), by=end_date]
pollsDT[, Clinton.Margin := Clinton-Trump]
pollsDT[, Clinton.Avg := rollapply(Clinton.Margin, width=14,
                                   FUN=function(x){mean(x, na.rm=TRUE)},
                                   by=1, partial=TRUE, fill=NA, align="right")]

library(ggplot2)
ggplot(pollsDT) +
    geom_line(aes(x=end_date,y=Clinton.Avg),col="blue") +
    geom_point(aes(x=end_date,y=Clinton.Margin))

This uses several of the components of data.table which are often called [i, j, by=...]. Row are selected (i), columns are either modified (via := assignment) or summarised (via =), and grouping is undertaken by by=.... The outer join is done by having a data.table object indexed by another, and is pretty standard too. That allows us to do all transformations in three lines. We then create per-day average by grouping by day, compute the margin and construct its rolling average as before. The resulting chart is, unsurprisingly, the same.

Benchmark Reading

We can looking how the two approaches do on getting data read into our session. For simplicity, we will read a local file to keep the (fixed) download aspect out of it:

R> url <- "http://elections.huffingtonpost.com/pollster/api/v2/questions/16-US-Pres-GE%20TrumpvClinton/poll-responses-clean.tsv"
R> download.file(url, destfile=file, quiet=TRUE)
R> file <- "/tmp/poll-responses-clean.tsv"
R> res <- microbenchmark(tidy=suppressMessages(readr::read_tsv(file)),
+                       dt=data.table::fread(file, showProgress=FALSE))
R> res
Unit: milliseconds
 expr     min      lq    mean  median      uq      max neval
 tidy 6.67777 6.83458 7.13434 6.98484 7.25831  9.27452   100
   dt 1.98890 2.04457 2.37916 2.08261 2.14040 28.86885   100
R>

That is a clear relative difference, though the absolute amount of time is not that relevant for such a small (demo) dataset.

Benchmark Processing

We can also look at the processing part:

R> rdin <- suppressMessages(readr::read_tsv(file))
R> dtin <- data.table::fread(file, showProgress=FALSE)
R> 
R> library(dplyr)
R> library(lubridate)
R> library(zoo)
R> 
R> transformTV <- function(polls_2016=rdin) {
+     polls_2016 <- polls_2016 %>%
+         filter(sample_subpopulation %in% c("Adults","Likely Voters","Registered Voters"))
+     polls_2016 <- polls_2016 %>%
+         mutate(end_date = ymd(end_date))
+     polls_2016 <- polls_2016 %>%
+         right_join(data.frame(end_date = seq.Date(min(polls_2016$end_date), 
+                                                   max(polls_2016$end_date), by="days")))
+     polls_2016 <- polls_2016 %>%
+         group_by(end_date) %>%
+         summarise(Clinton = mean(Clinton),
+                   Trump = mean(Trump))
+ 
+     rolling_average <- polls_2016 %>%
+         mutate(Clinton.Margin = Clinton-Trump,
+                Clinton.Avg =  rollapply(Clinton.Margin,width=14,
+                                         FUN=function(x){mean(x, na.rm=TRUE)}, 
+                                         by=1, partial=TRUE, fill=NA, align="right"))
+ }
R> 
R> transformDT <- function(dtin) {
+     pollsDT <- copy(dtin) ## extra work to protect from reference semantics for benchmark
+     pollsDT <- pollsDT[sample_subpopulation %in% c("Adults","Likely Voters","Registered Voters"), ]
+     pollsDT[, end_date := as.IDate(end_date)]
+     pollsDT <- pollsDT[ data.table(end_date = seq(min(pollsDT[,end_date]), 
+                                                   max(pollsDT[,end_date]), by="days")), on="end_date"]
+     pollsDT <- pollsDT[, .(Clinton=mean(Clinton), Trump=mean(Trump)), 
+                        by=end_date][, Clinton.Margin := Clinton-Trump]
+     pollsDT[, Clinton.Avg := rollapply(Clinton.Margin, width=14,
+                                        FUN=function(x){mean(x, na.rm=TRUE)}, 
+                                        by=1, partial=TRUE, fill=NA, align="right")]
+ }
R> 
R> res <- microbenchmark(tidy=suppressMessages(transformTV(rdin)),
+                       dt=transformDT(dtin))
R> res
Unit: milliseconds
 expr      min       lq     mean   median       uq      max neval
 tidy 12.54723 13.18643 15.29676 13.73418 14.71008 104.5754   100
   dt  7.66842  8.02404  8.60915  8.29984  8.72071  17.7818   100
R>

Not quite a factor of two on the small data set, but again a clear advantage. data.table has a reputation for doing really well for large datasets; here we see that it is also faster for small datasets.

Side-by-side

Stripping the reading, as well as the plotting both of which are about the same, we can compare the essential data operations.

Summary

We found a simple task solved using code and packages from an increasingly popular sub-culture within R, and contrasted it with a second approach. We find the second approach to i) have fewer dependencies, ii) less code, and iii) running faster.

Now, undoubtedly the former approach will have its staunch defenders (and that is all good and well, after all choice is good and even thirty years later some still debate vi versus emacs endlessly) but I thought it to be instructive to at least to be able to make an informed comparison.

Acknowledgements

My thanks to G. Elliot Morris for a fine example, and of course a fine blog and (if somewhat hyperactive) Twitter account.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Fri, 22 Dec 2017

#14: Finding Binary .deb Files for CRAN Packages

Welcome to the fourteenth post in the rationally rambling R rants series, or R⁴ for short. The last two posts were concerned with faster installation. First, we showed how ccache can speed up (re-)installation. This was followed by a second post on faster installation via binaries.

This last post immediately sparked some follow-up. Replying to my tweet about it, David Smith wondered how to combine binary and source installation (tl;dr: it is hard as you need to combine two package managers). Just this week, Max Ogden wondered how to install CRAN packages as binaries on Linux, and Daniel Nuest poked me on GitHub as part of his excellent containerit project as installation of binaries would of course also make Docker container builds much faster. (tl;dr: Oh yes, see below!)

So can one? Sure. We have a tool. But first the basics.

The Basics

Packages for a particular distribution are indexed by a packages file for that distribution. This is not unlike CRAN using top-level PACKAGES* files. So in principle you could just fetch those packages files, parse and index them, and then search them. In practice that is a lot of work as Debian and Ubuntu now have several tens of thousands of packages.

So it is better to use the distro tool. In my use case on .deb-based distros, this is apt-cache. Here is a quick example for the (Ubuntu 17.04) server on which I type this:

$ sudo apt-get update -qq            ## suppress stdout display
$ apt-cache search r-cran- | wc -l
419
$

So a very vanilla Ubuntu installation has "merely" 400+ binary CRAN packages. Nothing to write home about (yet) -- but read on.

cran2deb4ubuntu, or c2d4u for short

A decade ago, I was involved in two projects to turn all of CRAN into .deb binaries. We had a first ad-hoc predecessor project, and then (much better) a 'version 2' thanks to the excellent Google Summer of Code work by Charles Blundell (and mentored by me). I ran with that for a while and carried at the peak about 2500 binaries or so. And then my controlling db died, just as I visited CRAN to show it off. Very sad. Don Armstrong ran with the code and rebuilt it on better foundations and had for quite some time all of CRAN and BioC built (peaking at maybe 7k package). Then his RAID died. The surviving effort is the one by Michael Rutter who always leaned on the Lauchpad PPA system to build his packages. And those still exist and provide a core of over 10k packages (but across different Ubuntu flavours, see below).

Using cran2deb4ubuntu

In order to access c2d4u you need an Ubuntu system. For example my Travis runner script does

# Add marutter's c2d4u repository, (and rrutter for CRAN builds too)
sudo add-apt-repository -y "ppa:marutter/rrutter"
sudo add-apt-repository -y "ppa:marutter/c2d4u"

After that one can query apt-cache as above, but take advantage of a much larger pool with over 3500 packages (see below). The add-apt-repository command does the Right Thing (TM) in terms of both getting the archive key, and adding the apt source entry to the config directory.

How about from R? Sure, via RcppAPT

Now, all this command-line business is nice. But can we do all this programmatically from R? Sort of.

The RcppAPT package interface the libapt library, and provides access to a few functions. I used this feature when I argued (unsuccessfully, as it turned out) for a particular issue concerning Debian and R upgrades. But that is water under the bridge now, and the main point is that "yes we can".

In Docker: r-apt

Building on RcppAPT, within the Rocker Project we built on top of this by proving a particular class of containers for different Ubuntu releases which all contain i) RcppAPT and ii) the required apt source entry for Michael's repos.

So now we can do this

$ docker run --rm -ti rocker/r-apt:xenial /bin/bash -c 'apt-get update -qq; apt-cache search r-cran- | wc -l'
3525
$

This fires up the corresponding Docker container for the xenial (ie 16.04 LTS) release, updates the apt indices and then searches for r-cran-* packages. And it seems we have a little over 3500 packages. Not bad at all (especially once you realize that this skews strongly towards the more popular packages).

Example: An rstan container

A little while a ago a seemingly very frustrated user came to Carl and myself and claimed that out Rocker Project sucketh because building rstan was all but impossible. I don't have the time, space or inclination to go into details, but he was just plain wrong. You do need to know a little about C++, package building, and more to do this from scratch. Plus, there was a long-standing issue with rstan and newer Boost (which also included several workarounds).

Be that as it may, it serves as nice example here. So the first question: is rstan packaged?

$ docker run --rm -ti rocker/r-apt:xenial /bin/bash -c 'apt-get update -qq; apt-cache show r-cran-rstan'
Package: r-cran-rstan
Source: rstan
Priority: optional
Section: gnu-r
Installed-Size: 5110
Maintainer: cran2deb4ubuntu <cran2deb4ubuntu@gmail.com>
Architecture: amd64
Version: 2.16.2-1cran1ppa0
Depends: pandoc, r-base-core, r-cran-ggplot2, r-cran-stanheaders, r-cran-inline, r-cran-gridextra, r-cran-rcpp,\
   r-cran-rcppeigen, r-cran-bh, libc6 (>= 2.14), libgcc1 (>= 1:4.0), libstdc++6 (>= 5.2)
Filename: pool/main/r/rstan/r-cran-rstan_2.16.2-1cran1ppa0_amd64.deb
Size: 1481562
MD5sum: 60fe7cfc3e8813a822e477df24b37ccf
SHA1: 75bbab1a4193a5731ed105842725768587b4ec22
SHA256: 08816ea0e62b93511a43850c315880628419f2b817a83f92d8a28f5beb871fe2
Description: GNU R package "R Interface to Stan"
Description-md5: c9fc74a96bfde57f97f9d7c16a218fe5

$

It would seem so. With that, the following very minimal Dockerfile is all we need:

## Emacs, make this -*- mode: sh; -*-

## Start from xenial
FROM rocker/r-apt:xenial

## This handle reaches Carl and Dirk
MAINTAINER "Carl Boettiger and Dirk Eddelbuettel" rocker-maintainers@eddelbuettel.com

## Update and install rstan
RUN apt-get update && apt-get install -y --no-install-recommends r-cran-rstan

## Make R the default
CMD ["R"]

In essence, it executes one command: install rstan but from binary taking care of all dependencies. And lo and behold, it works as advertised:

$ docker run --rm -ti rocker/rstan:local Rscript -e 'library(rstan)'
Loading required package: ggplot2
Loading required package: StanHeaders
rstan (Version 2.16.2, packaged: 2017-07-03 09:24:58 UTC, GitRev: 2e1f913d3ca3)
For execution on a local, multicore CPU with excess RAM we recommend calling
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
$

So there: installing from binary works, takes care of dependencies, is easy and as an added bonus even faster. What's not too like?

(And yes, a few of us are working on a system to have more packages available as binaries, but it may take another moment...)

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Wed, 13 Dec 2017

#13: (Much) Faster Package (Re-)Installation via Binaries

Welcome to the thirteenth post in the ridiculously rapid R recommendation series, or R⁴ for short. A few days ago we riffed on faster installation thanks to ccache. Today we show another way to get equally drastic gains for some (if not most) packages.

In a nutshell, there are two ways to get your R packages off CRAN. Either you install as a binary, or you use source. Most people do not think too much about this as on Windows, binary is the default. So why wouldn't one? Precisely. (Unless you are on Windows, and you develop, or debug, or test, or ... and need source. Another story.) On other operating systems, however, source is the rule, and binary is often unavailable.

Or is it? Exactly how to find out what is available will be left for another post as we do have a tool just for that. But today, just hear me out when I say that binary is often an option even when source is the default. And it matters. See below.

As a (mostly-to-always) Linux user, I sometimes whistle between my teeth that we "lost all those battles" (i.e. for the desktop(s) or laptop(s)) but "won the war". That topic merits a longer post I hope to write one day, and I won't do it justice today but my main gist that everybody (and here I mean mostly developers/power users) now at least also runs on Linux. And by that I mean that we all test our code in Linux environments such as e.g. Travis CI, and that many of us run deployments on cloud instances (AWS, GCE, Azure, ...) which are predominantly based on Linux. Or on local clusters. Or, if one may dream, the top500 And on and on. And frequently these are Ubuntu machines.

So here is an Ubuntu trick: Install from binary, and save loads of time. As an illustration, consider the chart below. It carries over the logic from the 'cached vs non-cached' compilation post and contrasts two ways of installing: from source, or as a binary. I use pristine and empty Docker containers as the base, and rely of course on the official r-base image which is supplied by Carl Boettiger and yours truly as part of our Rocker Project (and for which we have a forthcoming R Journal piece I might mention). So for example the timings for the ggplot2 installation were obtained via

time docker run --rm -ti r-base  /bin/bash -c 'install.r ggplot2'

and

time docker run --rm -ti r-base  /bin/bash -c 'apt-get update && apt-get install -y r-cran-ggplot2'

Here docker run --rm -ti just means to launch Docker, in 'remove leftovers at end' mode, use terminal and interactive mode and invoke a shell. The shell command then is, respectively, to install a CRAN package using install.r from my littler package, or to install the binary via apt-get after updating the apt indices (as the Docker container may have been built a few days or more ago).

Let's not focus on Docker here---it is just a convenient means to an end of efficiently measuring via a simple (wall-clock counting) time invocation. The key really is that install.r is just a wrapper to install.packages() meaning source installation on Linux (as used inside the Docker container). And apt-get install ... is how one gets a binary. Again, I will try post another piece to determine how one finds if a suitable binary for a CRAN package exists. For now, just allow me to proceed.

So what do we see then? Well have a look:

A few things stick out. RQuantLib really is a monster. And dplyr is also fairly heavy---both rely on Rcpp, BH and lots of templating. At the other end, data.table is still a marvel. No external dependencies, and just plain C code make the source installation essentially the same speed as the binary installation. Amazing. But I digress.

We should add that one of the source installations also required installing additional libries: QuantLib is needed along with Boost for RQuantLib. Similar for another package (not shown) which needed curl and libcurl.

So what is the upshot? If you can, consider binaries. I will try to write another post how I do that e.g. for Travis CI where all my tests us binaries. (Yes, I know. This mattered more in the past when they did not cache. It still matters today as you a) do not need to fill the cache in the first place and b) do not need to worry about details concerning compilation from source which still throws enough people off. But yes, you can of course survive as is.)

The same approach is equally valid on AWS and related instances: I answered many StackOverflow questions where folks were failing to compile "large-enough" pieces from source on minimal installations with minimal RAM, and running out of resources and failed with bizarre errors. In short: Don't. Consider binaries. It saves time and trouble.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Sat, 09 Dec 2017

#12: Know and Customize your OS and Work Environment

Welcome to the twelveth post in the randomly relevant R recommendations series, or R⁴ for short. This post will insert a short diversion into what was planned as a sequence of posts on faster installations that started recently with this post but we will resume to it very shortly (for various definitions of "very" or "shortly").

Earlier today Davis Vaughn posted a tweet about a blog post of his describing a (term) paper he wrote modeling bitcoin volatilty using Alexios's excellent rugarch package---and all that typeset with the styling James and I put together in our little pinp package which is indeed very suitable for such tasks of writing (R)Markdown + LaTeX + R code combinations conveniently in a single source file.

Leaving aside the need to celebreate a term paper with a blog post and tweet, pinp is indeed very nice and deserving of some additional exposure and tutorials. Now, Davis sets out to do all this inside RStudio---as folks these days seem to like to limit themselves to a single tool or paradigm. Older and wiser users prefer the flexibility of switching tools and approaches, but alas, we digress. While Davis manages of course to do all this in RStudio which is indeed rather powerful and therefore rightly beloved, he closes on

I wish there was some way to have Live Rendering like with blogdown so that I could just keep a rendered version of the paper up and have it reload every time I save. That would be the dream!

and I can only add a forceful: Fear not, young man, for we can help thou!

Modern operating systems have support for epoll and libnotify, which can be used from the shell. Just how your pdf application refreshes automagically when a pdf file is updated, we can hook into this from the shell to actually create the pdf when the (R)Markdown file is updated. I am going to use a tool readily available on my Linux systems; macOS will surely have something similar. The entr command takes one or more file names supplied on stdin and executes a command when one of them changes. Handy for invoking make whenever one of your header or source files changes, and useable here. E.g. the last markdown file I was working on was named comments.md and contained comments to a referee, and we can auto-process it on each save via

echo comments.md | entr render.r comments.md

which uses render.r from littler (new release soon too...; a simple Rscript -e 'rmarkdown::render("comments.md")' would probably work too but render.r is shorter and little more powerful so I use it more often myself) on the input file comments.md which also happens to (here sole) file being monitored.

And that is really all there is to it. I wanted / needed something like this a few months ago at work too, and may have used an inotify-based tool there but cannot find any notes. Python has something similar via watchdog which is yet again more complicated / general.

It turns out that auto-processing is actually not that helpful as we often save before an expression is complete, leading to needless error messages. So at the end of the day, I often do something much simpler. My preferred editor has a standard interface to 'building': pressing C-x c loads a command (it recalls) that defaults to make -k (i.e., make with error skipping). Simply replacing that with render.r comments.md (in this case) means we get an updated pdf file when we want with a simple customizable command / key-combination.

So in sum: it is worth customizing your environments, learning about what your OS may have, and looking beyond a single tool / editor / approach. Even dreams may come true ...

Postscriptum: And Davis takes this in a stride and almost immediately tweeted a follow-up with a nice screen capture mp4 movie showing that entr does indeed work just as well on his macbook.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Mon, 27 Nov 2017

#11: (Much) Faster Package (Re-)Installation via Caching

Welcome to the eleventh post in the rarely rued R rants series, or R⁴ for short. Time clearly flies as it has been three months since out last post on significantly reducing library size via stripping. I had been meaning to post on today's topic for quite some time, but somehow something (working on a paper, releasing a package, ...) got in the way.

Just a few days ago Colin (of Efficient R Programming fame) posted about speed(ing up) package installation. His recommendation? Remember that we (usually) have multiple cores and using several of them via options(Ncpus = XX). It is an excellent point, and it bears repeating.

But it turns I have not one but two salient recommendations too. Today covers the first, we should hopefully get pretty soon to the second. Both have one thing in common: you will be fastest if you avoid doing the work in the first place.

What?

One truly outstanding tool for this in the context of the installation of compiled packages is ccache. It is actually a pretty old tool that has been out for well over a decade, and it comes from the folks that gave the Samba filesystem.

What does it do? Well, it nutshell, it "hashes" a checksum of a source file once the preprocessor has operated on it and stores the resulting object file. In the case of rebuild with unchanged code you get the object code back pretty much immediately. The idea is very similar to memoisation (as implemented in R for example in the excellent little memoise package by Hadley, Jim, Kirill and Daniel. The idea is the same: if you have to do something even moderately expensive a few times, do it once and then recall it the other times.

This happens (at least to me) more often that not in package development. Maybe you change just one of several source files. Maybe you just change the R code, the Rd documentation or a test file---yet still need a full reinstallation. In all these cases, ccache can help tremdendously as illustrated below.

How?

Because essentially all our access to compilation happens through R, we need to set this in a file read by R. I use ~/.R/Makevars for this and have something like these lines on my machines:

VER=
CCACHE=ccache
CC=$(CCACHE) gcc$(VER)
CXX=$(CCACHE) g++$(VER)
CXX11=$(CCACHE) g++$(VER)
CXX14=$(CCACHE) g++$(VER)
FC=$(CCACHE) gfortran$(VER)
F77=$(CCACHE) gfortran$(VER)

That way, when R calls the compiler(s) it will prefix with ccache. And ccache will then speed up.

There is an additional issue due to R use. Often we install from a .tar.gz. These will be freshly unpackaged, and hence have "new" timestamps. This would usually lead ccache to skip to file (fear of "false positives") so we have to override this. Similarly, the tarball is usually unpackage in a temporary directory with an ephemeral name, creating a unique path. That too needs to be overwritten. So in my ~/.ccache/ccache.conf I have this:

max_size = 5.0G
# important for R CMD INSTALL *.tar.gz as tarballs are expanded freshly -> fresh ctime
sloppiness = include_file_ctime
# also important as the (temp.) directory name will differ
hash_dir = false

Show Me

A quick illustration will round out the post. Some packages are meatier than others. More C++ with more templates usually means longer build times. Below is a quick chart comparing times for a few such packages (ie RQuantLib, dplyr, rstan) as well as igraph ("merely" a large C package) and lme4 as well as Rcpp. The worst among theseis still my own RQuantLib package wrapping (still just parts of) the genormous and Boost-heavy QuantLib library.

Pretty dramatic gains. Best of all, we can of course combine these with other methods such as Colin's use of multiple CPUs, or even a simple MAKE=make -j4 to have multiple compilation units being considered in parallel. So maybe we all get to spend less time on social media and other timewasters as we spend less time waiting for our builds. Or maybe that is too much to hope for...

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Sun, 20 Aug 2017

#10: Compacting your Shared Libraries, After The Build

Welcome to the tenth post in the rarely ranting R recommendations series, or R⁴ for short. A few days ago we showed how to tell the linker to strip shared libraries. As discussed in the post, there are two options. One can either set up ~/.R/Makevars by passing the strip-debug option to the linker. Alternatively, one can adjust src/Makevars in the package itself with a bit a Makefile magic.

Of course, there is a third way: just run strip --strip-debug over all the shared libraries after the build. As the path is standardized, and the shell does proper globbing, we can just do

$ strip --strip-debug /usr/local/lib/R/site-library/*/libs/*.so

using a double-wildcard to get all packages (in that R package directory) and all their shared libraries. Users on macOS probably want .dylib on the end, users on Windows want another computer as usual (just kidding: use .dll). Either may have to adjust the path which is left as an exercise to the reader.

The impact can be Yuge as illustrated in the following dotplot:

This illustration is in response to a mailing list post. Last week, someone claimed on r-help that tidyverse would not install on Ubuntu 17.04. And this is of course patently false as many of us build and test on Ubuntu and related Linux systems, Travis runs on it, CRAN tests them etc pp. That poor user had somehow messed up their default gcc version. Anyway: I fired up a Docker container, installed r-base-core plus three required -dev packages (for xml2, openssl, and curl) and ran a single install.packages("tidyverse"). In a nutshell, following the launch of Docker for an Ubuntu 17.04 container, it was just

$ apt-get update
$ apt-get install r-base libcurl4-openssl-dev libssl-dev libxml2-dev
$ apt-get install mg          # a tiny editor
$ mg /etc/R/Rprofile.site     # to add a default CRAN repo
$ R -e 'install.packages("tidyverse")'

which not only worked (as expected) but also installed a whopping fifty-one packages (!!) of which twenty-six contain a shared library. A useful little trick is to run du with proper options to total, summarize, and use human units which reveals that these libraries occupy seventy-eight megabytes:

root@de443801b3fc:/# du -csh /usr/local/lib/R/site-library/*/libs/*so
4.3M    /usr/local/lib/R/site-library/Rcpp/libs/Rcpp.so
2.3M    /usr/local/lib/R/site-library/bindrcpp/libs/bindrcpp.so
144K    /usr/local/lib/R/site-library/colorspace/libs/colorspace.so
204K    /usr/local/lib/R/site-library/curl/libs/curl.so
328K    /usr/local/lib/R/site-library/digest/libs/digest.so
33M     /usr/local/lib/R/site-library/dplyr/libs/dplyr.so
36K     /usr/local/lib/R/site-library/glue/libs/glue.so
3.2M    /usr/local/lib/R/site-library/haven/libs/haven.so
272K    /usr/local/lib/R/site-library/jsonlite/libs/jsonlite.so
52K     /usr/local/lib/R/site-library/lazyeval/libs/lazyeval.so
64K     /usr/local/lib/R/site-library/lubridate/libs/lubridate.so
16K     /usr/local/lib/R/site-library/mime/libs/mime.so
124K    /usr/local/lib/R/site-library/mnormt/libs/mnormt.so
372K    /usr/local/lib/R/site-library/openssl/libs/openssl.so
772K    /usr/local/lib/R/site-library/plyr/libs/plyr.so
92K     /usr/local/lib/R/site-library/purrr/libs/purrr.so
13M     /usr/local/lib/R/site-library/readr/libs/readr.so
4.7M    /usr/local/lib/R/site-library/readxl/libs/readxl.so
1.2M    /usr/local/lib/R/site-library/reshape2/libs/reshape2.so
160K    /usr/local/lib/R/site-library/rlang/libs/rlang.so
928K    /usr/local/lib/R/site-library/scales/libs/scales.so
4.9M    /usr/local/lib/R/site-library/stringi/libs/stringi.so
1.3M    /usr/local/lib/R/site-library/tibble/libs/tibble.so
2.0M    /usr/local/lib/R/site-library/tidyr/libs/tidyr.so
1.2M    /usr/local/lib/R/site-library/tidyselect/libs/tidyselect.so
4.7M    /usr/local/lib/R/site-library/xml2/libs/xml2.so
78M     total
root@de443801b3fc:/#

Looks like dplyr wins this one at thirty-three megabytes just for its shared library.

But with a single stroke of strip we can reduce all this down a lot:

root@de443801b3fc:/# strip --strip-debug /usr/local/lib/R/site-library/*/libs/*so
root@de443801b3fc:/# du -csh /usr/local/lib/R/site-library/*/libs/*so
440K    /usr/local/lib/R/site-library/Rcpp/libs/Rcpp.so
220K    /usr/local/lib/R/site-library/bindrcpp/libs/bindrcpp.so
52K     /usr/local/lib/R/site-library/colorspace/libs/colorspace.so
56K     /usr/local/lib/R/site-library/curl/libs/curl.so
120K    /usr/local/lib/R/site-library/digest/libs/digest.so
2.5M    /usr/local/lib/R/site-library/dplyr/libs/dplyr.so
16K     /usr/local/lib/R/site-library/glue/libs/glue.so
404K    /usr/local/lib/R/site-library/haven/libs/haven.so
76K     /usr/local/lib/R/site-library/jsonlite/libs/jsonlite.so
20K     /usr/local/lib/R/site-library/lazyeval/libs/lazyeval.so
24K     /usr/local/lib/R/site-library/lubridate/libs/lubridate.so
8.0K    /usr/local/lib/R/site-library/mime/libs/mime.so
52K     /usr/local/lib/R/site-library/mnormt/libs/mnormt.so
84K     /usr/local/lib/R/site-library/openssl/libs/openssl.so
76K     /usr/local/lib/R/site-library/plyr/libs/plyr.so
32K     /usr/local/lib/R/site-library/purrr/libs/purrr.so
648K    /usr/local/lib/R/site-library/readr/libs/readr.so
400K    /usr/local/lib/R/site-library/readxl/libs/readxl.so
128K    /usr/local/lib/R/site-library/reshape2/libs/reshape2.so
56K     /usr/local/lib/R/site-library/rlang/libs/rlang.so
100K    /usr/local/lib/R/site-library/scales/libs/scales.so
496K    /usr/local/lib/R/site-library/stringi/libs/stringi.so
124K    /usr/local/lib/R/site-library/tibble/libs/tibble.so
164K    /usr/local/lib/R/site-library/tidyr/libs/tidyr.so
104K    /usr/local/lib/R/site-library/tidyselect/libs/tidyselect.so
344K    /usr/local/lib/R/site-library/xml2/libs/xml2.so
6.6M    total
root@de443801b3fc:/#

Down to six point six megabytes. Not bad for one command. The chart visualizes the respective reductions. Clearly, C++ packages (and their template use) lead to more debugging symbols than plain old C code. But once stripped, the size differences are not that large.

And just to be plain, what we showed previously in post #9 does the same, only already at installation stage. The effects are not cumulative.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Mon, 14 Aug 2017

#9: Compacting your Shared Libraries

Welcome to the nineth post in the recognisably rancid R randomness series, or R⁴ for short. Following on the heels of last week's post, we aim to look into the shared libraries created by R.

We love the R build process. It is robust, cross-platform, reliable and rather predicatable. It. Just. Works.

One minor issue, though, which has come up once or twice in the past is the (in)ability to fully control all compilation options. R will always recall CFLAGS, CXXFLAGS, ... etc as used when it was compiled. Which often entails the -g flag for debugging which can seriously inflate the size of the generated object code. And once stored in ${RHOME}/etc/Makeconf we cannot on the fly override these values.

But there is always a way. Sometimes even two.

The first is local and can be used via the (personal) ~/.R/Makevars file (about which I will have to say more in another post). But something I have been using quite a bite lately uses the flags for the shared library linker. Given that we can have different code flavours and compilation choices---between C, Fortran and the different C++ standards---one can end up with a few lines. I currently use this which uses -Wl, to pass an the -S (or --strip-debug) option to the linker (and also reiterates the desire for a shared library, presumably superfluous):

SHLIB_CXXLDFLAGS = -Wl,-S -shared
SHLIB_CXX11LDFLAGS = -Wl,-S -shared
SHLIB_CXX14LDFLAGS = -Wl,-S -shared
SHLIB_FCLDFLAGS = -Wl,-S -shared
SHLIB_LDFLAGS = -Wl,-S -shared

Let's consider an example: my most recently uploaded package RProtoBuf. Built under a standard 64-bit Linux setup (Ubuntu 17.04, g++ 6.3) and not using the above, we end up with library containing 12 megabytes (!!) of object code:

edd@brad:~/git/rprotobuf(feature/fewer_warnings)$ ls -lh src/RProtoBuf.so
-rwxr-xr-x 1 edd edd 12M Aug 14 20:22 src/RProtoBuf.so
edd@brad:~/git/rprotobuf(feature/fewer_warnings)$

However, if we use the flags shown above in .R/Makevars, we end up with much less:

edd@brad:~/git/rprotobuf(feature/fewer_warnings)$ ls -lh src/RProtoBuf.so 
-rwxr-xr-x 1 edd edd 626K Aug 14 20:29 src/RProtoBuf.so
edd@brad:~/git/rprotobuf(feature/fewer_warnings)$

So we reduced the size from 12mb to 0.6mb, an 18-fold decrease. And the file tool still shows the file as 'not stripped' as it still contains the symbols. Only debugging information was removed.

What reduction in size can one expect, generally speaking? I have seen substantial reductions for C++ code, particularly when using tenmplated code. More old-fashioned C code will be less affected. It seems a little difficult to tell---but this method is my new build default as I continually find rather substantial reductions in size (as I tend to work mostly with C++-based packages).

The second option only occured to me this evening, and complements the first which is after all only applicable locally via the ~/.R/Makevars file. What if we wanted it affect each installation of a package? The following addition to its src/Makevars should do:

strippedLib: $(SHLIB)
        if test -e "/usr/bin/strip"; then /usr/bin/strip --strip-debug $(SHLIB); fi

.phony: strippedLib

We declare a new Makefile target strippedLib. But making it dependent on $(SHLIB), we ensure the standard target of this Makefile is built. And by making the target .phony we ensure it will always be executed. And it simply tests for the strip tool, and invokes it on the library after it has been built. Needless to say we get the same reduction is size. And this scheme may even pass muster with CRAN, but I have not yet tried.

Lastly, and acknowledgement. Everything in this post has benefited from discussion with my former colleague Dan Dillon who went as far as setting up tooling in his r-stripper repository. What we have here may be simpler, but it would not have happened with what Dan had put together earlier.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Thu, 10 Aug 2017

#8: Customizing Spell Checks for R CMD check

Welcome to the eight post in the ramblingly random R rants series, or R⁴ for short. We took a short break over the last few weeks due to some conferencing followed by some vacationing and general chill.

But we're back now, and this post gets us back to initial spirit of (hopefully) quick and useful posts. Perusing yesterday's batch of CRANberries posts, I noticed a peculiar new directory shown the in the diffstat output we use to compare two subsequent source tarballs. It was entitled .aspell/, in the top-level directory, and in two new packages by R Core member Kurt Hornik himself.

The context is, of course, the not infrequently-expressed desire to customize the spell checking done on CRAN incoming packages, see e.g. this r-package-devel thread.

And now we can as I verified with (the upcoming next release of) RcppArmadillo, along with a recent-enough (i.e. last few days) version of r-devel. Just copying what Kurt did, i.e. adding a file .aspell/defaults.R, and in it pointing to rds file (named as the package) containing a character vector with words added to the spell checker's universe is all it takes. For my package, see here for the peculiars.

Or see here:

edd@bud:~/git/rcpparmadillo/.aspell(master)$ cat defaults.R 
Rd_files <- vignettes <- R_files <- description <-
    list(encoding = "UTF-8",
         language = "en",
         dictionaries = c("en_stats", "RcppArmadillo"))
edd@bud:~/git/rcpparmadillo/.aspell(master)$ r -p -e 'readRDS("RcppArmadillo.rds")'
[1] "MPL"            "Sanderson"      "Templated"
[4] "decompositions" "onwards"        "templated"
edd@bud:~/git/rcpparmadillo/.aspell(master)$

And now R(-devel) CMD check --as-cran ... is silent about spelling. Yay!

But take this with a grain of salt as this does not yet seem to be "announced" as e.g. yesterday's change in the CRAN Policy did not mention it. So things may well change -- but hey, it worked for me.

And this all is about aspell, here is something topical about a spell to close the post:

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Tue, 13 Jun 2017

#7: C++14, R and Travis -- A useful hack

Welcome to the seventh post in the rarely relevant R ramblings series, or R⁴ for short.

We took a short break as several conferences and other events interfered during the month of May, keeping us busy and away from this series. But we are back now with a short and useful hack I came up with this weekend.

The topic is C++14, i.e. the newest formally approved language standard for C++, and its support in R and on Travis CI. With release R 3.4.0 of a few weeks ago, R now formally supports C++14. Which is great.

But there be devils. A little known fact is that R hangs on to its configuration settings from its own compile time. That matters in cases such as the one we are looking at here: Travis CI. Travis is a tremendously useful and widely-deployed service, most commonly connected to GitHub driving "continuous integration" (the 'CI') testing after each commit. But Travis CI, for as useful as it is, is also maddingly conservative still forcing everybody to live and die by [Ubuntu 14.04]http://releases.ubuntu.com/14.04/). So while we all benefit from the fine work by Michael who faithfully provides Ubuntu binaries for distribution via CRAN (based on the Debian builds provided by yours truly), we are stuck with Ubuntu 14.04. Which means that while Michael can provide us with current R 3.4.0 it will be built on ancient Ubuntu 14.04.

Why does this matter, you ask? Well, if you just try to turn the very C++14 support added to R 3.4.0 on in the binary running on Travis, you get this error:

** libs
Error in .shlib_internal(args) : 
  C++14 standard requested but CXX14 is not defined

And you get it whether or not you define CXX14 in the session.

So R (in version 3.4.0) may want to use C++14 (because a package we submitted requests it), but having been built on the dreaded Ubuntu 14.04, it just can't oblige. Even when we supply a newer compiler. Because R hangs on to its compile-time settings rather than current environment variables. And that means no C++14 as its compile-time compiler was too ancient. Trust me, I tried: adding not only g++-6 (from a suitable repo) but also adding C++14 as the value for CXX_STD. Alas, no mas.

The trick to overcome this is twofold, and fairly straightforward. First off, we just rely on the fact that g++ version 6 defaults to C++14. So by supplying g++-6, we are in the green. We have C++14 by default without requiring extra options. Sweet.

The remainder is to tell R to not try to enable C++14 even though we are using it. How? By removing CXX_STD=C++14 on the fly and just for Travis. And this can be done easily with a small script configure which conditions on being on Travis by checking two environment variables:

#!/bin/bash

## Travis can let us run R 3.4.0 (from CRAN and the PPAs) but this R version
## does not know about C++14.  Even though we can select CXX_STD = C++14, R
## will fail as the version we use there was built in too old an environment,
## namely Ubuntu "trusty" 14.04.
##
## So we install g++-6 from another repo and rely on the fact that is
## defaults to C++14.  Sadly, we need R to not fail and hence, just on
## Travis, remove the C++14 instruction

if [[ "${CI}" == "true" ]]; then
    if [[ "${TRAVIS}" == "true" ]]; then 
        echo "** Overriding src/Makevars and removing C++14 on Travis only"
        sed -i 's|CXX_STD = CXX14||' src/Makevars
    fi
fi

I have deployed this now for two sets of builds in two distinct repositories for two "under-development" packages not yet on CRAN, and it just works. In case you turn on C++14 via SystemRequirements: in the file DESCRIPTION, you need to modify it here.

So to sum up, there it is: C++14 with R 3.4.0 on Travis. Only takes a quick Travis-only modification.

/code/r4 | permanent link

Sun, 30 Apr 2017

#6: Easiest package registration

Welcome to the sixth post in the really random R riffs series, or R⁴ for short.

Posts #1 and #2 discussed how to get the now de rigeur package registration information computed. In essence, we pointed to something which R 3.4.0 would have, and provided tricks for accessing it while R 3.3.3 was still R-released.

But now R 3.4.0 is out, and life is good! Or at least this is easier. For example, a few days ago I committed this short helper script pnrrs.r to littler:

#!/usr/bin/r

if (getRversion() < "3.4.0") stop("Not available for R (< 3.4.0). Please upgrade.", call.=FALSE)

tools::package_native_routine_registration_skeleton(".")

So with this example script pnrrs.r soft-linked to /usr/local/bin (or ~/bin) as I commonly do with littler helpers, all it takes is

cd some/R/package/source
pnrrs.r

and the desired file usable as src/init.c is on stdout. Editing NAMESPACE is quick too, and we're all done. See the other two posts for additional context. If you don't have littler, the above also works with Rscript.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Fri, 14 Apr 2017

#5: Easy package information

Welcome to the fifth post in the recklessly rambling R rants series, or R⁴ for short.

The third post showed an easy way to follow R development by monitoring (curated) changes on the NEWS file for the development version r-devel. As a concrete example, I mentioned that it has shown a nice new function (tools::CRAN_package_db()) coming up in R 3.4.0. Today we will build on that.

Consider the following short snippet:

library(data.table)

getPkgInfo <- function() {
    if (exists("tools::CRAN_package_db")) {
        dat <- tools::CRAN_package_db()
    } else {
        tf <- tempfile()
        download.file("https://cloud.r-project.org/src/contrib/PACKAGES.rds", tf, quiet=TRUE)
        dat <- readRDS(tf)              # r-devel can now readRDS off a URL too
    }
    dat <- as.data.frame(dat)
    setDT(dat)
    dat
}

It defines a simple function getPkgInfo() as a wrapper around said new function from R 3.4.0, ie tools::CRAN_package_db(), and a fallback alternative using a tempfile (in the automagically cleaned R temp directory) and an explicit download and read of the underlying RDS file. As an aside, just this week the r-devel NEWS told us that such readRDS() operations can now read directly from URL connection. Very nice---as RDS is a fantastic file format when you are working in R.

Anyway, back to the RDS file! The snippet above returns a data.table object with as many rows as there are packages on CRAN, and basically all their (parsed !!) DESCRIPTION info and then some. A gold mine!

Consider this to see how many package have a dependency (in the sense of Depends, Imports or LinkingTo, but not Suggests because Suggests != Depends) on Rcpp:

R> dat <- getPkgInfo()
R> rcppRevDepInd <- as.integer(tools::dependsOnPkgs("Rcpp", recursive=FALSE, installed=dat))
R> length(rcppRevDepInd)
[1] 998
R>

So exciting---we will hit 1000 within days! But let's do some more analysis:

R> dat[ rcppRevDepInd, RcppRevDep := TRUE]  # set to TRUE for given set
R> dat[ RcppRevDep==TRUE, 1:2]
           Package Version
  1:      ABCoptim  0.14.0
  2: AbsFilterGSEA     1.5
  3:           acc   1.3.3
  4: accelerometry   2.2.5
  5:      acebayes   1.3.4
 ---                      
994:        yakmoR   0.1.1
995:  yCrypticRNAs  0.99.2
996:         yuima   1.5.9
997:           zic     0.9
998:       ziphsmm   1.0.4
R>

Here we index the reverse dependency using the vector we had just computed, and then that new variable to subset the data.table object. Given the aforementioned parsed information from all the DESCRIPTION files, we can learn more:

R> ## likely false entries
R> dat[ RcppRevDep==TRUE, ][NeedsCompilation!="yes", c(1:2,4)]
            Package Version                                                                         Depends
 1:         baitmet   1.0.0                                                           Rcpp, erah (>= 1.0.5)
 2:           bea.R   1.0.1                                                        R (>= 3.2.1), data.table
 3:            brms   1.6.0                     R (>= 3.2.0), Rcpp (>= 0.12.0), ggplot2 (>= 2.0.0), methods
 4: classifierplots   1.3.3                             R (>= 3.1), ggplot2 (>= 2.2), data.table (>= 1.10),
 5:           ctsem   2.3.1                                           R (>= 3.2.0), OpenMx (>= 2.3.0), Rcpp
 6:        DeLorean   1.2.4                                                  R (>= 3.0.2), Rcpp (>= 0.12.0)
 7:            erah   1.0.5                                                               R (>= 2.10), Rcpp
 8:             GxM     1.1                                                                              NA
 9:             hmi   0.6.3                                                                    R (>= 3.0.0)
10:        humarray     1.1 R (>= 3.2), NCmisc (>= 1.1.4), IRanges (>= 1.22.10),\nGenomicRanges (>= 1.16.4)
11:         iNextPD   0.3.2                                                                    R (>= 3.1.2)
12:          joinXL   1.0.1                                                                    R (>= 3.3.1)
13:            mafs   0.0.2                                                                              NA
14:            mlxR   3.1.0                                                           R (>= 3.0.1), ggplot2
15:    RmixmodCombi     1.0              R(>= 3.0.2), Rmixmod(>= 2.0.1), Rcpp(>= 0.8.0), methods,\ngraphics
16:             rrr   1.0.0                                                                    R (>= 3.2.0)
17:        UncerIn2     2.0                          R (>= 3.0.0), sp, RandomFields, automap, fields, gstat
R>

There are a full seventeen packages which claim to depend on Rcpp while not having any compiled code of their own. That is likely false---but I keep them in my counts, however relunctantly. A CRAN-declared Depends: is a Depends:, after all.

Another nice thing to look at is the total number of package that declare that they need compilation:

R> ## number of packages with compiled code
R> dat[ , .(N=.N), by=NeedsCompilation]
   NeedsCompilation    N
1:               no 7625
2:              yes 2832
3:               No    1
R>

Isn't that awesome? It is 2832 out of (currently) 10458, or about 27.1%. Just over one in four. Now the 998 for Rcpp look even better as they are about 35% of all such packages. In order words, a little over one third of all packages with compiled code (which may be legacy C, Fortran or C++) use Rcpp. Wow.

Before closing, one shoutout to Dirk Schumacher whose thankr which I made the center of the last post is now on CRAN. As a mighty fine and slim micropackage without external dependencies. Neat.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Sat, 08 Apr 2017

#4: Simpler shoulders()

Welcome to the fourth post in the repulsively random R ramblings series, or R⁴ for short.

My twitter feed was buzzing about a nice (and as yet unpublished, ie not-on-CRAN) package thankr by Dirk Schumacher which compiles a list of packages (ordered by maintainer count) for your current session (or installation or ...) with a view towards saying thank you to those whose packages we rely upon. Very nice indeed.

I had a quick look and run it twice ... and had a reaction of ewwww, really? as running it twice gave different results as on the second instance a boatload of tibblyverse packages appeared. Because apparently kids these day can only slice data that has been tidied or something.

So I had another quick look ... and put together an alternative version using just base R (as there was only one subfunction that needed reworking):

source(file="https://raw.githubusercontent.com/dirkschumacher/thankr/master/R/shoulders.R")
format_pkg_df <- function(df) { # non-tibblyverse variant
    tb <- table(df[,2])
    od <- order(tb, decreasing=TRUE)
    ndf <- data.frame(maint=names(tb)[od], npkgs=as.integer(tb[od]))
    colpkgs <- function(m, df) { paste(df[ df$maintainer == m, "pkg_name"], collapse=",") }
    ndf[, "pkg"] <- sapply(ndf$maint, colpkgs, df)
    ndf
}

A nice side benefit is that the function is now free of external dependencies (besides, of course, base R). Running this in the ESS session I had open gives:

R> shoulders()  ## by Dirk Schumacher, with small modifications
                               maint npkgs                                                                 pkg
1 R Core Team <R-core@r-project.org>     9 compiler,graphics,tools,utils,grDevices,stats,datasets,methods,base
2 Dirk Eddelbuettel <edd@debian.org>     4                                  RcppTOML,Rcpp,RApiDatetime,anytime
3  Matt Dowle <mattjdowle@gmail.com>     1                                                          data.table
R>

and for good measure a screenshot is below:

I think we need a catchy moniker for R work using good old base R. SoberVerse? GrumbyOldFolksR? PlainOldR? Better suggestions welcome.

Edit on 2017-04-09: And by now Dirk Schumacher fixed that little bug in thankr which was at the start of this. His shoulders() function is now free of side effects, and thankr is now a clean micropackage free of external depends from any verse, be it tiddly or grumpy. I look forward to seeing it on CRAN soon!

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Thu, 06 Apr 2017

#3: Follow R-devel

Welcome to the third post in the rarely relevant R recommendation series, or R⁴ for short.

Today will be brief, but of some importance. In order to know where R is going next, few places provide a better vantage point than the actual ongoing development.

A few years ago, I mentioned to Duncan Murdoch how straightforward the setup of my CRANberries feed (and site) was. After all, static blog compilers converting textual input to html, rss feed and whatnot have been around for fifteen years (though they keep getting reinvented). He took this to heart and built the (not too pretty) R-devel daily site (which also uses a fancy diff tool as it shows changes in NEWS) as well as a more general description of all available sub-feeds. I follow this mostly through blog aggregations -- Google Reader in its day, now Feedly. A screenshot is below just to show that it doesn't have to be ugly just because it is on them intertubes:

This shows a particularly useful day when R-devel folded into the new branch for what will be the R 3.4.0 release come April 21. The list of upcoming changes is truly impressive and quite comprehensive -- and the package registration helper, focus of posts #1 and #2 here, is but one of these many changes.

One function I learned about that day is tools::CRAN_package_db(), a helper to get a single (large) data.frame with all package DESCRIPTION information. Very handy. Others may have noticed that CRAN repos now have a new top-level file PACKAGES.rds and this function does indeed just fetch it--which you could do with a similar one-liner in R-release as well. Still very handy.

But do read about changes in R-devel and hence upcoming changes in R 3.4.0. Lots of good things coming our way.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Thu, 30 Mar 2017

#2: Even Easier Package Registration

Welcome to the second post in rambling random R recommendation series, or R⁴ for short.

Two days ago I posted the initial (actual) post. It provided context for why we need package registration entries (tl;dr: because R CMD check now tests for it, and because it The Right Thing to do, see documentation in the posts). I also showed how generating such a file src/init.c was essentially free as all it took was single call to a new helper function added to R-devel by Brian Ripley and Kurt Hornik.

Now, to actually use R-devel you obviously need to have it accessible. There are a myriad of ways to achieve that: just compile it locally as I have done for years, use a Docker image as I showed in the post -- or be creative with eg Travis or win-builder both of which give you access to R-devel if you're clever about it.

But as no good deed goes unpunished, I was of course told off today for showing a Docker example as Docker was not "Easy". I think the formal answer to that is baloney. But we leave that aside, and promise to discuss setting up Docker at another time.

R is after all ... just R. So below please find a script you can save as, say, ~/bin/pnrrs.r. And calling it---even with R-release---will generate the same code snippet as I showed via Docker. Call it a one-off backport of the new helper function -- with a half-life of a few weeks at best as we will have R 3.4.0 as default in just a few weeks. The script will then reduce to just the final line as the code will be present with R 3.4.0.

#!/usr/bin/r

library(tools)

.find_calls_in_package_code <- tools:::.find_calls_in_package_code
.read_description <- tools:::.read_description

## all what follows is from R-devel aka R 3.4.0 to be

package_ff_call_db <- function(dir) {
    ## A few packages such as CDM use base::.Call
    ff_call_names <- c(".C", ".Call", ".Fortran", ".External",
                       "base::.C", "base::.Call",
                       "base::.Fortran", "base::.External")

    predicate <- function(e) {
        (length(e) > 1L) &&
            !is.na(match(deparse(e[[1L]]), ff_call_names))
    }

    calls <- .find_calls_in_package_code(dir,
                                         predicate = predicate,
                                         recursive = TRUE)
    calls <- unlist(Filter(length, calls))

    if(!length(calls)) return(NULL)

    attr(calls, "dir") <- dir
    calls
}

native_routine_registration_db_from_ff_call_db <- function(calls, dir = NULL, character_only = TRUE) {
    if(!length(calls)) return(NULL)

    ff_call_names <- c(".C", ".Call", ".Fortran", ".External")
    ff_call_args <- lapply(ff_call_names,
                           function(e) args(get(e, baseenv())))
    names(ff_call_args) <- ff_call_names
    ff_call_args_names <-
        lapply(lapply(ff_call_args,
                      function(e) names(formals(e))), setdiff,
               "...")

    if(is.null(dir))
        dir <- attr(calls, "dir")

    package <- # drop name
        as.vector(.read_description(file.path(dir, "DESCRIPTION"))["Package"])

    symbols <- character()
    nrdb <-
        lapply(calls,
               function(e) {
                   if (startsWith(deparse(e[[1L]]), "base::"))
                       e[[1L]] <- e[[1L]][3L]
                   ## First figure out whether ff calls had '...'.
                   pos <- which(unlist(Map(identical,
                                           lapply(e, as.character),
                                           "...")))
                   ## Then match the call with '...' dropped.
                   ## Note that only .NAME could be given by name or
                   ## positionally (the other ff interface named
                   ## arguments come after '...').
                   if(length(pos)) e <- e[-pos]
                   ## drop calls with only ...
                   if(length(e) < 2L) return(NULL)
                   cname <- as.character(e[[1L]])
                   ## The help says
                   ##
                   ## '.NAME' is always matched to the first argument
                   ## supplied (which should not be named).
                   ##
                   ## But some people do (Geneland ...).
                   nm <- names(e); nm[2L] <- ""; names(e) <- nm
                   e <- match.call(ff_call_args[[cname]], e)
                   ## Only keep ff calls where .NAME is character
                   ## or (optionally) a name.
                   s <- e[[".NAME"]]
                   if(is.name(s)) {
                       s <- deparse(s)[1L]
                       if(character_only) {
                           symbols <<- c(symbols, s)
                           return(NULL)
                       }
                   } else if(is.character(s)) {
                       s <- s[1L]
                   } else { ## expressions
                       symbols <<- c(symbols, deparse(s))
                       return(NULL)
                   }
                   ## Drop the ones where PACKAGE gives a different
                   ## package. Ignore those which are not char strings.
                   if(!is.null(p <- e[["PACKAGE"]]) &&
                      is.character(p) && !identical(p, package))
                       return(NULL)
                   n <- if(length(pos)) {
                            ## Cannot determine the number of args: use
                            ## -1 which might be ok for .External().
                            -1L
                        } else {
                            sum(is.na(match(names(e),
                                            ff_call_args_names[[cname]]))) - 1L
                        }
                   ## Could perhaps also record whether 's' was a symbol
                   ## or a character string ...
                   cbind(cname, s, n)
               })
    nrdb <- do.call(rbind, nrdb)
    nrdb <- as.data.frame(unique(nrdb), stringsAsFactors = FALSE)

    if(NROW(nrdb) == 0L || length(nrdb) != 3L)
        stop("no native symbols were extracted")
    nrdb[, 3L] <- as.numeric(nrdb[, 3L])
    nrdb <- nrdb[order(nrdb[, 1L], nrdb[, 2L], nrdb[, 3L]), ]
    nms <- nrdb[, "s"]
    dups <- unique(nms[duplicated(nms)])

    ## Now get the namespace info for the package.
    info <- parseNamespaceFile(basename(dir), dirname(dir))
    ## Could have ff calls with symbols imported from other packages:
    ## try dropping these eventually.
    imports <- info$imports
    imports <- imports[lengths(imports) == 2L]
    imports <- unlist(lapply(imports, `[[`, 2L))

    info <- info$nativeRoutines[[package]]
    ## Adjust native routine names for explicit remapping or
    ## namespace .fixes.
    if(length(symnames <- info$symbolNames)) {
        ind <- match(nrdb[, 2L], names(symnames), nomatch = 0L)
        nrdb[ind > 0L, 2L] <- symnames[ind]
    } else if(!character_only &&
              any((fixes <- info$registrationFixes) != "")) {
        ## There are packages which have not used the fixes, e.g. utf8latex
        ## fixes[1L] is a prefix, fixes[2L] is an undocumented suffix
        nrdb[, 2L] <- sub(paste0("^", fixes[1L]), "", nrdb[, 2L])
        if(nzchar(fixes[2L]))
            nrdb[, 2L] <- sub(paste0(fixes[2L]), "$", "", nrdb[, 2L])
    }
    ## See above.
    if(any(ind <- !is.na(match(nrdb[, 2L], imports))))
        nrdb <- nrdb[!ind, , drop = FALSE]

    ## Fortran entry points are mapped to l/case
    dotF <- nrdb$cname == ".Fortran"
    nrdb[dotF, "s"] <- tolower(nrdb[dotF, "s"])

    attr(nrdb, "package") <- package
    attr(nrdb, "duplicates") <- dups
    attr(nrdb, "symbols") <- unique(symbols)
    nrdb
}

format_native_routine_registration_db_for_skeleton <- function(nrdb, align = TRUE, include_declarations = FALSE) {
    if(!length(nrdb))
        return(character())

    fmt1 <- function(x, n) {
        c(if(align) {
              paste(format(sprintf("    {\"%s\",", x[, 1L])),
                    format(sprintf(if(n == "Fortran")
                                       "(DL_FUNC) &F77_NAME(%s),"
                                   else
                                       "(DL_FUNC) &%s,",
                                   x[, 1L])),
                    format(sprintf("%d},", x[, 2L]),
                           justify = "right"))
          } else {
              sprintf(if(n == "Fortran")
                          "    {\"%s\", (DL_FUNC) &F77_NAME(%s), %d},"
                      else
                          "    {\"%s\", (DL_FUNC) &%s, %d},",
                      x[, 1L],
                      x[, 1L],
                      x[, 2L])
          },
          "    {NULL, NULL, 0}")
    }

    package <- attr(nrdb, "package")
    dups <- attr(nrdb, "duplicates")
    symbols <- attr(nrdb, "symbols")

    nrdb <- split(nrdb[, -1L, drop = FALSE],
                  factor(nrdb[, 1L],
                         levels =
                             c(".C", ".Call", ".Fortran", ".External")))

    has <- vapply(nrdb, NROW, 0L) > 0L
    nms <- names(nrdb)
    entries <- substring(nms, 2L)
    blocks <- Map(function(x, n) {
                      c(sprintf("static const R_%sMethodDef %sEntries[] = {",
                                n, n),
                        fmt1(x, n),
                        "};",
                        "")
                  },
                  nrdb[has],
                  entries[has])

    decls <- c(
        "/* FIXME: ",
        "   Add declarations for the native routines registered below.",
        "*/")

    if(include_declarations) {
        decls <- c(
            "/* FIXME: ",
            "   Check these declarations against the C/Fortran source code.",
            "*/",
            if(NROW(y <- nrdb$.C)) {
                 args <- sapply(y$n, function(n) if(n >= 0)
                                paste(rep("void *", n), collapse=", ")
                                else "/* FIXME */")
                c("", "/* .C calls */",
                  paste0("extern void ", y$s, "(", args, ");"))
           },
            if(NROW(y <- nrdb$.Call)) {
                args <- sapply(y$n, function(n) if(n >= 0)
                               paste(rep("SEXP", n), collapse=", ")
                               else "/* FIXME */")
               c("", "/* .Call calls */",
                  paste0("extern SEXP ", y$s, "(", args, ");"))
            },
            if(NROW(y <- nrdb$.Fortran)) {
                 args <- sapply(y$n, function(n) if(n >= 0)
                                paste(rep("void *", n), collapse=", ")
                                else "/* FIXME */")
                c("", "/* .Fortran calls */",
                  paste0("extern void F77_NAME(", y$s, ")(", args, ");"))
            },
            if(NROW(y <- nrdb$.External))
                c("", "/* .External calls */",
                  paste0("extern SEXP ", y$s, "(SEXP);"))
            )
    }

    headers <- if(NROW(nrdb$.Call) || NROW(nrdb$.External))
        c("#include <R.h>", "#include <Rinternals.h>")
    else if(NROW(nrdb$.Fortran)) "#include <R_ext/RS.h>"
    else character()

    c(headers,
      "#include <stdlib.h> // for NULL",
      "#include <R_ext/Rdynload.h>",
      "",
      if(length(symbols)) {
          c("/*",
            "  The following symbols/expresssions for .NAME have been omitted",
            "", strwrap(symbols, indent = 4, exdent = 4), "",
            "  Most likely possible values need to be added below.",
            "*/", "")
      },
      if(length(dups)) {
          c("/*",
            "  The following name(s) appear with different usages",
            "  e.g., with different numbers of arguments:",
            "", strwrap(dups, indent = 4, exdent = 4), "",
            "  This needs to be resolved in the tables and any declarations.",
            "*/", "")
      },
      decls,
      "",
      unlist(blocks, use.names = FALSE),
      ## We cannot use names with '.' in: WRE mentions replacing with "_"
      sprintf("void R_init_%s(DllInfo *dll)",
              gsub(".", "_", package, fixed = TRUE)),
      "{",
      sprintf("    R_registerRoutines(dll, %s);",
              paste0(ifelse(has,
                            paste0(entries, "Entries"),
                            "NULL"),
                     collapse = ", ")),
      "    R_useDynamicSymbols(dll, FALSE);",
      "}")
}

package_native_routine_registration_db <- function(dir, character_only = TRUE) {
    calls <- package_ff_call_db(dir)
    native_routine_registration_db_from_ff_call_db(calls, dir, character_only)
}

package_native_routine_registration_db <- function(dir, character_only = TRUE) {
    calls <- package_ff_call_db(dir)
    native_routine_registration_db_from_ff_call_db(calls, dir, character_only)
}

package_native_routine_registration_skeleton <- function(dir, con = stdout(), align = TRUE,
                                                         character_only = TRUE, include_declarations = TRUE) {
    nrdb <- package_native_routine_registration_db(dir, character_only)
    writeLines(format_native_routine_registration_db_for_skeleton(nrdb,
                align, include_declarations),
               con)
}

package_native_routine_registration_skeleton(".")  ## when R 3.4.0 is out you only need this line

Here I use /usr/bin/r as I happen to like littler a lot, but you can use Rscript the same way.

Easy enough now?

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Wed, 29 Mar 2017

#1: Easy Package Registration

Welcome to the first actual post in the R⁴ series, following the short announcement earlier this week.

Context

Last month, Brian Ripley announced on r-devel that registration of routines would now be tested for by R CMD check in r-devel (which by next month will become R 3.4.0). A NOTE will be issued now, this will presumably turn into a WARNING at some point. Writing R Extensions has an updated introduction) of the topic.

Package registration has long been available, and applies to all native (i.e. "compiled") function via the .C(), .Call(), .Fortran() or .External() interfaces. If you use any of those -- and .Call() may be the only truly relevant one here -- then is of interest to you.

Brian Ripley and Kurt Hornik also added a new helper function: tools::package_native_routine_registration_skeleton(). It parses the R code of your package and collects all native function entrypoints in order to autogenerate the registration. It is available in R-devel now, will be in R 3.4.0 and makes adding such registration truly trivial.

But as of today, it requires that you have R-devel. Once R 3.4.0 is out, you can call the helper directly.

As for R-devel, there have always been at least two ways to use it: build it locally (which we may cover in another R⁴ installment), or using Docker. Here will focus on the latter by relying on the Rocker project by Carl and myself.

Use the Rocker drd Image

We assume you can run Docker on your system. How to add it on Windows, macOS or Linux is beyond our scope here today, but also covered extensively elsewhere. So we assume you can execute docker and e.g. bring in the 'daily r-devel' image drd from our Rocker project via

~$ docker pull rocker/drd

With that, we can use R-devel to create the registration file very easily in a single call (which is a long command-line we have broken up with one line-break for the display below).

The following is real-life example when I needed to add registration to the RcppTOML package for this week's update:

~/git/rcpptoml(master)$ docker run --rm -ti -v $(pwd):/mnt rocker/drd \  ## line break
             RD --slave -e 'tools::package_native_routine_registration_skeleton("/mnt")'
#include <R.h>
#include <Rinternals.h>
#include <stdlib.h> // for NULL
#include <R_ext/Rdynload.h>

/* FIXME: 
   Check these declarations against the C/Fortran source code.
*/

/* .Call calls */
extern SEXP RcppTOML_tomlparseImpl(SEXP, SEXP, SEXP);

static const R_CallMethodDef CallEntries[] = {
    {"RcppTOML_tomlparseImpl", (DL_FUNC) &RcppTOML_tomlparseImpl, 3},
    {NULL, NULL, 0}
};

void R_init_RcppTOML(DllInfo *dll)
{
    R_registerRoutines(dll, NULL, CallEntries, NULL, NULL);
    R_useDynamicSymbols(dll, FALSE);
}
edd@max:~/git/rcpptoml(master)$

Decompose the Command

We can understand the docker command invocation above through its components:

docker run is the basic call to a container
--rm -ti does subsequent cleanup (--rm) and gives a terminal (-t) that is interactive (-i)
-v $(pwd):/mnt uses the -v a:b options to make local directory a available as b in the container; here $(pwd) calls print working directory to get the local directory which is now mapped to /mnt in the container
rocker/drd invokes the 'drd' container of the Rocker project
RD is a shorthand to the R-devel binary inside the container, and the main reason we use this container
-e 'tools::package_native_routine_registration_skeleton("/mnt") calls the helper function of R (currently in R-devel only, so we use RD) to compute a possible init.c file based on the current directory -- which is /mnt inside the container

That it. We get a call to the R function executed inside the Docker container, examining the package in the working directory and creating a registration file for it which is display to the console.

Copy the Output to `src/init.c`

We simply copy the output to a file src/init.c; I often fold one opening brace one line up.

Update NAMESPACE

We also change one line in NAMESPACE from (for this package) useDynLib("RcppTOML") to useDynLib("RcppTOML", .registration=TRUE). Adjust accordingly for other package names.

That's it!

And with that we a have a package which no longer provokes the NOTE as seen by the checks page. Calls to native routines are now safer (less of a chance for name clashing), get called quicker as we skip the symbol search (see the WRE discussion) and best of all this applies to all native routines whether written by hand or written via a generator such as Rcpp Attributes.

So give this a try to get your package up-to-date.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Mon, 27 Mar 2017

#0: Introducing R^4

So I had been toying with the idea of getting back to the blog and more regularly writing / posting little tips and tricks. I even started taking some notes but because perfect is always the enemy of the good it never quite materialized.

But the relatively broad discussion spawned by last week's short rant on Suggests != Depends made a few things clear. There appears to be an audience. It doesn't have to be long. And it doesn't have to be too polished.

So with that, let's get the blogging back from micro-blogging.

This note forms post zero of what will a new segment I call R⁴ which is shorthand for relatively random R rambling.

Stay tuned.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Thu, 17 Aug 2023

Tue, 15 Aug 2023

Try it out

Extend r2u with r-universe

Local DevContainer build

Use in Your Repo

Acknowledgments

Colophon

Sun, 23 Jul 2023

Sat, 08 Jul 2023

Mon, 30 Jan 2023

Sun, 19 Jun 2022

Sat, 21 May 2022

Usage

Demos

Installing the full tidyverse in one command and 18 seconds

Installing brms and its depends in one command and 13 seconds (and show gitpod.io)

Integration via bspm

History and Motivation

Details

Acknowledgements

Wed, 23 Feb 2022

Mon, 31 Jan 2022

Wed, 08 Dec 2021

Wed, 09 Jun 2021

Thu, 07 Jan 2021

Thu, 03 Dec 2020

Sat, 26 Sep 2020

Wed, 26 Aug 2020

Background

Summary

Fri, 03 Jul 2020

Mon, 22 Jun 2020

Sun, 26 Apr 2020

Sun, 12 Apr 2020

Sat, 11 Apr 2020

Mon, 05 Aug 2019

Debugging with Docker: Getting Hold of Particular Compilers

Step 0: Install Docker and Test

Step 1: Download r-base and test OS

Step 2: Setup the working directory

Step 3: Install gcc-9 and gfortran-9

Step 4: Prepare Package

Step 5: Set Compiler Flags

Step 6: Install the Source Package

Step 7: Debug!

Sun, 09 Jun 2019

Wed, 27 Mar 2019

Thu, 14 Mar 2019

Sun, 24 Jun 2018

Sun, 15 Apr 2018

MKL for .deb-based systems: An easy recipe

First Step: Set up apt

Install MKL

Integrate MKL

Use the MKL

Benchmarks

Removal, if needed

Summary

Wed, 28 Feb 2018

Tue, 06 Feb 2018

Sun, 21 Jan 2018

Complete Code using Approach "TV"

Complete Code using Approach "DT"

Benchmark Reading

Benchmark Processing

Side-by-side

Summary

Acknowledgements

Fri, 22 Dec 2017

The Basics

cran2deb4ubuntu, or c2d4u for short

Using cran2deb4ubuntu

How about from R? Sure, via RcppAPT

In Docker: r-apt

Example: An rstan container

Wed, 13 Dec 2017

Sat, 09 Dec 2017

Mon, 27 Nov 2017

What?

Integration via `bspm`

Step 1: Download `r-base` and test OS

Copy the Output to `src/init.c`