Wed, 16 Jul 2014

Introducing RcppParallel: Getting R and C++ to work (some more) in parallel

A common theme over the last few decades was that we could afford to simply sit back and let computer (hardware) engineers take care of increases in computing speed thanks to Moore's law. That same line of thought now frequently points out that we are getting closer and closer to the physical limits of what Moore's law can do for us.

So the new best hope is (and has been) parallel processing. Even our smartphones have multiple cores, and most if not all retail PCs now possess two, four or more cores. Real computers, aka somewhat decent servers, can be had with 24, 32 or more cores as well, and all that is before we even consider GPU coprocessors or other upcoming changes.

And sometimes our tasks are embarrassingly simple, as is the case with many data-parallel jobs: we can use higher-level operations such as those offered by the base R package parallel to spawn multiple processing tasks and gather the results. I covered all this in some detail in previous talks on High Performance Computing with R (and you can also consult the Task View on High Performance Computing with R, which I edit).

But sometimes we can't use data-parallel approaches. Hence we have to redo our algorithms. Which is really hard. R itself has been relying on the (fairly mature) OpenMP standard for some of its operations. Luke Tierney's (awesome) keynote in May at our (sixth) R/Finance conference mentioned some of the issues related to OpenMP. One that matters is that OpenMP works really well on Linux, and either not so well (Windows) or not at all (OS X, due to the usual issue with the gcc/clang switch enforced by Apple; the good news is that the OpenMP toolchain is expected to make it to OS X in some more performant form "soon"). R is still expected to make wider use of OpenMP in future versions.

Another tool which has been around for a few years, and which can be considered equally mature, is the Intel Threaded Building Blocks library, or TBB. JJ recently started to wrap this up for use by R. The first approach resulted in a (now superseded, see below) package TBB. But hardware and OS issues bite once again, as the Intel TBB does not build well with the Windows toolchain used by R (which is based on MinGW).

(And yes, there are two more options. But Boost Threads requires linking against a compiled library, which precludes easy use via, e.g., our header-only BH package. And C++11 with its threads library (based on Boost Threads) is not yet as widely available as R and Rcpp, which means that it is not a real deployment option yet.)

Now JJ, being as awesome as he is, went back to the drawing board and integrated a second threading toolkit: TinyThread++, a small header-only library without further dependencies. It is not as feature-rich as Intel Threaded Building Blocks, but it is at least available everywhere. So a new package, RcppParallel, so far only on GitHub, wraps both TinyThread++ and Intel Threaded Building Blocks and offers a consistent interface available on all platforms used by R.

Better still, JJ also authored several pieces demonstrating this new package for the Rcpp Gallery:

All four are interesting and demonstrate different aspects of parallel computing via RcppParallel. But the last article is key. Based on a question by Jim Bullard, and then written with Jim, it shows how a particular matrix distance metric (which is missing from R) can be implemented serially, both in R and via Rcpp. The key implementation, however, uses both Rcpp and RcppParallel and thereby achieves a truly impressive speed gain, as the gains from using compiled code (via Rcpp) and from using a parallel algorithm (via RcppParallel) are multiplicative! Across JJ's and my four-core machines the gain was between 200- and 300-fold, which is rather considerable. For kicks, I also used a much bigger machine at work, which came in at an even larger speed gain (but the gains clearly become sublinear as the number of cores increases; there are, however, some tuning parameters).

So these are exciting times. I am sure there will be lots more to come. For now, head over to the RcppParallel package and start playing. Further contributions to the Rcpp Gallery are not only welcome but strongly encouraged.
