The example is based on a blog post by Darren Wilkinson which itself discusses and compares the suitability of R, Python, Java or C for MCMC analysis, using the Gibbs sampler as a concrete example. Darren's post is worth checking out: he stresses the rather pragmatic aspects of how fast and/or easy it is to write the code, rather than just the mere runtime. As such, he is not too concerned with a speed advantage of Python over R which he sees at a factor of around 2.4, leaving him to continue to prototype in R. Similarly, with C 'only' being faster than Java by a factor of two, he prefers Java for the numerically more demanding parts.
We do of course advocate the use of Rcpp to combine the best aspects of R and C++, respectively. This Gibbs sampler example provides a nice backdrop. So working with Darren's example, consider the same Gibbs sampler for a bivariate distribution (and apologies for the lack of latex typesetting on my blog)
where the conditional distributions aref(x,y) = k x^2 exp( -x y^2 - y^2 + 2y - 4x)
Sanjog then spotted and corrected a small error in the variance expression in Darren's derivation; this is now acknowledged on Darren's website. Full details are in the R script now committed in SVN. The R code for the Gibbs sample can therefore be written as follows below. Note that this uses thinning to minimize serial correlation in the conditional densities--which renders the computation more demanding as 'N times thin' draws have to be generated:f(x|y) = (x^2)*exp(-x*(4+y*y)) ## a Gamma density kernel f(y|x) = exp(-0.5*2*(x+1)*(y^2 - 2*y/(x+1)) ## a Gaussian kernel
A second variant can be computed using the R bytecode compiler which appeared with the recent release of R 2.13.0 (and which we analysed in this blog post from April). This is as easy as## Here is the actual Gibbs Sampler ## This is Darren Wilkinsons R code (with the corrected variance) ## But we are returning only his columns 2 and 3 as the 1:N sequence ## is never used below Rgibbs <- function(N,thin) { mat <- matrix(0,ncol=2,nrow=N) x <- 0 y <- 0 for (i in 1:N) { for (j in 1:thin) { x <- rgamma(1,3,y*y+4) y <- rnorm(1,1/(x+1),1/sqrt(2*(x+1))) } mat[i,] <- c(x,y) } mat }
Thanks to Rcpp and the inline package, we can also write a C++ variant that can be built and launched from R with ease. The C++ code is assigned to an R text variable## We can also try the R compiler on this R function RCgibbs <- cmpfun(Rgibbs)
gibbscode
. (And we used to typeset such
code on the blog as a character string, ie in a faint red color---but have now switched to highlight it as if it were freestanding C++
code. It really is passed as a single string to R which then uses the cxxfunction()
to compile, link and load a C++ function
built around the code. See previous posts on the inline package for more.)
It is noteworthy how the code logic is essentially identical between the basic R version, and the C++ version. Two nested loops control draws of x from a Gamma distribution, conditional on y, as well as draws of y from a Normal, conditional on x. Add a small amount of parameter passing to obtain the parameters## Now for the Rcpp version -- Notice how easy it is to code up! gibbscode <- ' using namespace Rcpp; // inline does that for us already // n and thin are SEXPs which the Rcpp::as function maps to C++ vars int N = as<int>(n); int thn = as<int>(thin); int i,j; NumericMatrix mat(N, 2); RNGScope scope; // Initialize Random number generator // The rest of the code follows the R version double x=0, y=0; for (i=0; i<N; i++) { for (j=0; j<thn; j++) { x = ::Rf_rgamma(3.0,1.0/(y*y+4)); y = ::Rf_rnorm(1.0/(x+1),1.0/sqrt(2*x+2)); } mat(i,0) = x; mat(i,1) = y; } return mat; // Return to R ' # Compile and Load RcppGibbs <- cxxfunction(signature(n="int", thin = "int"), gibbscode, plugin="Rcpp")
N
and thin
, allocation of a results matrix and setup of the random number generator state
to remain consistent with R, as well as a return of the matrix---and that is all.
As Darren's code uses the GNU GSL in its C variant, I also became interested in seeing how a C/C++ hybrid variant using our RcppGSL package would fare. The code is below.
This code is similar to the version using just Rcpp, but we need to allocate space for the GSL random generator object (and later release that memory allocation), and we do of course call the GSL functions. This also necessitates declaring the function using the two header files listed as arguments for thegslgibbsincl <- ' #include <gsl/gsl_rng.h> #include <gsl/gsl_randist.h> using namespace Rcpp; // just to be explicit ' gslgibbscode <- ' int N = as<int>(ns); int thin = as<int>(thns); int i, j; gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937); double x=0, y=0; NumericMatrix mat(N, 2); for (i=0; i<N; i++) { for (j=0; j<thin; j++) { x = gsl_ran_gamma(r,3.0,1.0/(y*y+4)); y = 1.0/(x+1)+gsl_ran_gaussian(r,1.0/sqrt(2*x+2)); } mat(i,0) = x; mat(i,1) = y; } gsl_rng_free(r); return mat; // Return to R ' ## Compile and Load GSLGibbs <- cxxfunction(signature(ns="int", thns = "int"), body=gslgibbscode, includes=gslgibbsincl, plugin="RcppGSL")
include
argument of
cxxfunction()
.
So how is the performance?
The script (now committed in Rcpp's SVN repo as
examples/RcppGibbs/RcppGibbs.R
, and also part of the next release) creates the output below when running in non-interactive
mode (whereas interactive mode creates a few charts too):
The first block is a hand-computed comparison using four sets of parameters; the second block uses the excellent rbenchmark package to compare ten runs at twenty-thousand draws.edd@max:~/svn/rcpp/pkg/Rcpp/inst/examples/RcppGibbs$ r RcppGibbs.R Replication # 1 complete Replication # 2 complete Replication # 3 complete Replication # 4 complete N=1000 N=5000 N=10000 N=20000 Elasped Time (R) 0.124 2.650 10.511 41.773 Elasped Time (RC) 0.081 1.916 7.594 30.280 Elapsed Time (Rcpp) 0.003 0.076 0.306 1.221 Elapsed Time (Rgsl) 0.002 0.049 0.196 0.784 SpeedUp Rcomp. 1.530 1.380 1.380 1.380 SpeedUp Rcpp 41.330 34.870 34.350 34.210 SpeedUp GSL 62.000 54.080 53.630 53.280 test replications elapsed relative user.self sys.self 4 GSLGibbs(N, thn) 10 7.845 1.000000 7.84 0 3 RcppGibbs(N, thn) 10 12.218 1.557425 12.22 0 2 RCgibbs(N, thn) 10 312.296 39.808286 311.98 0 1 Rgibbs(N, thn) 10 420.953 53.658764 420.59 0
We see a nominal increase in performance due to the bytecode compiler, saving roughly 38 percent which seems appropriate given that the R code mostly controls the loops; actual work in undertaken in the already compiled RNG functions. The Rcpp variant is about 34 times faster than the pure R code. Sanjog reported higher increases on his OS X machines (and Darren's post echos a similar order of magnitude). However, on my i7 running Linux I always obtained an improvement in the mid- to high thirties. That is certainly already a rather pleasant result.
What surprised and stunned me at first was that the GSL solution scores an improvement of around 53 times (close to the factor of 60
reported by Darren). A closer look at the code (shown above) makes it clear that there are very few compute-intensive operations outside of
the RNG draws. I validated this further with a second study timing just one million draws each from a Gaussian and Gamma using R via Rcpp,
and using the GSL. It turns out that the R code is about 2.5 times slower for random draws from the Gamma distribution than the GSL. Inspection of the
source code---in files src/nmath/rgamma.c
for R and randist/gamma.c
for the GSL shows why. R uses a much more complex
(and presumably more precise) algorithm. I may follow up with Martin Maechler to see if we can illustrative the numerical benefits of this
more expensive approach---this blog entry is getting long enough already.
So to sum up: Gibbs sampling is a somewhat resource-heavy Monte Carlo Markov Chain method for investigating multivariate distributions. R excels at quick and simple explorations, albeit at somewhat slower execution speed. The Rcpp package can help by providing easy means to accelerate simulations by a significant factor. The example discussed here is now in SVN repository for Rcpp and will be part of the next release.
Updated 2011-07-16: Darren has just updated his comparison; fixed two typos here too.