Thinking inside the box

MCMC and faster Gibbs Sampling using Rcpp

Sanjog Misra, who uses Rcpp for Monte Carlo Markov Chain (MCMC) analyses in quantitative marketing, kindly set me a short example of Rcpp use.

The example is based on a blog post by Darren Wilkinson which itself discusses and compares the suitability of R, Python, Java or C for MCMC analysis, using the Gibbs sampler as a concrete example. Darren's post is worth checking out: he stresses the rather pragmatic aspects of how fast and/or easy it is to write the code, rather than just the mere runtime. As such, he is not too concerned with a speed advantage of Python over R which he sees at a factor of around 2.4, leaving him to continue to prototype in R. Similarly, with C 'only' being faster than Java by a factor of two, he prefers Java for the numerically more demanding parts.

We do of course advocate the use of Rcpp to combine the best aspects of R and C++, respectively. This Gibbs sampler example provides a nice backdrop. So working with Darren's example, consider the same Gibbs sampler for a bivariate distribution (and apologies for the lack of latex typesetting on my blog)

f(x,y) = k x^2 exp( -x y^2 - y^2 + 2y - 4x)

where the conditional distributions are

f(x|y) = (x^2)*exp(-x*(4+y*y))               ## a Gamma density kernel
f(y|x) = exp(-0.5*2*(x+1)*(y^2 - 2*y/(x+1))  ## a Gaussian kernel

Sanjog then spotted and corrected a small error in the variance expression in Darren's derivation; this is now acknowledged on Darren's website. Full details are in the R script now committed in SVN. The R code for the Gibbs sample can therefore be written as follows below. Note that this uses thinning to minimize serial correlation in the conditional densities--which renders the computation more demanding as 'N times thin' draws have to be generated:

## Here is the actual Gibbs Sampler
## This is Darren Wilkinsons R code (with the corrected variance)
## But we are returning only his columns 2 and 3 as the 1:N sequence
## is never used below
Rgibbs <- function(N,thin) {
    mat <- matrix(0,ncol=2,nrow=N)
    x <- 0
    y <- 0
    for (i in 1:N) {
        for (j in 1:thin) {
            x <- rgamma(1,3,y*y+4)
            y <- rnorm(1,1/(x+1),1/sqrt(2*(x+1)))
        }
        mat[i,] <- c(x,y)
    }
    mat
}

A second variant can be computed using the R bytecode compiler which appeared with the recent release of R 2.13.0 (and which we analysed in this blog post from April). This is as easy as

## We can also try the R compiler on this R function
RCgibbs <- cmpfun(Rgibbs)

Thanks to Rcpp and the inline package, we can also write a C++ variant that can be built and launched from R with ease. The C++ code is assigned to an R text variable gibbscode. (And we used to typeset such code on the blog as a character string, ie in a faint red color---but have now switched to highlight it as if it were freestanding C++ code. It really is passed as a single string to R which then uses the cxxfunction() to compile, link and load a C++ function built around the code. See previous posts on the inline package for more.)

## Now for the Rcpp version -- Notice how easy it is to code up!
gibbscode <- '

  using namespace Rcpp;   // inline does that for us already

  // n and thin are SEXPs which the Rcpp::as function maps to C++ vars
  int N   = as<int>(n);
  int thn = as<int>(thin);

  int i,j;
  NumericMatrix mat(N, 2);

  RNGScope scope;         // Initialize Random number generator

  // The rest of the code follows the R version
  double x=0, y=0;

  for (i=0; i<N; i++) {
    for (j=0; j<thn; j++) {
      x = ::Rf_rgamma(3.0,1.0/(y*y+4));
      y = ::Rf_rnorm(1.0/(x+1),1.0/sqrt(2*x+2));
    }
    mat(i,0) = x;
    mat(i,1) = y;
  }

  return mat;             // Return to R
'
# Compile and Load
RcppGibbs <- cxxfunction(signature(n="int", thin = "int"),
                         gibbscode, plugin="Rcpp")

It is noteworthy how the code logic is essentially identical between the basic R version, and the C++ version. Two nested loops control draws of x from a Gamma distribution, conditional on y, as well as draws of y from a Normal, conditional on x. Add a small amount of parameter passing to obtain the parameters N and thin, allocation of a results matrix and setup of the random number generator state to remain consistent with R, as well as a return of the matrix---and that is all.

As Darren's code uses the GNU GSL in its C variant, I also became interested in seeing how a C/C++ hybrid variant using our RcppGSL package would fare. The code is below.

gslgibbsincl <- '
  #include <gsl/gsl_rng.h>
  #include <gsl/gsl_randist.h>

  using namespace Rcpp;  // just to be explicit
'

gslgibbscode <- '
  int N = as<int>(ns);
  int thin = as<int>(thns);
  int i, j;
  gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
  double x=0, y=0;
  NumericMatrix mat(N, 2);
  for (i=0; i<N; i++) {
    for (j=0; j<thin; j++) {
      x = gsl_ran_gamma(r,3.0,1.0/(y*y+4));
      y = 1.0/(x+1)+gsl_ran_gaussian(r,1.0/sqrt(2*x+2));
    }
    mat(i,0) = x;
    mat(i,1) = y;
  }
  gsl_rng_free(r);

  return mat;           // Return to R
'

## Compile and Load
GSLGibbs <- cxxfunction(signature(ns="int", thns = "int"),
                        body=gslgibbscode, includes=gslgibbsincl,
                        plugin="RcppGSL")

This code is similar to the version using just Rcpp, but we need to allocate space for the GSL random generator object (and later release that memory allocation), and we do of course call the GSL functions. This also necessitates declaring the function using the two header files listed as arguments for the include argument of cxxfunction().

So how is the performance? The script (now committed in Rcpp's SVN repo as examples/RcppGibbs/RcppGibbs.R, and also part of the next release) creates the output below when running in non-interactive mode (whereas interactive mode creates a few charts too):

edd@max:~/svn/rcpp/pkg/Rcpp/inst/examples/RcppGibbs$ r RcppGibbs.R
Replication # 1 complete 
Replication # 2 complete 
Replication # 3 complete 
Replication # 4 complete 
                    N=1000 N=5000 N=10000 N=20000
Elasped Time (R)     0.124  2.650  10.511  41.773
Elasped Time (RC)    0.081  1.916   7.594  30.280
Elapsed Time (Rcpp)  0.003  0.076   0.306   1.221
Elapsed Time (Rgsl)  0.002  0.049   0.196   0.784
SpeedUp Rcomp.       1.530  1.380   1.380   1.380
SpeedUp Rcpp        41.330 34.870  34.350  34.210
SpeedUp GSL         62.000 54.080  53.630  53.280

               test replications elapsed  relative user.self sys.self
4  GSLGibbs(N, thn)           10   7.845  1.000000      7.84        0
3 RcppGibbs(N, thn)           10  12.218  1.557425     12.22        0
2   RCgibbs(N, thn)           10 312.296 39.808286    311.98        0
1    Rgibbs(N, thn)           10 420.953 53.658764    420.59        0

The first block is a hand-computed comparison using four sets of parameters; the second block uses the excellent rbenchmark package to compare ten runs at twenty-thousand draws.

We see a nominal increase in performance due to the bytecode compiler, saving roughly 38 percent which seems appropriate given that the R code mostly controls the loops; actual work in undertaken in the already compiled RNG functions. The Rcpp variant is about 34 times faster than the pure R code. Sanjog reported higher increases on his OS X machines (and Darren's post echos a similar order of magnitude). However, on my i7 running Linux I always obtained an improvement in the mid- to high thirties. That is certainly already a rather pleasant result.

What surprised and stunned me at first was that the GSL solution scores an improvement of around 53 times (close to the factor of 60 reported by Darren). A closer look at the code (shown above) makes it clear that there are very few compute-intensive operations outside of the RNG draws. I validated this further with a second study timing just one million draws each from a Gaussian and Gamma using R via Rcpp, and using the GSL. It turns out that the R code is about 2.5 times slower for random draws from the Gamma distribution than the GSL. Inspection of the source code---in files src/nmath/rgamma.c for R and randist/gamma.c for the GSL shows why. R uses a much more complex (and presumably more precise) algorithm. I may follow up with Martin Maechler to see if we can illustrative the numerical benefits of this more expensive approach---this blog entry is getting long enough already.

So to sum up: Gibbs sampling is a somewhat resource-heavy Monte Carlo Markov Chain method for investigating multivariate distributions. R excels at quick and simple explorations, albeit at somewhat slower execution speed. The Rcpp package can help by providing easy means to accelerate simulations by a significant factor. The example discussed here is now in SVN repository for Rcpp and will be part of the next release.

Updated 2011-07-16: Darren has just updated his comparison; fixed two typos here too.

/code/rcpp | permanent link

Thu, 14 Jul 2011

MCMC and faster Gibbs Sampling using Rcpp