Thu, 11 Mar 2010

RcppArmadillo 0.1.0

Besides the new RcppExamples, another new package RcppArmadillo got spun out of Rcpp with the recent release 0.7.8 of Rcpp.

Romain and I already had an example of a simple but fast linear model fit using the (very clever) Armadillo C++ library by Conrad Sanderson. In fact, I had used this as a motivational example of why Rcpp rocks in a recent talk to the ACM chapter at U of Chicago which, thanks to David Smith at REvo, got some further exposure.

Now this example is more refined as further glue got added. Given that both Armadillo and Rcpp make use of C++ templates, the actual amount of code in RcppArmadillo is not that large: just over 200 lines in a header file, and a little less for some testing accessor and example functions in a source file. And this makes for some really nice example code: the 'fast regression' example becomes this (where I simply removed two blocks with conditional on the Armadillo version):

#include <RcppArmadillo.h>

extern "C" SEXP fastLm(SEXP ys, SEXP Xs) {

    Rcpp::NumericVector yr(ys);			// creates Rcpp vector from SEXP
    Rcpp::NumericMatrix Xr(Xs);			// creates Rcpp matrix from SEXP
    int n = Xr.nrow(), k = Xr.ncol();

    arma::mat X(Xr.begin(), n, k, false);   	// reuses memory and avoids extra copy
    arma::colvec y(yr.begin(), yr.size(), false);

    arma::colvec coef = solve(X, y);            // fit model y ~ X
    arma::colvec resid = y - X*coef; 		// residuals
    double sig2 = arma::as_scalar( trans(resid)*resid/(n-k) );
    						// std.error of estimate
    arma::colvec stderrest = sqrt( sig2 * diagvec( arma::inv(arma::trans(X)*X)) );

    Rcpp::Pairlist res(Rcpp::Named( "coefficients", coef),
                       Rcpp::Named( "stderr", stderrest));
    return res;

No extra copies! Armadillo instantiates directly from the underlying R objects for the vector and matrix, solves the regression equations, computes the standard error of the estimates and returns the two vectors. Leaving us to write about eleven lines of code. Moreover, as Armadillo is well designed and uses template meta-programming to avoid extra copies (see these lecture notes for details), it is about as efficient as it can be (and will use Atlas or other BLAS where available).

And, this is just one example. Rcpp should be suitable for other C++ libraries, and provides an easy to use seamless interface between C++ and R.

However, we should note that (at about the last minute) we found out about some unit test failures in OS X as well as some issues in a Debian chroot -- cran2deb ran into some build issues on i386 and amd64 in the testing chroot even this 'it all works' swimmingly on our Debian, Ubuntu and Fedora build environments. A follow-up with fixes for either Rcpp and/or RcppArmadillo appears likely.

Update: The build issues seems to be with 64-bit systems and everything appears cool in 32-bit.

/code/rcpp | permanent link