Dirk Eddelbuettel Rcpp: Seamless R and C++ Integration
 

Overview

The Rcpp package provides C++ classes that greatly facilitate interfacing C or C++ code in R packages using the .Call() interface provided by R.

Rcpp provides matching C++ classes for a large number of basic R data types. Hence, a package author can keep their data in normal R data structures without having to worry about translating or transferring them to C++. At the same time, the data structures can be accessed just as easily at the C++ level, and used in the normal manner.

The mapping of data types works in both directions: it is as straightforward to pass data from R to C++ as it is to return data from C++ to R. The following section lists the supported data types.

Transfer from R to C++, and from C++ to R

R data types (SEXP) are matched to C++ objects in a class hierarchy. All R types are supported (vectors, functions, environments, etc.) and each type is mapped to a dedicated class. For example, numeric vectors are represented as instances of the Rcpp::NumericVector class, environments as instances of Rcpp::Environment, functions as Rcpp::Function, and so on.

The underlying C++ library also offers the Rcpp::wrap function, a function template that transforms an arbitrary object into a SEXP. This makes it straightforward to implement C++ logic in terms of standard C++ types such as STL containers, and then wrap them when they need to be returned to R. Internally, wrap uses advanced template metaprogramming techniques and currently supports these data types: primitive types (bool, int, double, size_t, Rbyte, Rcomplex, std::string), STL containers (e.g. std::vector<T>) where T is wrappable, STL maps (e.g. std::map<std::string, T>) where T is wrappable, and arbitrary types that support implicit conversion to SEXP.
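To make the dispatch idea concrete, here is a small plain C++ sketch. It does not use Rcpp at all: the hypothetical function to_r stands in for Rcpp::wrap and, instead of building a SEXP, renders its argument as the R expression that would create the corresponding object. The point it illustrates is that a std::vector<T> or a std::map<std::string, T> is accepted exactly when T itself is convertible, because the compiler resolves the inner call separately for every element type.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for Rcpp::wrap: each overload renders its argument
// as the R expression that would create the corresponding R object.

// Primitive types become length-one R vectors.
inline std::string to_r(bool b) { return b ? "TRUE" : "FALSE"; }
inline std::string to_r(int i) { return std::to_string(i) + "L"; }
inline std::string to_r(double d) {
    std::ostringstream os;
    os << d;  // default stream formatting, e.g. 1.5 -> "1.5", 2.0 -> "2"
    return os.str();
}
inline std::string to_r(const std::string& s) { return "\"" + s + "\""; }

// A vector is accepted whenever its element type T is itself convertible;
// the call to_r(v[i]) is resolved separately for each instantiated T.
template <typename T>
std::string to_r(const std::vector<T>& v) {
    std::string out = "c(";
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i > 0) out += ", ";
        out += to_r(v[i]);
    }
    return out + ")";
}

// A string-keyed map with convertible values becomes a named list.
template <typename T>
std::string to_r(const std::map<std::string, T>& m) {
    std::string out = "list(";
    bool first = true;
    for (const auto& kv : m) {
        if (!first) out += ", ";
        first = false;
        out += kv.first + " = " + to_r(kv.second);
    }
    return out + ")";
}
```

For example, to_r(std::vector<int>{1, 2, 3}) yields c(1L, 2L, 3L), and a std::map<std::string, std::vector<int>> holding {"x", {1, 2}} yields list(x = c(1L, 2L)). One caveat of this toy: a string literal (a const char*) would silently select the bool overload, so strings must be passed as std::string.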

The reverse conversion (from R into C++) is performed by the Rcpp::as function template offering a similar degree of flexibility.
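The reverse direction can be sketched the same way. The code below is again a hypothetical plain C++ illustration, not Rcpp's implementation: NumericData is an assumed stand-in for an R numeric vector, and from_r<T> plays the role of Rcpp::as<T>. Because function templates cannot be partially specialized, the per-type behaviour lives in a traits-style class template, which is a common C++ way to let one function template support scalars as well as any std::vector<T> of a supported T.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an R numeric vector (REALSXP); not an Rcpp type.
using NumericData = std::vector<double>;

// Primary template: one specialization per supported target type.
template <typename T>
struct FromR;

// Scalars must come from a length-one vector, since R has no scalar type.
template <>
struct FromR<double> {
    static double convert(const NumericData& x) {
        if (x.size() != 1) throw std::invalid_argument("expected a single value");
        return x[0];
    }
};

template <>
struct FromR<int> {
    static int convert(const NumericData& x) {
        // Truncates toward zero, like R's as.integer().
        return static_cast<int>(FromR<double>::convert(x));
    }
};

// Any std::vector<T> whose element type T has its own FromR specialization.
template <typename T>
struct FromR<std::vector<T>> {
    static std::vector<T> convert(const NumericData& x) {
        std::vector<T> out;
        out.reserve(x.size());
        for (double d : x) out.push_back(FromR<T>::convert(NumericData{d}));
        return out;
    }
};

// The user-facing function template simply forwards to the traits class.
template <typename T>
T from_r(const NumericData& x) {
    return FromR<T>::convert(x);
}
```

With this in place, from_r<std::vector<int>>(NumericData{1.9, 2.1}) yields {1, 2}, while from_r<double> on a vector of length two throws std::invalid_argument.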

New features

Starting with release 0.7.1, a namespace Rcpp is provided. It contains a main class RObject as well as other classes derived from RObject that deal with environments (ENVSXP), "Language" for calls (LANGSXP), and the template XPtr for external pointers.

Releases 0.7.2 and later extend this to a number of additional R types, along with facilities for automatic conversion thanks to clever use of templates.

Release 0.8.1 adds support for exposing code in C++ directly to R using modules. The corresponding Rcpp-modules vignette has more details.

Release 0.8.3 adds sugar: expression templates that allow compact vectorised expressions, just like in R, but at compiled speed; see the Rcpp-sugar vignette.
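For readers new to expression templates, here is a minimal, self-contained sketch of the technique. It is not the implementation of Rcpp sugar, and all names (Expr, Vec, BinaryExpr, Add, Mul) are hypothetical. Writing a + b * c builds a small tree of lightweight proxy objects; the arithmetic only happens when that tree is assigned to a Vec, element by element in a single loop, so no temporary vectors are allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// CRTP base: lets operators accept "any expression" while keeping the
// concrete type visible to the compiler.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// A concrete vector; the only type in this sketch that owns data.
struct Vec : Expr<Vec> {
    std::vector<double> data;
    explicit Vec(std::vector<double> d) : data(std::move(d)) {}

    // Evaluating an expression: one loop, no intermediate vectors.
    template <typename E>
    Vec(const Expr<E>& e) : data(e.self().size()) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e.self()[i];
    }

    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// A node in the expression tree; element i is computed on demand.
// Operands are held by reference, so an expression must be evaluated
// within the same full-expression that created it (e.g. Vec r = a + b * c;).
template <typename L, typename R, typename Op>
struct BinaryExpr : Expr<BinaryExpr<L, R, Op>> {
    const L& lhs;
    const R& rhs;
    BinaryExpr(const L& l, const R& r) : lhs(l), rhs(r) {
        assert(l.size() == r.size());  // no recycling rule in this sketch
    }
    double operator[](std::size_t i) const { return Op::apply(lhs[i], rhs[i]); }
    std::size_t size() const { return lhs.size(); }
};

struct Add { static double apply(double x, double y) { return x + y; } };
struct Mul { static double apply(double x, double y) { return x * y; } };

template <typename L, typename R>
BinaryExpr<L, R, Add> operator+(const Expr<L>& l, const Expr<R>& r) {
    return BinaryExpr<L, R, Add>(l.self(), r.self());
}

template <typename L, typename R>
BinaryExpr<L, R, Mul> operator*(const Expr<L>& l, const Expr<R>& r) {
    return BinaryExpr<L, R, Mul>(l.self(), r.self());
}
```

With a = {1, 2, 3}, b = {4, 5, 6} and c = {2, 2, 2}, the statement Vec r = a + b * c; fills r with {9, 12, 15} in one pass. Unlike R, this toy requires equal lengths rather than recycling the shorter operand.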

Release 0.8.6 adds special functions cherished for statistics: d/p/q/r-style functions for most relevant distributions, in a form that is very close to what we'd use in R.
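As a reminder of what the d/p/q/r naming convention means, here is a sketch for the normal distribution written with the C++ standard library only. These are not Rcpp's functions: the namespace sketch and its contents are hypothetical, and the quantile ("q") function is omitted because it needs a numerical inverse of the distribution function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical R-style helpers for the normal distribution (not Rcpp's API).
namespace sketch {

const double kPi = 3.14159265358979323846;

// Density, in the spirit of R's dnorm(x, mean, sd).
inline double dnorm(double x, double mean = 0.0, double sd = 1.0) {
    const double z = (x - mean) / sd;
    return std::exp(-0.5 * z * z) / (sd * std::sqrt(2.0 * kPi));
}

// Distribution function, in the spirit of R's pnorm(q, mean, sd),
// using Phi(z) = erfc(-z / sqrt(2)) / 2.
inline double pnorm(double q, double mean = 0.0, double sd = 1.0) {
    const double z = (q - mean) / sd;
    return 0.5 * std::erfc(-z / std::sqrt(2.0));
}

// Random draws, in the spirit of R's rnorm(n, mean, sd), from a caller-supplied engine.
inline std::vector<double> rnorm(std::size_t n, double mean, double sd, std::mt19937& gen) {
    std::normal_distribution<double> dist(mean, sd);
    std::vector<double> out(n);
    for (auto& v : out) v = dist(gen);
    return out;
}

}  // namespace sketch
```

For example, sketch::dnorm(0.0) is 1/sqrt(2*pi), about 0.3989, and sketch::pnorm(0.0) is 0.5, matching dnorm(0) and pnorm(0) in R.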

Release 0.8.7 adds support for ReferenceClasses in R 2.12.0; this now brings S4-based ReferenceClasses in the OO-style of Java or C++ to the R language.

Release 0.9.0 split support for the legacy classic API into its own package RcppClassic.

Inline use

As of version 0.7.0, Rcpp also contains a modified function 'cfunction' taken from the excellent 'inline' package by Oleg Sklyar. This allows the user to define the body of a C++ function as a standard R character vector, which is passed to 'cfunction' along with a few other parameters. The function then builds a complete C++ source file containing a function with the given body, and compiles, links and loads it for us. Together with the Rcpp interface classes, this makes for very easy use of C++ from R, as everything can be done from the R prompt without any need for Makefiles, configuration settings, etc.

As of version 0.8.1, an extended function 'cxxfunction' is used (which requires inline 0.3.5). This function makes it easier to use C++ code with Rcpp. In particular, it enforces use of the .Call interface, adds the Rcpp namespace, and sets up exception forwarding by enclosing the user code in the BEGIN_RCPP and END_RCPP macros.
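The idea behind such guard macros can be shown in plain C++. The sketch below uses hypothetical macros BEGIN_GUARD and END_GUARD rather than the real BEGIN_RCPP and END_RCPP, and simply returns an error message where Rcpp would instead report the exception to R as an error: the user code sits inside a try block, and every exception is caught at the function boundary.

```cpp
#include <cassert>
#include <cmath>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical guard macros: BEGIN_GUARD opens a try block, END_GUARD closes
// it and converts any escaping exception into an error message.
#define BEGIN_GUARD try {
#define END_GUARD                                             \
    }                                                         \
    catch (const std::exception& e) {                         \
        return std::string("error: ") + e.what();             \
    }                                                         \
    catch (...) {                                             \
        return std::string("error: unknown C++ exception");   \
    }

// Hypothetical user function: its body sits between the two macros, so no
// exception can escape across the boundary to the caller.
std::string sqrt_or_error(double x) {
    BEGIN_GUARD
    if (x < 0) throw std::domain_error("negative input");
    return std::to_string(std::sqrt(x));
    END_GUARD
}
```

Here sqrt_or_error(4.0) returns "2.000000" and sqrt_or_error(-1.0) returns "error: negative input". Catching at the boundary matters because a function called via .Call() returns into R's C code, and letting a C++ exception unwind through C stack frames is not safe.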

Moreover, with cfunction (and cxxfunction), we can even call external libraries and have them linked as well.

Several examples of this are included with the packages; one has also been posted on my blog.

This even works on Windows if you have a working 'Rtools' installation alongside R. See the R-on-Windows FAQ and additional documentation.

Unit testing

As of version 0.9.9, over 750 unit tests called from over 330 unit test functions are included in the package to ensure that no regressions are introduced in terms of API compatibility. The unit tests also serve as an (arguably somewhat raw) form of usage examples. A vignette is auto-generated with the results of the unit tests.

Usage for package building

Rcpp provides a main header file Rcpp.h and a library inside the installed package in the directory lib. From within R, you can compute the directory location via system.file("lib", "Rcpp.h", package="Rcpp"), but the required compiler and linker flags are also provided for your use by the functions Rcpp:::CxxFlags() and Rcpp:::LdFlags(). So we can simply use the following as the file src/Makevars (or src/Makevars.win on Windows):
PKG_CXXFLAGS=`${R_HOME}/bin/Rscript -e "Rcpp:::CxxFlags()"`

PKG_LIBS=`${R_HOME}/bin/Rscript -e "Rcpp:::LdFlags()"`
See the help page for Rcpp-package for details.

Also note that starting with version 0.8.0, the 'LinkingTo' field of the package DESCRIPTION file can also be employed in packages using Rcpp. This lets R determine the location of the header files, so users only need Rcpp:::LdFlags() (as detailed above) to point to the actual library; this is clearly the recommended approach.

Moreover, we added an entire vignette on how to use Rcpp in your package with a detailed discussion.

Demo package

The RcppExamples package (on CRAN) provides a simple illustration of how to use Rcpp, and can also be used as a framework for deploying Rcpp. This package is, however, somewhat incomplete in terms of examples, so please see below for examples provided by several dozen packages using Rcpp.

Class documentation

We now have Doxygen-generated documentation of all the classes in browsable and searchable HTML and as a PDF file. We no longer include the Doxygen-generated documentation in the source tarball as it is simply too big, but we do provide zip archives of the HTML, LaTeX, and man documentation.

Other documentation

Besides the doxygen-generated reference manual we also have these eight vignettes:
  • The Rcpp-introduction vignette provides a short overview of Rcpp and an introduction (and has also been published as Volume 40, Issue 8 of the Journal of Statistical Software),
  • the Rcpp-package vignette shows how to write your own package using Rcpp,
  • the Rcpp-FAQ vignette addresses several frequently asked questions,
  • the Rcpp-modules vignette discusses how to expose C++ functions and modules with ease using an idea borrowed from Boost::Python,
  • the Rcpp-extending vignette details the steps needed to extend Rcpp with user-provided or third-party classes,
  • the Rcpp-sugar vignette provides an introduction to the Rcpp sugar features inspired by vectorised R code,
  • the Rcpp-quickref vignette provides a quick reference cheat sheet (but is still mostly incomplete), and
  • the Rcpp-unitTests vignette contains a summary of the (by now over 750) unit tests for Rcpp.

All vignettes are also installed with the package, and available at the CRAN page.

Google Tech Talk

In late October 2010, the R intergrouplet at Google was kind enough to invite us to give a talk on Rcpp. The talk was recorded and is now available on YouTube.

Example usage

The following CRAN, R-Forge or BioConductor packages use Rcpp:
  • RQuantLib, an R interface to QuantLib quantitative finance libraries
  • RInside, a set of C++ classes that make it easy to embed R in your C++ applications
  • EarthMoveDist, an implementation of the Earth Mover's Distance metric for R
  • RProtoBuf, an interface from R to the Google ProtoBuf library
  • mvabund, a set of tools for displaying, modeling and analysing multivariate abundance data in community ecology.
  • sdcTable, a package for statistical disclosure control for tabular data.
  • highlight, a syntax highlighting utility based on an R parser that can render to latex and html.
  • bifactorial, a package for global and multiple inference for given bi- and tri-factorial clinical trial designs.
  • RcppExamples, an example package illustrating use of Rcpp and providing concrete examples.
  • RcppArmadillo, an interface from R to the Armadillo C++ linear algebra library using Rcpp.
  • minqa which provides derivative-free optimization by quadratic approximation based on an interface to Fortran implementations by M. J. D. Powell.
  • pcaMethods provides Bayesian PCA, Probabilistic PCA, Nipals PCA, Inverse Non-Linear PCA and the conventional SVD PCA.
  • termstrc offers a wide range of functions for term structure estimation based on static and dynamic coupon bond and yield data.
  • phylobase implements a base S4 class for comparison of phylogenetic structures and data.
  • RSNNS wraps the Stuttgart Neural Network Simulator (SNNS), a library containing many standard implementations of neural networks, and brings these to R.
  • parser implements a detailed source code parser based on the R parser and grammar with a different representation of the parsed expressions.
  • RcppGSL which provides an interface from R to the GNU GSL vector and matrix types.
  • orQA can be used to assess repeatability, accuracy and cross-platform agreement of titration microarray data.
  • RcppDE provides differential evolution optimization (just like DEoptim, on which it is based) and serves as a small case study in porting from plain C to the combination of C++ and Rcpp.
  • RcppBDT provides (parts of) Boost Date.Time by using Rcpp modules to easily expose the Boost functionality to R.
  • unmarked uses Rcpp and RcppArmadillo to provide code to fit hierarchical models of animal abundance and occurrence to data collected using survey methods such as point counts, site occupancy sampling, distance sampling, removal sampling, and double observer sampling.
  • The simFrame package provides an object-oriented general framework for statistical simulations.
  • The rgam package also uses Rcpp and RcppArmadillo to provide an outlier-robust fit for Generalized Additive (gam) Models.
  • The spacodiR package implements an interface to SPACoDi which is primarily designed to characterise the structure and phylogenetic diversity of communities using abundance or presence-absence data of species among community plots.
  • The VIM package provides visualization for missing values.
  • NetworkAnalysis provides statistical inference on populations of weighted or unweighted networks.
  • The SBSA package uses RcppArmadillo to provide functions for simplified Bayesian sensitivity analysis.
  • GUTS contains functions for the fast calculation of the likelihood of a stochastic survival model.
  • FABIA implements a model-based technique for biclustering, that is clustering rows and columns simultaneously. Biclusters are found by factor analysis where both the factors and the loading matrix are sparse.
  • The wordcloud package uses Rcpp to accelerate rendering word clouds from text.
  • auteur implements a Bayesian sampler of the trait-evolutionary process to identify shifts in process of continuous-trait evolution on phylogenetic trees.
  • The cds package uses Rcpp modules and RcppArmadillo to model coupled dipole approximations: given a set of ellipsoidal nanoparticles, it calculates the polarizability tensor for the dipoles associated with each particle, and solves the coupled-dipole equations by direct inversion of the interaction matrix.
  • The planar package uses Rcpp modules and RcppArmadillo to solve the electromagnetic problem of reflection and transmission at a multilayer planar interface. It also computes the decay rates for a dipolar emitter near a multilayer structure.
  • The maxent package provides tools for text classification using multinomial logistic regression, also known as maximum entropy. The focus of this maximum entropy classifier is to minimize memory consumption on very large datasets, particularly sparse document-term matrices represented by the tm package.
  • fdaMixed offers functional data analysis in a mixed-model framework via a likelihood-based analysis; it uses Rcpp and RcppArmadillo.
  • KernSmoothIRT fits nonparametric item and option characteristic curves using kernel smoothing, and allows for optimal selection of the smoothing bandwidth using cross-validation and a variety of exploratory plotting tools.
  • The rugarch package can estimate a variety of univariate GARCH models including ARFIMA, in-mean effects, use of external regressors and various other GARCH flavours using both Rcpp and RcppArmadillo.
  • bcp provides an implementation of an approximation to the product partition model for the normal errors change point problem using Markov Chain Monte Carlo, and also extends the methodology to independent multivariate series with an assumed common change point structure.
  • RVowpalWabbit provides an interface to the Vowpal Wabbit fast on-line learner by John Langford et al.
  • The rococo package provides a robust gamma rank correlation coefficient along with a permutation-based rank correlation test both of which are explicitly designed for dealing with noisy numerical data.
  • The LaF package provides methods to efficiently access data from large ASCII files, including subsetting and block-wise access.
  • The ANN package implements a feedforward Artificial Neural Network (ANN) optimized by Genetic Algorithm (GA), using the Rcpp and RcppClassic packages.
  • The Rclusterpp package provides flexible native clustering routines that can be linked against in downstream packages, and uses Rcpp and RcppEigen.
  • The bfa package provides model fitting for several Bayesian factor models including Gaussian, ordinal probit, mixed and semiparametric Gaussian copula factor models; it uses Rcpp and RcppArmadillo.
  • The nfda package implements nonparametric functional data analysis; it also uses Rcpp and RcppArmadillo.
  • RSofia provides an R interface to the sofia-ml suite of fast incremental algorithms for machine learning suitable for training models for classification or ranking.
  • The fastGHQuad package implements functions for fast (and numerically stable) Gauss-Hermite quadrature.
  • The SpatialTools package provides tools for spatial analysis with an emphasis on kriging using Rcpp and RcppArmadillo.
  • acer implements the ACER method for extreme value estimation which finds return levels of extreme values.
  • The psgp package provides projected spatial Gaussian process methods for sparse spatial kriging; it uses Rcpp and RcppArmadillo.

History

Rcpp was initially written by Dominick Samperi to ease contributions to the RQuantLib project, and then released as a project in its own right. During 2006, Dominick made several releases under the RCpp name (versions 1.0 to 1.4) before he changed the name to RCppTemplate and made more releases (1.5 to 5.2). His project saw no public releases for the thirty-five-month period from November 2006 to November 2009.

As a user of Rcpp, I (Dirk) chose to adopt Rcpp during 2008, made a first release 0.6.0 in November 2008, and have made a number of new releases since; see the ChangeLog for details. Rcpp is open to contributions and patches, some of which have already been integrated.

Romain Francois joined the effort just before the 0.7.0 release and brought along a lot of energy and new ideas. We now have a mailing list for discussions around Rcpp. If you have ideas or suggested changes, send an email there.

Download

A local archive is available here and at CRAN; SVN access is provided at R-Forge.

License

Rcpp is licensed under the GNU GPL version 2 or later.

Last modified: Tue Jan 24 07:32:23 CST 2012