Dirk Eddelbuettel Rcpp: Seamless R and C++ Integration
 

Overview

The Rcpp package provides C++ classes that greatly facilitate interfacing C or C++ code in R packages using the .Call() interface provided by R.

Rcpp provides matching C++ classes for a large number of basic R data types. Hence, a package author can keep their data in normal R data structures without having to worry about translating or transferring it to C++. At the same time, the data structures can be accessed just as easily at the C++ level, and used in the normal manner.

The mapping of data types works in both directions. It is as straightforward to pass data from R to C++ as it is to return data from C++ to R. The following two sections list supported data types.

Transfer from R to C++, and from C++ to R

R data types (SEXP) are matched to C++ objects in a class hierarchy. All R types are supported (vectors, functions, environments, etc.) and each type is mapped to a dedicated class. For example, numeric vectors are represented as instances of the Rcpp::NumericVector class, environments are represented as instances of Rcpp::Environment, functions are represented as Rcpp::Function, etc.

The underlying C++ library also offers the Rcpp::wrap function, a templated function that transforms an arbitrary object into a SEXP. This makes it straightforward to implement C++ logic in terms of standard C++ types such as STL containers and then wrap them when they need to be returned to R. Internally, wrap uses advanced template metaprogramming techniques and currently supports these data types: primitive types (bool, int, double, size_t, Rbyte, Rcomplex, std::string), STL containers (e.g. std::vector<T>) where T is wrappable, STL maps (e.g. std::map<std::string,T>) where T is wrappable, and arbitrary types that support implicit conversion to SEXP.

The reverse conversion (from R into C++) is performed by the Rcpp::as function template offering a similar degree of flexibility.
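
The round trip described above can be sketched as follows. This is only an illustration under stated assumptions: the helper summarize and the entry point summarize_sexp are hypothetical names, and the __has_include guard (which assumes a reasonably recent compiler) is there only so that the core logic also compiles as plain C++ when the Rcpp headers are not on the include path.

```cpp
#include <algorithm>
#include <map>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper written purely in terms of standard C++ types:
// returns the minimum, maximum and mean of a numeric vector.
std::map<std::string, double> summarize(const std::vector<double>& x) {
    if (x.empty()) throw std::invalid_argument("x must not be empty");
    std::map<std::string, double> out;
    out["min"]  = *std::min_element(x.begin(), x.end());
    out["max"]  = *std::max_element(x.begin(), x.end());
    out["mean"] = std::accumulate(x.begin(), x.end(), 0.0) / x.size();
    return out;
}

// The glue below is compiled only when the Rcpp headers are available.
#if defined(__has_include)
#  if __has_include(<Rcpp.h>)
#    include <Rcpp.h>

// Hypothetical entry point for R's .Call() interface.
extern "C" SEXP summarize_sexp(SEXP xs) {
BEGIN_RCPP
    // R -> C++: convert the incoming SEXP to an STL vector.
    std::vector<double> x = Rcpp::as<std::vector<double> >(xs);
    // C++ -> R: a std::map<std::string, double> is wrapped as a named vector.
    return Rcpp::wrap(summarize(x));
END_RCPP
}

#  endif
#endif
```

Keeping the computation in a function that knows nothing about R means it can be exercised from ordinary C++ code, while the Rcpp::as and Rcpp::wrap calls stay confined to a thin layer at the boundary.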

New features

Starting with release 0.7.1, a namespace Rcpp is provided. It contains a main class RObject as well as other classes that derive from RObject: Environment for environments (ENVSXP), Language for calls (LANGSXP) and the template XPtr for external pointers.

Releases 0.7.2 and later extend this to a number of additional R types along with a number of facilities for automatic conversion thanks to clever use of templates.

Release 0.8.1 adds support for exposing code in C++ directly to R using modules. The corresponding Rcpp-modules vignette has more details.

Release 0.8.3 adds sugar: expression templates that allow compact vectorised expressions just like in R but at compiled speed; see the Rcpp-sugar vignette.
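
To give a rough idea of what sugar buys, the sketch below computes x^2 + 2y element by element, once as a hand-written loop over standard containers and once as a single sugar expression. The function names quad_loop and quad_sugar are invented for this illustration, and the __has_include guard (assuming a compiler that supports it) merely lets the plain loop compile without the Rcpp headers.

```cpp
#include <cstddef>
#include <vector>

// Hand-written loop: element-wise x^2 + 2*y.
// Assumes x and y have the same length.
std::vector<double> quad_loop(const std::vector<double>& x,
                              const std::vector<double>& y) {
    std::vector<double> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = x[i] * x[i] + 2.0 * y[i];
    }
    return out;
}

// The sugar version is compiled only when the Rcpp headers are available.
#if defined(__has_include)
#  if __has_include(<Rcpp.h>)
#    include <Rcpp.h>

// Same computation written as one vectorised sugar expression,
// much as it would be written in R itself.
Rcpp::NumericVector quad_sugar(Rcpp::NumericVector x, Rcpp::NumericVector y) {
    return x * x + 2.0 * y;
}

#  endif
#endif
```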

Release 0.8.6 adds special functions cherished for statistics: d/p/q/r-style functions for most relevant distributions, in a form that is very close to what we'd use in R.

Release 0.8.7 adds support for ReferenceClasses in R 2.12.0; this now brings S4-based ReferenceClasses in the OO-style of Java or C++ to the R language.

Release 0.9.0 split support for the legacy classic API into its own package RcppClassic.

Release 0.10.0 brings Rcpp attributes, enhanced modules support and more.

Inline use

As of version 0.7.0, Rcpp also contains a modified function 'cfunction' taken from the excellent 'inline' package by Oleg Sklyar. This allows the user to define the body of a C++ function as a standard R character vector, which is passed to 'cfunction' along with a few other parameters. The function then builds a complete C++ source file containing a function with the given body, and then compiles, links and loads it for us. Together with the Rcpp interface classes this makes for very easy use of C++ from R, as everything can be done from the R prompt without any need for Makefiles, configuration settings etc.

As of version 0.8.1, an extended function 'cxxfunction' is used (which requires inline 0.3.5). This function makes it easier to use C++ code with Rcpp. In particular, it enforces use of the .Call interface, adds the Rcpp namespace, and sets up exception forwarding. It employs the macros BEGIN_RCPP and END_RCPP to enclose the user code.

Moreover, with cfunction (and cxxfunction), we can even call external libraries and have them linked as well.

Several examples of this are included with the packages; one has also been posted on my blog.

This even works on Windows if you have the working 'R tools' installed along with R. See the R-on-Windows FAQ and additional documentation.

With version 0.10.0, this has been complemented by Rcpp attributes, which are even easier and more powerful than inline; see the corresponding vignette for details.
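
A minimal sketch of a source file written for Rcpp attributes might look like the following. The function name cumulative_sum is hypothetical, and the __has_include guard is an assumption of this sketch (it needs a compiler that supports it) so that the file also builds as plain C++; in a real package one would normally just write #include <Rcpp.h>.

```cpp
#include <cstddef>
#include <vector>

// Pull in Rcpp only when its headers are on the include path.
#if defined(__has_include)
#  if __has_include(<Rcpp.h>)
#    include <Rcpp.h>
#  endif
#endif

// The marker below is an ordinary comment to the compiler; Rcpp attributes
// reads it and generates the wrapper that converts between R vectors and
// std::vector<double>.

// [[Rcpp::export]]
std::vector<double> cumulative_sum(const std::vector<double>& x) {
    std::vector<double> out(x.size());
    double total = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        total += x[i];   // running total up to and including element i
        out[i] = total;
    }
    return out;
}
```

Assuming the file were saved under a name such as cumsum.cpp, calling Rcpp::sourceCpp("cumsum.cpp") from the R prompt would compile it and make cumulative_sum() available as an R function.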

Unit testing

As of version 0.10.2, over 870 unit tests called from over 390 unit test functions are included in the package to ensure that no regressions are introduced in terms of API compatibility. The unit tests also serve as an (arguably somewhat raw) form of usage examples. A vignette is auto-generated with the results of the unit tests.

Usage for package building

Rcpp provides a main header file Rcpp.h and a library inside the installed package in the directory lib. From within R, you can compute the directory location via system.file("lib", "Rcpp.h", package="Rcpp"), but both are provided for your use via the functions Rcpp:::CxxFlags() and Rcpp:::LdFlags(). So we can just use the following as a file src/Makevars (or src/Makevars.win on Windows):

PKG_CXXFLAGS=`${R_HOME}/bin/Rscript -e "Rcpp:::CxxFlags()"`
PKG_LIBS=`${R_HOME}/bin/Rscript -e "Rcpp:::LdFlags()"`

See the help page for Rcpp-package for details.

Also note that starting with version 0.8.0, the 'LinkingTo' field of the DESCRIPTION file can also be employed in packages using Rcpp. This lets R determine the location of the header files, so users only need Rcpp:::LdFlags() (as in the PKG_LIBS line above) to point to the actual library; this is clearly the recommended approach.
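
As a sketch, the relevant DESCRIPTION fields of a hypothetical package might read as follows. The package name mypkg and the version bound are assumptions chosen purely for illustration, and the other fields a DESCRIPTION file requires are left out.

```
Package: mypkg
Depends: Rcpp (>= 0.8.0)
LinkingTo: Rcpp
```

With LinkingTo in place, R supplies the header location itself, so src/Makevars only has to set PKG_LIBS.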

Moreover, we added an entire vignette on how to use Rcpp in your package with a detailed discussion.

Demo package

The RcppExamples package (on CRAN) provides a simple illustration of how to use Rcpp, and can also be used as a framework for deploying Rcpp. This package is however somewhat incomplete in terms of examples, so please see below for examples provided by several dozen packages using Rcpp.

Class documentation

We now have Doxygen-generated documentation of all the classes in browseable and searchable html and as a pdf file. We no longer include the Doxygen-generated documentation in the source tarball as it is simply too big. But we have zip archives of the html, latex, and man documentation.

Other documentation

Besides the doxygen-generated reference manual we also have the following vignettes:
  • the Rcpp-introduction vignette provides a short overview of Rcpp and an introduction (and has also been published as Volume 40, Issue 8 of the Journal of Statistical Software),
  • the Rcpp-package vignette shows how to write your own package using Rcpp,
  • the Rcpp-FAQ vignette addresses several frequently asked questions,
  • the Rcpp-modules vignette discusses how to expose C++ functions and modules with ease using an idea borrowed from Boost::Python,
  • the Rcpp-extending vignette details the steps needed to extend Rcpp with user-provided or third-party classes,
  • the Rcpp-sugar vignette provides an introduction to the Rcpp sugar features inspired by vectorised R code,
  • the Rcpp-attributes vignette introduces the attributes feature for getting C++ into R with ease: it details the high-level syntax for declaring C++ functions as callable from R and shows how to automatically generate the code required to invoke them,
  • the Rcpp-quickref vignette provides a quick reference cheat sheet (but is still mostly incomplete), and
  • the Rcpp-unitTests vignette contains a summary of the (by now over two hundred) unit tests for Rcpp.

All vignettes are also installed with the package, and available at the CRAN page.

Google Tech Talk

In late October 2010, the R intergrouplet at Google was kind enough to invite us for a talk on Rcpp. The resulting talk was recorded and is now available on YouTube.

Example usage

The following CRAN, R-Forge or BioConductor packages use Rcpp:
  • RQuantLib, an R interface to QuantLib quantitative finance libraries
  • RInside, a set of C++ classes that make it easy to embed R in your C++ applications
  • EarthMoveDist, an implementation of the Earth Move distance metric for R
  • RProtoBuf, an interface from R to the Google ProtoBuf library
  • mvabund, a set of tools for displaying, modeling and analysing multivariate abundance data in community ecology.
  • sdcTable, a package for statistical disclosure control for tabular data.
  • bifactorial, a package for global and multiple inference for given bi- and tri-factorial clinical trial designs.
  • RcppExamples, an example package illustrating use of Rcpp and providing concrete examples.
  • RcppArmadillo, an interface from R to the Armadillo C++ linear algebra library using Rcpp.
  • minqa provides derivative-free optimization by quadratic approximation based on an interface to Fortran implementations by M. J. D. Powell.
  • pcaMethods provides Bayesian PCA, Probabilistic PCA, Nipals PCA, Inverse Non-Linear PCA and the conventional SVD PCA.
  • termstrc offers a wide range of functions for term structure estimation based on static and dynamic coupon bond and yield data.
  • phylobase implements a base S4 class for comparison of phylogenetic structures and data.
  • RSNNS wraps the Stuttgart Neural Network Simulator (SNNS), a library containing many standard implementations of neural networks, and brings these to R.
  • parser implements a detailed source code parser based on the R parser and grammar with a different representation of the parsed expressions.
  • RcppGSL provides an interface from R to the GNU GSL vector and matrix types.
  • orQA can be used to assess repeatability, accuracy and cross-platform agreement of titration microarray data.
  • RcppDE provides differential evolution optimization (just like DEoptim, on which it is based) and serves as a small case study in porting from plain C to the combination of C++ and Rcpp.
  • RcppBDT provides (parts of) Boost Date.Time by using Rcpp modules to easily expose the Boost functionality to R.
  • unmarked uses Rcpp and RcppArmadillo to provide code to fit hierarchical models of animal abundance and occurrence to data collected using survey methods such as point counts, site occupancy sampling, distance sampling, removal sampling, and double observer sampling.
  • The simFrame package provides an object-oriented general framework for statistical simulations.
  • The rgam package also uses Rcpp and RcppArmadillo to provide an outlier-robust fit for Generalized Additive (gam) Models.
  • The spacodiR package implements an interface to SPACoDi which is primarily designed to characterise the structure and phylogenetic diversity of communities using abundance or presence-absence data of species among community plots.
  • The VIM package provides visualization for missing values.
  • NetworkAnalysis provides statistical inference on populations of weighted or unweighted networks.
  • The SBSA package uses RcppArmadillo to provide functions for simplified Bayesian sensitivity analysis.
  • GUTS contains functions for the fast calculation of the likelihood of a stochastic survival model.
  • The wordcloud package uses Rcpp to accelerate rendering word clouds from text.
  • auteur implements a Bayesian sampler of the trait-evolutionary process to identify shifts in process of continuous-trait evolution on phylogenetic trees.
  • The cda package uses Rcpp modules and RcppArmadillo to model coupled dipole approximations: given a set of ellipsoidal nanoparticles, it calculates the polarizability tensor for the dipoles associated with each particle, and solves the coupled-dipole equations by direct inversion of the interaction matrix.
  • The planar package uses Rcpp modules and RcppArmadillo to solve the electromagnetic problem of reflection and transmission at a multilayer planar interface. It also computes the decay rates for a dipolar emitter near a multilayer structure.
  • The maxent package provides tools for text classification using multinomial logistic regression, also known as maximum entropy. The focus of this maximum entropy classifier is to minimize memory consumption on very large datasets, particularly sparse document-term matrices represented by the tm package.
  • fdaMixed offers functional data analysis in a mixed-model framework via a likelihood-based analysis; it uses Rcpp and RcppArmadillo.
  • KernSmoothIRT fits nonparametric item and option characteristic curves using kernel smoothing, and allows for optimal selection of the smoothing bandwidth using cross-validation and a variety of exploratory plotting tools.
  • The rugarch package can estimate a variety of univariate GARCH models including ARFIMA, in-mean effects, use of external regressors and various other GARCH flavours using both Rcpp and RcppArmadillo.
  • bcp provides an implementation of an approximation to the product partition model for the normal errors change point problem using Markov Chain Monte Carlo, and also extends the methodology to independent multivariate series with an assumed common change point structure.
  • RVowpalWabbit provides an interface to the Vowpal Wabbit fast on-line learner by John Langford et al.
  • The rococo package provides a robust gamma rank correlation coefficient along with a permutation-based rank correlation test both of which are explicitly designed for dealing with noisy numerical data.
  • The LaF package provides methods to efficiently access data from large ascii files, including subsetting and block-wise access.
  • The Rclusterpp package provides flexible native clustering routines that can be linked against in downstream packages, and uses Rcpp and RcppEigen.
  • The bfa package provides model fitting for several Bayesian factor models including Gaussian, ordinal probit, mixed and semiparametric Gaussian copula factor models; it uses Rcpp and RcppArmadillo.
  • RSofia provides an R interface to the sofia-ml suite of fast incremental algorithms for machine learning suitable for training models for classification or ranking.
  • The fastGHQuad package implements functions for fast (and numerically stable) Gauss-Hermite quadrature.
  • The SpatialTools package provides tools for spatial analysis with an emphasis on kriging using Rcpp and RcppArmadillo.
  • acer implements the ACER method for extreme value estimation which finds return levels of extreme values.
  • RcppSMC implements several Sequential Monte Carlo / Particle Filter models using the SMC template library by Adam Johansen.
  • The psgp package provides projected spatial gaussian process methods for sparse spatial kriging; it uses Rcpp and RcppArmadillo.
  • phom computes persistent homology of filtered simplicial complexes, and provides facilities for constructing complexes from geometric data.
  • The BioConductor package GRENITS uses Rcpp and RcppArmadillo to implement network inference statistical models using Dynamic Bayesian Networks and Gibbs Variable Selection.
  • The BioConductor package mosaics provides functions for fitting MOSAiCS, a statistical framework to analyze one-sample or two-sample ChIP-seq data.
  • The BioConductor package mzR provides a unified API to the common file formats and parsers available for mass spectrometry data.
  • The WideLm package uses Rcpp as well as the NVidia CUDA API (>= 4.1) to simultaneously estimate a large number of 'tall and skinny' models from the same dataset.
  • The forecast package provides methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling; it uses Rcpp and RcppArmadillo.
  • The multmod package implements functions for testing of multiple outcomes using i.i.d. decompositions.
  • The openair package provides tools to analyse, interpret and understand air pollution data, typically from hourly time series; both monitoring data and dispersion model output can be analysed.
  • The Rmixmod package implements high-performance model-based cluster analysis for mixture modelling.
  • The sdcMicro package contains statistical disclosure control methods for the generation of anonymized (micro)data, i.e. for the generation of public- and scientific-use files.
  • The BioConductor package Rdisop uses Rcpp and RcppClassic for the decomposition of isotopic patterns.
  • The Rmalchains package implements an algorithm family for continuous optimization called memetic algorithms with local search chains (MA-LS-Chains); memetic algorithms are hybridizations of genetic algorithms with local search methods.
  • The growcurves package provides Bayesian semiparametric growth curve models that additionally include multiple membership random effects, using both Rcpp and RcppArmadillo.
  • The apcluster package implements Frey's and Dueck's Affinity Propagation clustering algorithm in R, and also provides an algorithm for exemplar-based agglomerative clustering that can also be used to join clusters obtained from affinity propagation.
  • The survSNP package provides power and sample size calculations for SNP association studies with right censored time to event outcomes.
  • The robustHD package provides robust methods for high-dimensional data, in particular linear model selection techniques based on least angle regression and sparse regression; it uses RcppArmadillo.
  • The sparseLTSEigen package implements an RcppEigen-based back-end for sparse least trimmed squares regression with an L1 penalty.
  • The waffect package simulates phenotypic (case or control) datasets under a disease model H1 such that the total number of cases is constant across all the simulations.
  • The zic package implements Bayesian inference for zero-inflated count models using MCMC written in C++; the package uses Rcpp and RcppArmadillo.
  • The rcppbugs package provides R bindings to the CppBugs C++ library for MCMC and aims to make writing MCMC models as painless as possible by incorporating features from both WinBugs and PyMC. It uses both Rcpp and RcppArmadillo.
  • The mirt package implements multidimensional item response theory for the analysis of dichotomous and polychotomous response data using latent trait models under the Item Response Theory paradigm.
  • The mets package helps with the analysis of multivariate event times by implementing various statistical models for multivariate event history data, including multivariate cumulative incidence models, and bivariate random effects probit models (liability models).
  • The bfp package implements the Bayesian paradigm for fractional polynomial models under the assumption of normally distributed error terms.
  • The gof package implements model-checking techniques for generalized linear models and linear structural equation models based on cumulative residuals; it uses Rcpp and RcppArmadillo.
  • The RcppOctave package provides a bidirectional interface to GNU Octave, allowing R to call Octave functions and script files.
  • The blockcluster package provides co-clustering for binary, contingency and continuous data, along with utility functions to visualize the results.
  • The RcppCNPy package uses Carl Rogers' cnpy library to read / write files created by / for Numeric Python (aka "numpy").
  • The MVB package fits log-linear models for multivariate Bernoulli distributions with mixed effect models and LASSO.
  • The surveillance package provides statistical methods for modeling and change-point detection in time series of counts, proportions and categorical data for temporal and spatio-temporal modeling and monitoring of epidemic phenomena.
  • The fugeR package provides "FUzzy GEnetic" machine learning for prediction models.
  • classify provides classification accuracy under IRT models.
  • The ccaPP package implements robust canonical correlation analysis via projection pursuit; it uses Rcpp and RcppArmadillo.
  • trustOptim provides a trust region algorithm for nonlinear minimization with methods that are designed to be efficient when the Hessian is sparse; it uses Rcpp and RcppEigen.
  • The tmg package implements truncated multivariate gaussian sampling using Hamiltonian Monte Carlo where the truncation is defined using linear and/or quadratic polynomials; it uses Rcpp and RcppEigen.
  • The mRMRe package implements parallelized mRMR ensemble feature selection to compute mutual information matrices from continuous, categorical and survival variables; it also contains a function to perform feature selection with mRMR and a new ensemble mRMR technique.
  • The clusteval package provides a suite of tools to evaluate clustering algorithms, clusterings, and individual clusters.
  • oem implements orthogonalizing expectation maximisation to fit penalized regression; it uses Rcpp and RcppArmadillo.
  • The quadrupen package fits classical sparse regression models with efficient active set algorithms by solving quadratic problems and also provides a few methods for model selection purpose (cross-validation, stability selection); it uses Rcpp and RcppArmadillo.
  • The pbdBASE package implements methods and classes for distributed data types using MPI, and the pbdDMAT provides distributed linear algebra computations; both are part of a set of packages for Programming with Big Data.
  • The EpiContactTrace package provides routines for epidemiological contact tracing and visualisation of networks of contacts.
  • The transmission package simulates and fits continuous time infectious disease transmission models.
  • The Rchemcpp package compares sets of molecules and returns a similarity matrix based on the chemcpp library; it uses Rcpp and RcppClassic.
  • The robustgam package implements robust estimation for generalized additive models by implementing the fast and stable algorithm in Wong, Yao and Lee (2012).
  • The sparseHessianFD package computes the sparse Hessian using ACM TOMS Algorithm 636; it uses Rcpp and RcppEigen.
  • The gMWT package provides generalized Mann-Whitney type tests based on probabilistic indices; it uses Rcpp and RcppArmadillo.
  • The ngspatial package provides tools for analyzing spatial data, especially non-Gaussian areal data; it uses Rcpp and RcppArmadillo.
  • The GeneticTools package contains a collection of routines for the analysis of expression and genotype data; it uses Rcpp and RcppArmadillo.
  • RcppClassicExamples regroups examples from the deprecated initial API now provided by RcppClassic.
  • The jaatha package provides a fast parameter estimation method for evolutionary biology.
  • The ConConPiWiFun package implements continuous convex piecewise linear functions which are useful for a large class of optimization problems.
  • The RcppRoll package supplies fast functions for rolling over vectors and matrices, and provides utility functions 'rollit' and 'rollit_raw' as an interface for generating C++ backed rolling functions; it uses Rcpp and RcppArmadillo.
  • rforensicbatwing calculates forensic trace-suspect match probabilities using a modified version of Ian Wilson's BATWING program.
  • The RcppXts package facilitates access to the C API functions of xts from Rcpp.
  • The stochvol package implements efficient algorithms for fully Bayesian estimation of stochastic volatility (SV) models via MCMC.
  • The marked package provides a framework for handling data and analysis for mark-recapture.
  • The RMessenger package provides R with access to the instant messaging protocol XMPP using an embedded copy of libstrophe.
  • The PReMiuN package implements Dirichlet process Bayesian clustering, also known as profile regression.
  • The ALKr package provides several algorithms for generating age-length keys for fish populations from incomplete data.
  • The ecp package computes hierarchical change point analysis through the use of the energy statistic for multiple change point analysis of multivariate data.
  • The ExactNumCI package computes exact confidence intervals for binomial proportions.
  • The rexpokit package implements wrappers for EXPOKIT, a Fortran library for matrix exponentiation.
  • The amelia package for missing data imputation now uses Rcpp too.

History

Rcpp was initially written by Dominick Samperi to ease contributions to the RQuantLib project, and then released as a project in its own right. During 2006, Dominick made several releases under the RCpp name (versions 1.0 to 1.4) before he changed the name to RCppTemplate and made more releases (1.5 to 5.2). His project saw no public releases for the thirty-five-month period from November 2006 to November 2009.

As a user of Rcpp, I (Dirk) chose to adopt Rcpp during 2008, made a first release 0.6.0 in November 2008 and have made a number of new releases since; see the ChangeLog for details. Rcpp is open for contributions and patches, some of which have already been integrated.

Romain Francois joined the effort just before the 0.7.0 release and brought along a lot of energy and new ideas. We now have a mailing list for discussions around Rcpp. If you have ideas or suggested changes, send an email there.

Download

A local archive is available here and at CRAN; SVN access is provided at R-Forge.

License

Rcpp is licensed under the GNU GPL version 2 or later.

Last modified: Sat Feb 16 10:29:47 CST 2013