Mon, 31 Dec 2012

Ragnar Relay Chicago 2012

One things I never quite got around to during 2012 was to blog about the awesome relay we ran in early June. This was the race formerly known as MC200 (for Madiston, WI, to Chicago, IL, by way of Milwaukee, WI, for about 200 miles) and is now part of the Ragnar Relay series: Ragnar Chicago.

We ran as a so-called ultra team of six runners, as opposed to a regular ream of twelve. The course is cut into 36 segments; on regular teams you get 3, we each had twice that. My first leg was a combined 17 miles in what turned out to be pretty blistering heat in mid-to-late afternoon. My fellow team members were awesome in getting me lots of water an ice, and I managed to hold onto a pace of just over 8 min/miles. One of the harder runs I've had. Next was a wonderful run pretty much exactly at midnight under starry skies---about seven or so miles followed by ten more miles the next morning.

We ended up coming third (yay!) beating the next time by about six or seven seconds (!!) over a total time of 25 or 26 hours.

It was hard. It was fun. It was exhilirating. It may also have broken me as I haven't really run much since. So good intentions for 2013: get back into the groove.

/sports/running | permanent link

Sun, 30 Dec 2012

RcppClassicExamples 0.1.1

Yesterday's initial upload of RcppClassicExamples was lacking a versioned Depends: to prevent builds on older versions of R. This has been added in a new upload 0.1.1. We also added a NEWS file (see below); no code changes were made.

Changes in version 0.1.1 (2012-12-30)

  • Added versioned Depends: in DESCRIPTION to not build under older versions of Rcpp and RcppClassic

Changes in version 0.1.0 (2012-12-27)

Thanks to CRANberries, you can also look at a diff to the previous release 0.1.0. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Sat, 29 Dec 2012

RcppExamples 0.1.5 and RcppClassicExamples 0.1.0

The recent releases of Rcpp 0.10.2 and RcppClassic 0.9.3 had one more repercussion. On that dreaded OS, the linker no longer wanted to instantiate a symbol present in both packages; seems to me that the linker in the other two OSs is a little smarter. Anyway -- I didn't fight this but at long last moved all remnands of the long-deprecated older Rcpp API (which is still maintained by package RcppClassic) out of package RcppExamples and into a new package RcppClassicExamples.

And the updated version 0.1.5 of the RcppExamples package appeared on CRAN and has now been joined by the initial version 0.1.0 of the new package RcppClassicExamples. No code changed were made; manual pages and descriptions where brushed up and that is about it.

Thanks to CRANberries, you can also look at a diff to the previous release 0.1.4 of RcppExamples. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Sat, 22 Dec 2012

RcppClassic 0.9.3

Yesterday's release of Rcpp 0.10.2 required a small change to RcppClassic, the package supporting the deprecated older classic Rcpp API defined in the earlier 2005 to 2006 releases. So version 0.9.3 of RcppClassic is now on CRAN. There is no new user-facing code.

Courtesy of CRANberries, there is the set of changes relative to the previous release 0.9.2.

Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Kurt Elling Quintet at the Green Mill

Kurt Elling is back in Chicago for two gigs at the Green Mill (where I last saw him in a wonderful big-band setting), along with his long-time collaborator Laurence Hobgood on piano, Clark Sommers on bass, Chicago's own John McLean on guitar, and the new kid Bryan Carter on drums.

And they were, of course, awesome. Great mix of standards as well as new stuff from the Grammy-nominated new recording. If you can see Kurt Elling live, go. Now. They return for another gig tonight and I will have to see if I can swing that.

/music/jazz/live | permanent link

Patricia Barber at Unity Temple

Forgot to blog about this (though I posted on Google+ about it), but we went to a neat little concert in the neighborhood on October 6. Chicago's own Patricia Barber came to the wonderful Unity Temple by Frank Lloyd Wright (more on the temple at Wikipedia and the foundation).

This closed the 2012 series of the Unity Temple concerts and the setting was a little, well, weird. Look like mostly concert subscribers, with the commensurate age brackets, and not too many jazz folks. Not sure how many Patricia Barber, along with Larry Kohut on bass, won over, but I enjoyed it. I had secured front row (!!) seats in what is already an wonderfully intimate setting for such a duet concert.

See the Google+ link for a YouTube recording by her.

/music/jazz/live | permanent link

Fri, 21 Dec 2012

Rcpp 0.10.2

Relase 0.10.2 of Rcpp provides the second update to the 0.10.* series, and has arrived on CRAN and in Debian.

It brings another great set of enhancements and extensions, building on the recent 0.10.0 and 0.10.1 releases. The new Rcpp attributes were rewritten to not require Rcpp modules (as we encountered on issue with exceptions on Windows when built this way), code was reorganized to significantly accelerate compilation and a couple of new things such as more Rcpp sugar goodies, a new timer class, and a new string class were added. See below for full details.

We also tested this fairly rigorously by checking about two thirds of the over 90 CRAN packages depending on Rcpp (and the remainder required even more package installs which we did not do as this was already taking about 12 total cpu hours to test). We are quite confident that no changes are required (besides one in our own RcppClassic package which we will update.

The complete NEWS entry for 0.10.2 is below; more details are in the ChangeLog file in the package and on the Rcpp Changelog page.

Changes in Rcpp version 0.10.2 (2012-12-21)

  • Changes in Rcpp API:

    • Source and header files were reorganized and consolidated so that compile time are now significantly lower

    • Added additional check in Rstreambuf deletetion

    • Added support for clang++ when using libc++, and for anc icpc in std=c++11 mode, thanks to a patch by Yan Zhou

    • New class Rcpp::String to facilitate working with a single element of a character vector

    • New utility class sugar::IndexHash inspired from Simon Urbanek's fastmatch package

    • Implementation of the equality operator between two Rcomplex

    • RNGScope now has an internal counter that enables it to be safely used multiple times in the same stack frame.

    • New class Rcpp::Timer for benchmarking

  • Changes in Rcpp sugar:

    • More efficient version of match based on IndexHash

    • More efficient version of unique base on IndexHash

    • More efficient version of in base on IndexHash

    • More efficient version of duplicated base on IndexHash

    • More efficient version of self_match base on IndexHash

    • New function collapse that implements paste(., collapse= "" )

  • Changes in Rcpp attributes:

    • Use code generation rather than modules to implement sourceCpp and compileAttributes (eliminates problem with exceptions not being able to cross shared library boundaries on Windows)

    • Exported functions now automatically establish an RNGScope

    • Functions exported by sourceCpp now directly reference the external function pointer rather than rely on dynlib lookup

    • On Windows, Rtools is automatically added to the PATH during sourceCpp compilations

    • Diagnostics are printed to the console if sourceCpp fails and C++ development tools are not installed

    • A warning is printed if when compileAttributes detects Rcpp::depends attributes in source files that are not matched by Depends/LinkingTo entries in the package DESCRIPTION

Thanks to CRANberries, you can also look at a diff to the previous release 0.10.1. As always, even fuller details are on the Rcpp Changelog page and the Rcpp page which also leads to the downloads, the browseable doxygen docs and zip files of doxygen output for the standard formats. A local directory has source and documentation too. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page

/code/rcpp | permanent link

Mon, 17 Dec 2012

R / Finance 2013 Call for Papers

The text below just went out to r-sig-finance along with updates to the R/Finance website and its Call for Papers page.

Call for Papers:

R/Finance 2013: Applied Finance with R
May 17 and 18, 2013
University of Illinois, Chicago, IL, USA

The fifth annual R/Finance conference for applied finance using R will be held on May 17 and 18, 2013 in Chicago, IL, USA at the University of Illinois at Chicago. The conference is expected to cover topics including portfolio management, time series analysis, advanced risk tools, high-performance computing, market microstructure, and econometrics. All will be discussed within the context of using R as a primary tool for financial risk management, portfolio construction, and trading.

Over the past four years, R/Finance has included attendees from around the world. It featured presentations from prominent academics and practitioners, and we anticipate another exciting line-up for 2013.

We invite you to submit complete papers in pdf format for consideration. We will also consider one-page abstracts (in txt or pdf format) although more complete papers are preferred. We welcome submissions for full talks and abbreviated "lightning talks". Both academic and practitioner proposals related to R are encouraged.

Presenters are strongly encouraged to provide working R code to accompany the presentation/paper. Data sets should also be made public for the purposes of reproducibility (though we realize this may be limited due to contracts with data vendors). Preference may be given to presenters who have released R packages.

The conference will award two (or more) $1000 prizes for best papers. A submission must be a full paper to be eligible for a best paper award. Extended abstracts, even if a full paper is provided by conference time, are not eligible for a best paper award. Financial assistance for travel and accommodation may be available to presenters at the discretion of the conference committee. Requests for assistance should be made at the time of submission.

Please send submissions to: committee at RinFinance.com. The submission deadline is February 15, 2013. Submitters will be notified of acceptance via email by February 28, 2013. Notification of whether a presentation will be a long presentation or a lightning talk will also be made at that time.

Additional details will be announced at this website as they become available. Information on previous year's presenters and their presentations are also at the conference website.

For the program committee:

Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson,
Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

So see you in Chicago in May!

/computers/R | permanent link

RcppArmadillo 0.3.6.1

A first minor bug-fix update by Conrad to the 3.6 series of Armadillo arrived today as version 3.6.1, and we prepared a corresponding version 0.3.6.1 of RcppArmadillo, our wrapper for R and Armadillo. This is now on CRAN, and the changes are summarized below.

Changes in RcppArmadillo version 0.3.6.1 (2012-12-17)

  • Upgraded to Armadillo release Version 3.6.1

    • faster trace()

    • fix for handling sparse matrices by dot()

    • fixes for interactions between sparse and dense matrices

  • Now throws compiler error if Rcpp.h is included before RcppArmadillo.h (as the former is included automatically by the latter anyway, but template logic prefers this ordering).

Courtesy of CRANberries, there is also a diffstat report for the most recent release As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Sat, 08 Dec 2012

Rcpp attributes: Even easier integration of GSL code into R

Following the Rcpp 0.10.0 release, I had written about simulating pi easily by using the wonderful new Rcpp Attributes feature. Now with Rcpp 0.10.1 released a good week ago, it is time to look at how Rcpp Attributes can help with external libraries. As this posts aims to show, it is a breeze!

One key aspect is the use of the plugins for the inline package. They provide something akin to a callback mechanism so that compilation and linking steps can be informed about header and library locations and names. We are going to illustrate this with an example from GNU Scientific Library (GSL). The example I picked uses B-Spline estimation from the GSL. This is a little redundant as R has its own spline routines and package, but serves well as a simple illustration---and, by reproducing an existing example, followed an established path. So we will look at Section 39.7 of the GSL manual which has a complete example as a standalone C program, generating both the data and the fit via cubic B-splines.

We can decompose this two parts: data generation, and fitting. We will provide one function each, and then use both from R. These two function will follow the aforementioned example from Section 39.7 somewhat closely.

We start with the first function to generate the data.

// [[Rcpp::depends(RcppGSL)]]
#include <RcppGSL.h>

#include <gsl/gsl_bspline.h>
#include <gsl/gsl_multifit.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_statistics.h>

const int N = 200;                              // number of data points to fit 
const int NCOEFFS = 12;                         // number of fit coefficients */
const int NBREAK = (NCOEFFS - 2);               // nbreak = ncoeffs + 2 - k = ncoeffs - 2 since k = 4 */

// [[Rcpp::export]]
Rcpp::List genData() {

    const size_t n = N;
    size_t i;
    double dy;
    gsl_rng *r;
    RcppGSL::vector<double> w(n), x(n), y(n);

    gsl_rng_env_setup();
    r = gsl_rng_alloc(gsl_rng_default);

    //printf("#m=0,S=0\n");
    /* this is the data to be fitted */

    for (i = 0; i < n; ++i) {
        double sigma;
        double xi = (15.0 / (N - 1)) * i;
        double yi = cos(xi) * exp(-0.1 * xi);

        sigma = 0.1 * yi;
        dy = gsl_ran_gaussian(r, sigma);
        yi += dy;

        gsl_vector_set(x, i, xi);
        gsl_vector_set(y, i, yi);
        gsl_vector_set(w, i, 1.0 / (sigma * sigma));
                
        //printf("%f %f\n", xi, yi);
    }

    Rcpp::DataFrame res = Rcpp::DataFrame::create(Rcpp::Named("x") = x,
                                                  Rcpp::Named("y") = y,
                                                  Rcpp::Named("w") = w);

    x.free();
    y.free();
    w.free();
    gsl_rng_free(r);

    return(res);
}
We include a few header files, define (in what is common for C programs) a few constants and then define a single function genData() which returns and Rcpp::List as a list object to R. A primary importance here are the two attributes: one to declare a dependence on the RcppGSL package, and one to declare the export of the data generator function. That is all it takes! The plugin of the RcppGSL will provide information about the headers and library, and Rcpp Attributes will do the rest.

The core of the function is fairly self-explanatory, and closely follows the original example. Space gets allocated, the RNG is setup and a simple functional form generates some data plus noise (see below). In the original, the data is written to the standard output; here we return it to R as three columns in a data.frame object familiar to R users. We then free the GSL vectors; this manual step is needed as they are implemented as C vectors which do not have a destructor.

Next, we can turn the fitting function.

// [[Rcpp::export]]
Rcpp::List fitData(Rcpp::DataFrame ds) {

    const size_t ncoeffs = NCOEFFS;
    const size_t nbreak = NBREAK;

    const size_t n = N;
    size_t i, j;

    Rcpp::DataFrame D(ds);              // construct the data.frame object
    RcppGSL::vector<double> y = D["y"]; // access columns by name, 
    RcppGSL::vector<double> x = D["x"]; // assigning to GSL vectors
    RcppGSL::vector<double> w = D["w"];

    gsl_bspline_workspace *bw;
    gsl_vector *B;
    gsl_vector *c; 
    gsl_matrix *X, *cov;
    gsl_multifit_linear_workspace *mw;
    double chisq, Rsq, dof, tss;

    bw = gsl_bspline_alloc(4, nbreak);      // allocate a cubic bspline workspace (k = 4)
    B = gsl_vector_alloc(ncoeffs);

    X = gsl_matrix_alloc(n, ncoeffs);
    c = gsl_vector_alloc(ncoeffs);
    cov = gsl_matrix_alloc(ncoeffs, ncoeffs);
    mw = gsl_multifit_linear_alloc(n, ncoeffs);

    gsl_bspline_knots_uniform(0.0, 15.0, bw);   // use uniform breakpoints on [0, 15] 

    for (i = 0; i < n; ++i) {                   // construct the fit matrix X 
        double xi = gsl_vector_get(x, i);

        gsl_bspline_eval(xi, B, bw);            // compute B_j(xi) for all j 

        for (j = 0; j < ncoeffs; ++j) {         // fill in row i of X 
            double Bj = gsl_vector_get(B, j);
            gsl_matrix_set(X, i, j, Bj);
        }
    }

    gsl_multifit_wlinear(X, w, y, c, cov, &chisq, mw);  // do the fit 
    
    dof = n - ncoeffs;
    tss = gsl_stats_wtss(w->data, 1, y->data, 1, y->size);
    Rsq = 1.0 - chisq / tss;
    
    Rcpp::NumericVector FX(151), FY(151);       // output the smoothed curve 
    double xi, yi, yerr;
    for (xi = 0.0, i=0; xi < 15.0; xi += 0.1, i++) {
        gsl_bspline_eval(xi, B, bw);
        gsl_multifit_linear_est(B, c, cov, &yi, &yerr);
        FX[i] = xi;
        FY[i] = yi;
    }

    Rcpp::List res =
      Rcpp::List::create(Rcpp::Named("X")=FX,
                         Rcpp::Named("Y")=FY,
                         Rcpp::Named("chisqdof")=Rcpp::wrap(chisq/dof),
                         Rcpp::Named("rsq")=Rcpp::wrap(Rsq));

    gsl_bspline_free(bw);
    gsl_vector_free(B);
    gsl_matrix_free(X);
    gsl_vector_free(c);
    gsl_matrix_free(cov);
    gsl_multifit_linear_free(mw);
    
    y.free();
    x.free();
    w.free();

    return(res);   
}

The second function closely follows the second part of the GSL example and, given the input data, fits the output data. Data structures are setup, the spline basis is created, data is fit and then the fit is evaluated at a number of points. These two vectors are returned along with two goodness of fit measures.

We only need to load the Rcpp package and source a file containing the two snippets shown above, and we are ready to deploy this:

library(Rcpp)
sourceCpp("bSpline.cpp")                # compile two functions
dat <- genData()                        # generate the data
fit <- fitData(dat)                     # fit the model, returns matrix and gof measures

And with that, we generate a chart such as

Spline fitting example from GSL manual redone with Rcpp Attributes

via a simple four lines, or as much as it took to create the C++ functions, generate the data and fit it!

op <- par(mar=c(3,3,1,1))
plot(dat[,"x"], dat[,"y"], pch=19, col="#00000044")
lines(fit[[1]], fit[[2]], col="orange", lwd=2)
par(op)

The RcppArmadillo and RcppEigen package support plugin use in the same way. Add an attribute to export a function, and an attribute for the depends -- and you're done. Extending R with (potentially much faster) C++ code has never been easier, and opens a whole new set of doors.

/code/snippets | permanent link

Fri, 07 Dec 2012

RcppArmadillo 0.3.6.0

Conrad launched the 3.6 series of Armadillo earlier today with a first 3.6.0 release. So RcppArmadillo, our wrapper for R and Armadillo, is now on CRAN with its corresponding version 0.3.6.0. No R level or interface changes were needed, and the upstream changes are summarized below.

Changes in RcppArmadillo version 0.3.6.0 (2012-12-07)

  • Upgraded to Armadillo release Version 3.6.0 (Piazza del Duomo)

    • faster handling of compound expressions with submatrices and subcubes

    • added support for loading matrices as text files with NaN and Inf elements

    • added stable_sort_index(), which preserves the relative order of elements with equivalent values

    • added handling of sparse matrices by mean(), var(), norm(), abs(), square(), sqrt()

    • added saving and loading of sparse matrices in arma_binary format

Courtesy of CRANberries, there is also a diffstat report for the most recent release As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Wed, 05 Dec 2012

RInside 0.2.10

The new maintenance release 0.2.10 of RInside is now on CRAN, including Windows binaries. RInside provides a set of convenience classes which facilitate embedding of R inside of C++ applications and programs, using the classes and functions provided by the Rcpp R and C++ integration package.

This release helps with an update to stack checking, required by a recent change in R itself. The NEWS extract below has more details.

Changes in RInside version 0.2.10 (2012-12-05)

  • Adjusted to change in R which requires turning checking of the stack limit off in order to allow for access from multiple threads as in the Wt examples. As there are have been no side-effects, this is enabled by default on all platforms (with the exception of Windows).

  • Added new ‘threads’ example directory with a simple example based on a Boost mutex example.

  • Disabled two examples (passing an external function down) which do not currently work; external pointer use should still work.

CRANberries also provides a short report with changes from the previous release. More information is on the RInside page. Questions, comments etc should go to the rcpp-devel mailing list off the Rcpp R-Forge page.

/code/rinside | permanent link

Sun, 02 Dec 2012

RQuantLib 0.3.9

A minor feature release RQuantLib 0.3.9 is now on CRAN and in Debian. RQuantLib combines (some of) the quantitative analytics of QuantLib with the R statistical computing environment and language.

Bryan Lewis had suggested to enable another pricing engine for American Options in order to get (at least some) Greeks. This is now supported by picking engine="CrankNicolson" as shown in the default example for the AmericanOption function:

R> library(RQuantLib)
R> example(AmericanOption)

AmrcnOR> # simple call with unnamed parameters
AmrcnOR> AmericanOption("call", 100, 100, 0.02, 0.03, 0.5, 0.4)
Concise summary of valuation for AmericanOption 
  value   delta   gamma    vega   theta     rho  divRho 
11.3648      NA      NA      NA      NA      NA      NA 

AmrcnOR> # simple call with some explicit parameters
AmrcnOR> AmericanOption("put", strike=100, volatility=0.4, 100, 0.02, 0.03, 0.5)
Concise summary of valuation for AmericanOption 
  value   delta   gamma    vega   theta     rho  divRho 
10.9174      NA      NA      NA      NA      NA      NA 

AmrcnOR> # simple call with unnamed parameters, using Crank-Nicolons
AmrcnOR> AmericanOption("put", strike=100, volatility=0.4, 100, 0.02, 0.03, 0.5, engine="CrankNicolson")
Concise summary of valuation for AmericanOption 
  value   delta   gamma    vega   theta     rho  divRho 
10.9173 -0.4358  0.0140      NA      NA      NA      NA 
R> 

Thanks to CRANberries, there is also a diff to the previous release 0.3.8. Full changelog details, examples and more details about this package are at my RQuantLib page.

/code/rquantlib | permanent link

Tue, 27 Nov 2012

Rcpp 0.10.1

A the new Rcpp release 0.10.1 arrived this morning on CRAN (as already has Windows binaries) and in Debian.

This is a follow-up to the recent 0.10.0 release which extends the exciting new Rcpp-attributes and Rcpp-sugar work further, and as in a number of other areas as detailed below in the NEWS sections.

This release brings an change to some of the binary interfaces. If you have packages using Rcpp, you will most likely have to reinstall them from source. Some change were made to const correctness as well as other aspects, and it seems that we have temporarily broken the excellent RcppEigen and RcppOctave packages. We are looking into this, and are sorry about the bug.

The complete NEWS entry for 0.10.1 is below; more details are in the ChangeLog file in the package and on the Rcpp Changelog page.

Changes in Rcpp version 0.10.1 (2012-11-26)

  • Changes in Rcpp sugar:

    • New functions: setdiff, union_, intersect setequal, in, min, max, range, match, table, duplicated

    • New function: clamp which combines pmin and pmax, e.g. clamp( a, x, b) is the same as pmax( b, pmin(x, a) )

    • New function: self_match which implements something similar to match( x, unique( x ) )

  • Changes in Rcpp API:

    • The Vector template class (hence NumericVector ...) get the is_na and the get_na static methods.

    • New helper class no_init that can be used to create a vector without initializing its data, e.g. : IntegerVector out = no_init(n) ;

    • New exception constructor requiring only a message; stop function to throw an exception

    • DataFrame gains a nrows method

  • Changes in Rcpp attributes:

    • Ability to embed R code chunks (via specially formatted block comments) in C++ source files.

    • Allow specification of argument defaults for exported functions.

    • New scheme for more flexible mixing of generated and user composed C++ headers.

    • Print warning if no export attributes are found in source file.

    • Updated vignette with additional documentation on exposing C++ interfaces from packages and signaling errors.

  • Changes in Rcpp modules:

    • Enclose .External invocations in BEGIN_RCPP/END_RCPP

  • Changes in R code :

    • New function areMacrosDefined

    • Additions to Rcpp.package.skeleton:

      • attributes parameter to generate a version of rcpp_hello_world that uses Rcpp::export.

      • cpp_files parameter to provide a list of C++ files to include the in the src directory of the package.

  • Miscellaneous changes:

    • New example 'pi simulation' using R and C++ via Rcpp attributes

Thanks to CRANberries, you can also look at a diff to the previous release 0.10.0. As always, even fuller details are on the Rcpp Changelog page and the Rcpp page which also leads to the downloads, the browseable doxygen docs and zip files of doxygen output for the standard formats. A local directory has source and documentation too. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page

/code/rcpp | permanent link

Sun, 25 Nov 2012

digest 0.6.0

A new version of the digest package (which generates hash function summaries for arbitrary (and possibly nested) R objects using any of the standard md5, sha-1, sha-256, sha-512 or crc32 algorithms) is now on CRAN, and I will push the corresponding version into Debian in a moment.

For this release, Hannes Muehleisen added support for sha-512 using an older standalone function by Aaron D. Gifford which I had to whip into slightly more portable shape to work on Windows. (Hint: uint32_t from stdint.h, not u_int32_t)

CRANberries provides the usual summary of changes to version 0.5.2.

As usual, our package is available via the R-Forge page leading to svn and tarball access, my digest page and the local directory here.

/code/digest | permanent link

Tue, 20 Nov 2012

Rcpp attributes: A simple example 'making pi'

We introduced Rcpp 0.10.0 with a number of very nice new features a few days ago, and the activity on the rcpp-devel mailing list has been pretty responsive which is awesome.

But because few things beat a nice example, this post tries to build some more excitement. We will illustrate how Rcpp attributes makes it really easy to add C++ code to R session, and that that code is as easy to grasp as R code.

Our motivating example is everybody's favourite introduction to Monte Carlo simulation: estimating π. A common method uses the fact the unit circle has a surface area equal to π. We draw two uniform random numbers x and y, each between zero and one. We then check for the distance of the corresponding point (x,y) relative to the origin. If less than one (or equal), it is in the circle (or on it); if more than one it is outside. As the first quadrant is a quarter of a square of area one, the area of the whole circle is π -- so our first quadrant approximates π over four. The following figure, kindly borrowed from Wikipedia with full attribution and credit, illustrates this:

Example of simulating pi

Now, a vectorized version (drawing N such pairs at once) of this approach is provided by the following R function.

piR <- function(N) {
    x <- runif(N)
    y <- runif(N)
    d <- sqrt(x^2 + y^2)
    return(4 * sum(d < 1.0) / N)
}

And in C++ we can write almost exactly the same function thanks the Rcpp sugar vectorisation available via Rcpp:

#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
double piSugar(const int N) {
    RNGScope scope;		// ensure RNG gets set/reset
    NumericVector x = runif(N);
    NumericVector y = runif(N);
    NumericVector d = sqrt(x*x + y*y);
    return 4.0 * sum(d < 1.0) / N;
}
Sure, there are small differences: C++ is statically typed, R is not. We need one include file for declaration, and we need one instantiation of the RNGScope object to ensure random number draws remain coordinated between the calling R process and the C++ function calling into its (compiled C-code based) random number generators. That way we even get the exact same draws for the same seed. But the basic approach is identical: draw a vector x and vector y, compute the distance to the origin and then obtain the proportion within the unit circle -- which we scale by four. Same idea, same vectorised implementation in C++.

But the real key here is the one short line with the [[Rcpp::export]] attribute. This is all it takes (along with sourceCpp() from Rcpp 0.10.0) to get the C++ code into R.

The full example (which assumes the C++ file is saved as piSugar.cpp in the same directory) is now:

#!/usr/bin/r

library(Rcpp)
library(rbenchmark)

piR <- function(N) {
    x <- runif(N)
    y <- runif(N)
    d <- sqrt(x^2 + y^2)
    return(4 * sum(d < 1.0) / N)
}

sourceCpp("piSugar.cpp")

N <- 1e6

set.seed(42)
resR <- piR(N)

set.seed(42)
resCpp <- piSugar(N)

## important: check results are identical with RNG seeded
stopifnot(identical(resR, resCpp))

res <- benchmark(piR(N), piSugar(N), order="relative")

print(res[,1:4])

and it does a few things: set up the R function, source the C++ function (and presto: we have a callable C++ function just like that), compute two simulations given the same seed and ensure they are in fact identical -- and proceed to compare the timing in a benchmarking exercise. That last aspect is not even that important -- we end up being almost-but-not-quite twice as fast on my machine for different values of N.

The real takeaway here is the ease with which we can get a C++ function into R --- and the new process completely takes care of passing parameters in, results out, and does the compilation, linking and loading.

More details about Rcpp attributes are in the new vignette. Now enjoy the π.

Update:One somewhat bad typo fixed.

Update:Corrected one background tag.

/code/snippets | permanent link

Fri, 16 Nov 2012

RcppArmadillo 0.3.4.4

A minor bug-fix release 3.4.4 of Armadillo came out upstream a few days ago. RcppArmadillo, our wrapper for R and Armadillo, is now on CRAN with its corresponding version 0.3.4.4. No R level or interface changes were made and the upstream changes are summarized below.

Changes in RcppArmadillo version 0.3.4.4 (2012-11-15)

  • Upgraded to Armadillo release 3.4.4

    • fix for handling complex numbers by sparse matrices

    • fix for minor memory leak by sparse matrices

Courtesy of CRANberries, there is also a diffstat report for 0.3.4.4 relative to 0.3.4.3 As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Wed, 14 Nov 2012

Rcpp and the new R:: namespace for Rmath.h

We released Rcpp 0.10.0 earlier today. This post will just provide a simple example for one of the smaller new features -- the new namespace for functions from Rmath.h -- and illustrate one of the key features (Rcpp attributes) in passing.

R, as a statistical language and environment, has very well written and tested statistical distribution functions providing probability density, cumulative distribution, quantiles and random number draws for dozens of common and not so common distribution functions. This code is used inside R, and available for use from standalone C or C++ programs via the standalone R math library which Debian / Ubuntu have as a package r-mathlib (and which can be built from R sources).

User sometimes write code against this interface, and then want to combine the code with other code, possibly even with Rcpp. We allowed for this, but it required a bit of an ugly interface. R provides a C interface; these have no namespaces. Identifiers can clash, and to be safe one can enable a generic prefix Rf_. So functions which could clash such as length or error become Rf_length and Rf_error and are less likely to conflict with symbols from other libraries. Unfortunately, the side-effect is that calling, say, the probability distribution function for the Normal distribution becomes Rf_pnorm5() (with the 5 denoting the five parameters: quantile, mean, std.deviation, lowerTail, logValue). Not pretty, and not obvious.

So one of the things we added was another layer of indirection by adding a namespace R with a bunch of inline'd wrapper functions (as well as several handful of unit tests to make sure we avoided typos and argument transposition and what not).

The short example below shows this for a simple function taking a vector, and returning its pnorm computed three different ways:

#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::DataFrame mypnorm(Rcpp::NumericVector x) {
    int n = x.size();
    Rcpp::NumericVector y1(n), y2(n), y3(n);

    for (int i=0; i<n; i++) {

        // the way we used to do this
        y1[i] = ::Rf_pnorm5(x[i], 0.0, 1.0, 1, 0);

        // the way we can do it now
        y2[i] = R::pnorm(x[i], 0.0, 1.0, 1, 0);

    }
    // or using Rcpp sugar in one go
    y3 = Rcpp::pnorm(x);

    return Rcpp::DataFrame::create(Rcpp::Named("Rold")  = y1,
                                   Rcpp::Named("Rnew")  = y2,
                                   Rcpp::Named("sugar") = y3);
}

This example also uses the new Rcpp attributes described briefly in the announcement blog post and of course in more detail in the corresponding vignette. Let us just state here that we simply provide a complete C++ function, using standard Rcpp types -- along with one 'attribute' declaration of an export via Rcpp. That's it -- even easier than using inline.

Now in R we simply do

R> sourceCpp("mypnorm.cpp")
to obtain a callable R function with the C++ code just shown behind it. No Makefile, no command-line tool invocation -- nothing but a single call to sourceCpp() which takes care of things --- and brings us a compiled C++ function to R just given the source file with its attribute declaration.

We can now use the new function to compute the probaility distribution both the old way, the new way with the 'cleaner' R::pnorm(), and of course the Rcpp sugar way in a single call. We build a data frame in C++, and assert that all three variants are the same:

R> x <- seq(0, 1, length=1e3)
R> res <- mypnorm(x)
R> head(res)
      Rold     Rnew    sugar
1 0.500000 0.500000 0.500000
2 0.500399 0.500399 0.500399
3 0.500799 0.500799 0.500799
4 0.501198 0.501198 0.501198
5 0.501597 0.501597 0.501597
6 0.501997 0.501997 0.501997
R> all.equal(res[,1], res[,2], res[,3])
[1] TRUE
R> 
This example hopefully helped to illustrate how Rcpp 0.10.0 brings both something really powerful (Rcpp attributes -- more on this another time, hopefully) and convenient in the new namespace for statistical functions.

/code/snippets | permanent link

Rcpp 0.10.0

Rcpp release 0.10.0 is now on CRAN and being uploaded to Debian.

This is a new feature release, and we are very exciting about the changes, notably Rcpp attributes which make using C++ from R even easier than inline (see below as well as the new vignette for details and first examples), the extensions to Rcpp modules (see below) and more as for example new Rcpp sugar functions, a new error output device syncing to R, and a new namespace R> for the statistical functions from Rmath.h.

The complete NEWS entry for 0.10.0 is below; more details are in the ChangeLog file in the package and on the Rcpp Changelog page.

Changes in Rcpp version 0.10.0 (2012-11-13)

  • Support for C++11 style attributes (embedded in comments) to enable use of C++ within interactive sessions and to automatically generate module declarations for packages:

    • Rcpp::export attribute to export a C++ function to R

    • sourceCpp() function to source exported functions from a file

    • cppFunction() and evalCpp() functions for inline declarations and execution

    • compileAttribtes() function to generate Rcpp modules from exported functions within a package

    • Rcpp::depends attribute for specifying additional build dependencies for sourceCpp()

    • Rcpp::interfaces attribute to specify the external bindings compileAttributes() should generate (defaults to R-only but a C++ include file using R_GetCCallable can also be generated)

    • New vignette "Rcpp-attribute"

  • Rcpp modules feature set has been expanded:

    • Functions and methods can now return objects from classes that are exposed through modules. This uses the make_new_object template internally. This feature requires that some class traits are declared to indicate Rcpp's wrap/as system that these classes are covered by modules. The macro RCPP_EXPOSED_CLASS and RCPP_EXPOSED_CLASS_NODECL can be used to declared these type traits.

    • Classes exposed through modules can also be used as parameters of exposed functions or methods.

    • Exposed classes can declare factories with ".factory". A factory is a c++ function that returns a pointer to the target class. It is assumed that these objects are allocated with new on the factory. On the R side, factories are called just like other constructors, with the "new" function. This feature allows an alternative way to construct objects.

    • "converter" can be used to declare a way to convert an object of a type to another type. This gets translated to the appropriate "as" method on the R side.

    • Inheritance. A class can now declare that it inherits from another class with the .derives<Parent>( "Parent" ) notation. As a result the exposed class gains methods and properties (fields) from its parent class.

  • New sugar functions:

    • which_min implements which.min. Traversing the sugar expression and returning the index of the first time the minimum value is found.

    • which_max idem

    • unique uses unordered_set to find unique values. In particular, the version for CharacterVector is found to be more efficient than R's version

    • sort_unique calculates unique values and then sorts them.

  • Improvements to output facilities:

    • Implemented sync() so that flushing output streams works

    • Added Rcerr output stream (forwarding to REprintf)

  • Provide a namespace 'R' for the standalone Rmath library so that Rcpp users can access those functions too; also added unit tests

  • Development releases sets variable RunAllRcppTests to yes to run all tests (unless it was alredy set to 'no'); CRAN releases do not and still require setting – which helps with the desired CRAN default of less testing at the CRAN server farm.

Thanks to CRANberries, you can also look at a diff to the previous release 0.9.15. As always, even fuller details are on the Rcpp Changelog page and the Rcpp page which also leads to the downloads, the browseable doxygen docs and zip files of doxygen output for the standard formats. A local directory has source and documentation too. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page

Update: One link corrected.

/code/rcpp | permanent link

Mon, 05 Nov 2012

RInside 0.2.9

A new version 0.2.9 of RInside arrived on CRAN earlier today; Windows binaries have already been built too. RInside provides a set of convenience classes which facilitate embedding of R inside of C++ applications and programs, using the classes and functions provided by the Rcpp R and C++ integration package.

This release adds a few new features as detailed in the extract from the NEWS below.

A key new feature is the added support for resilience to bad user input based on discussions and an initial (but altered) patch by Theodore Lytras. There are a few ways that this can be deployedon so we also added two more example programs detailing it. The featured Qt example and the Wt example have been updated to use this too.

Another feature that may be quite useful for some is the additonal attempt to find a value for R_HOME if none has been set. The value is typically found at compile time of the RInside package which poses a problem for those using the Windows build -- which is shipped as a binary reflecting the value of the build system. One alternative has always been to build the package locally too to get the local value, or to set it explicitly. But because the error behaviour – a cryptic message of Cannot open base package – is confusing to many, we now try to call the R function which gets this value from the registry. This may need more tweaking and testing, and if you use RInside on the Windows platform I would appreciate feedback.

The complete list of changes since the last release are summarized below in the corresponding NEWS file entry:

Changes in RInside version 0.2.9 (2012-11-04)

  • Applied (modified) patch by Theodore Lytras which lets RInside recover from some parsing errors and makes RInside applications more tolerant of errors

  • Added non-throwing variants of parseEval() and parseEvalQ()

  • Modified Qt and Wt examples of density estimation applications to be much more resilient to bad user input

  • On Windows, have RInside use R's get_R_HOME() function to get R_HOME value from registry if not set by user

  • Added note to examples/standard/Makefile.win that R_HOME may need to be set to run the executables – so either export your local value, or re-install RInside from source to have it reflected in the library build of libRinside

  • Updated CMake build support for standard, armadillo and eigen

  • Improved CMake builds of examples/standard, examples/eigen and examples/armadillo by detecting architecture

CRANberries also provides a short report with changes from the previous release. More information is on the RInside page. Questions, comments etc should go to the rcpp-devel mailing list off the Rcpp R-Forge page.

/code/rinside | permanent link

Thu, 25 Oct 2012

Accelerating R code: Computing Implied Volatilities Orders of Magnitude Faster

This blog, together with Romain's, is one of the main homes of stories about how Rcpp can help with getting code to run faster in the context of the R system for statistical programming and analysis. By making it easier to get already existing C or C++ code to R, or equally to extend R with new C++ code, Rcpp can help in getting stuff done. And it is often fairly straightforward to do so.

In this context, I have a nice new example. And for once, it is work-related. I generally cannot share too much of what we do there as this is, well, proprietary, but I have this nice new example. The other day, I was constructing (large) time series of implied volatilities. Implied volatilities can be thought of as the complement to an option's price: given a price (and all other observables which can be thought of as fixed), we compute an implied volatility price (typically via the standard Black-Scholes model). Given a changed implied volatility, we infer a new price -- see this Wikipedia page for more details. In essence, it opens the door to all sorts of arbitrage and relative value pricing adventures.

Now, we observe prices fairly frequently to create somewhat sizeable time series of option prices. And each price corresponds to one matching implied volatility, and for each such price we have to solve a small and straightforward optimization problem: to compute the implied volatility given the price. This is usually done with an iterative root finder.

The problem comes from the fact that we have to do this (i) over and over and over for large data sets, and (ii) that there are a number of callbacks from the (generic) solver to the (standard) option pricer.

So our first approach was to just call the corresponding function GBSVolatility from the fOption package from the trusted Rmetrics project by Diethelm Wuertz et al. This worked fine, but even with the usual tricks of splitting over multiple cores/machines, it simply took too long for the resolution and data amount we desired. One of the problems is that this function (which uses the proper uniroot optimizer in R) is not inefficient per se, but simply makes to many function call back to the option pricer as can be seen from a quick glance at the code. The helper function .fGBSVolatility gets called time and time again:

R> GBSVolatility
function (price, TypeFlag = c("c", "p"), S, X, Time, r, b, tol = .Machine$double.eps, 
    maxiter = 10000) 
{
    TypeFlag = TypeFlag[1]
    volatility = uniroot(.fGBSVolatility, interval = c(-10, 10), 
        price = price, TypeFlag = TypeFlag, S = S, X = X, Time = Time, 
        r = r, b = b, tol = tol, maxiter = maxiter)$root
    volatility
}
<environment: namespace:fOptions>
R> 
R> .fGBSVolatility
function (x, price, TypeFlag, S, X, Time, r, b, ...) 
{
    GBS = GBSOption(TypeFlag = TypeFlag, S = S, X = X, Time = Time, 
        r = r, b = b, sigma = x)@price
    price - GBS
}
<environment: namespace:fOptions>

So the next idea was to try the corresponding function from my RQuantLib package which brings (parts of) QuantLib to R. That was seen as been lots faster already. Now, QuantLib is pretty big and so is RQuantLib, and we felt it may not make sense to install it on a number of machines just for this simple problem. So one evening this week I noodled around for an hour or two and combined (i) a basic Black/Scholes calculation and (ii) a standard univariate zero finder (both of which can be found or described in numerous places) to minimize the difference between the observed price and the price given an implied volatility. With about one hundred lines in C++, I had something which felt fast enough. So today I hooked this into R via a two-line wrapper in quickly-created package using Rcpp.

I had one more advantage here. For our time series problem, the majority of the parameters (strike, time to maturity, rate, ...) are fixed, so we can structure the problem to be vectorised right from the start. I cannot share the code or more the details of my new implementation. However, both GBSVolatility and EuropeanOprionImpliedVolatility are on CRAN (and as I happen to maintain these for Debian, also just one sudo apt-get install r-cran-foptions r-cran-rquantlib away if you're on Debian or Ubuntu). And writing the other solver is really not that involved.

Anyway, here is the result, courtesy of a quick run via the rbenchmark package. We create a vector of length 500; the implied volatility computation will be performed at each point (and yes, our time series are much longer indeed). This is replicated 100 times (as is the default for rbenchmark) for each of the three approaches:

xyz@xxxxxxxx:~$ r xxxxR/packages/xxxxOptions/demo/timing.R
    test replications elapsed  relative user.self sys.self user.child sys.child
3 zzz(X)          100   0.038     1.000     0.040    0.000          0         0
2 RQL(X)          100   3.657    96.237     3.596    0.060          0         0
1 fOp(X)          100 448.060 11791.053   446.644    1.436          0         0
xyz@xxxxxxxx:~$ 
The new local solution is denoted by zzz(X). It is already orders of magnitude faster than the RQL(x) function using RQuantLib (which is, I presume, due to my custom solution internalising the loop). And the new approach is a laughable amount faster than the basic approach (shown as fOp) via fOptions. For one hundred replications of solving implied volatilities for all elements of a vector of size 500, the slow solution takes about 7.5 minutes --- while the fast solution takes 38 milliseconds. Which comes to a relative gain of over 11,000.

So sitting down with your C++ compiler to craft a quick one-hundred lines, combining two well-known and tested methods, can reap sizeable benefits. And Rcpp makes it trivial to call this from R.

/code/snippets | permanent link

Sun, 14 Oct 2012

Rcpp 0.9.15

Rcpp release 0.9.15 is now on CRAN and being uploaded to Debian.

Martin Morgan provided a clever fix for a header search needed between clang++ (especially on OS X) and g++ (which still provided libstdc++ and headers for clang++). This should hopefully put the clang issues to bed. Ben North noticed an unprotected string conversion when exception messages are turned into R errors which got fixed, and I expanded the coverage of Date (and Datetime) types to deal properly with non-finite values NA, NaN and Inf.

The complete NEWS entry for 0.9.15 is below; more details are in the ChangeLog file in the package and on the Rcpp Changelog page.

Changes in Rcpp version 0.9.15 (2012-10-13)

  • Untangling the clang++ build issue about the location of the exceptions header by directly checking for the include file – an approach provided by Martin Morgan in a kindly contributed patch as unit tests for them.

  • The Date and Datetime types now correctly handles NA, NaN and Inf representation; the Date type switched to an internal representation via double

  • Added Date and Datetime unit tests for the new features

  • An additional PROTECT was added for parsing exception messages before returning them to R, following a report by Ben North

Thanks to CRANberries, you can also look at a diff to the previous release 0.9.14. As always, even fuller details are on the Rcpp Changelog page and the Rcpp page which also leads to the downloads, the browseable doxygen docs and zip files of doxygen output for the standard formats. A local directory has source and documentation too. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page

/code/rcpp | permanent link

Thu, 04 Oct 2012

RcppArmadillo 0.3.4.3

Another bug-fix release of Armadillo, now at version 3.4.3 whike the 3.4.* stabilizes, and with it a version 0.3.4.3 of RcppArmadillo, our wrapper for R and Armadillo. The new version is already on CRAN as of earlier today. Once again no R level or interface changes were, the upstream changes are summarized below.

Changes in RcppArmadillo version 0.3.4.3 (2012-10-04)

  • Upgraded to Armadillo release 3.4.3

    • fix for aliasing issue in diagmat()

    • fix for speye() signature

Courtesy of CRANberries, there is also a diffstat report for 0.3.4.3 relative to 0.3.4.2 As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

RProtoBuf 0.2.6

Release 0.2.6 of RProtoBuf arrived on CRAN earlier this morning. RProtoBuf provides GNU R bindings for the Google Protobuf data encoding library used and released by Google.

This release was once more driven largely by Murray whom we have now added among the list of authors of the package too. The NEWS file entry follows below:

Changes in version 0.2.6 (2012-10-04)

  • Applied several more patches by Murray to

    • correct '_' and '__' mismatches in wrapper calls

    • update a few manual pages for style, and add examples

    • fix a bug where NAs were silently treated as TRUE for logical/bool types

    • fix a bug that caused crashes when adding vectors to optional fields

    • fix bugs in readASCII that returned empty protocol buffers when the file or connection could not be opened

    • distinguish between non-existant and not-set fieldswith has() by returning NULL in the former case.

    • fix a bug that caused non-deterministic behavior when setting a repeated message field in a protobuf to a single Message.

    • add unit tests for all of the above.

  • Added Murray to Authors: field in DESCRIPTION

  • Removed old and unconvincing example on RProtoBuf for storage and serialization in an imagined HighFrequencyFinance context

CRANberries also provides a diff to the previous release 0.2.5. More information is at the RProtoBuf page which has a draft package vignette, a 'quick' overview vignette and a unit test summary vignette. Questions, comments etc should go to the rprotobuf mailing list off the RProtoBuf page at R-Forge.

/code/rprotobuf | permanent link

Mon, 01 Oct 2012

Rcpp 0.9.14

Another release of Rcpp has just appeared on CRAN and was just uploaded to Debian.

It addresses yet another issue we had on OS X and should hopefully put the build issues to rest. Three new (vectorized) sugar functions were added, along with some new regression tests and more. The complete NEWS entry for 0.9.14 is below; more details are in the ChangeLog file in the package and on the Rcpp Changelog page.

Changes in Rcpp version 0.9.14 (2012-09-30)

  • Added new Rcpp sugar functions trunc(), round() and signif(), as well as unit tests for them

  • Be more conservative about where we support clang++ and the inclusion of exception_defines.h and prevent this from being attempted on OS X where it failed for clang 3.1

  • Corrected a typo in Module.h which now again permits use of finalizers

  • Small correction for (unexported) bib() function (which provides a path to the bibtex file that ships with Rcpp)

  • Converted NEWS to NEWS.Rd

Thanks to CRANberries, you can also look at a diff to the previous release 0.9.13. As always, even fuller details are on the Rcpp Changelog page and the Rcpp page which also leads to the downloads, the browseable doxygen docs and zip files of doxygen output for the standard formats. A local directory has source and documentation too. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page

/code/rcpp | permanent link

Tue, 25 Sep 2012

RcppArmadillo 0.3.4.2

The development of Armadillo 3.4.* continues with bug fixes and more sparse matrix support. Conrad release 3.4.2 this morning. I wrapped up the corresponding RcppArmadillo 0.3.4.2 before leaving for work, and this version should now have all CRAN mirrors. Once again no R level or interface changes were, the upstream changes are summarized below.

Changes in RcppArmadillo version 0.3.4.2 (2012-09-25)

  • Upgraded to Armadillo release 3.4.2

    • minor fixes for handling sparse submatrix views

    • minor speedups for sparse matrices

Courtesy of CRANberries, there is also a diffstat report for 0.3.4.2 relative to 0.3.4.1 As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Tue, 18 Sep 2012

RcppArmadillo 0.3.4.1

Conrad released the bug-fix release 3.4.1 of Armadillo earlier today, and the corresponding RcppArmadillo package 0.3.4.1 is already on CRAN. No R level or interface changes were, the upstream changes are summarized below.

Changes in RcppArmadillo version 0.3.4.1 (2012-09-18)

  • Upgraded to Armadillo release 3.4.1

    • workaround for a bug in the Mac OS X accelerate framework

    • fixes for handling empty sparse matrices

    • added documentation for saving & loading matrices in HDF5 format

    • faster dot() and cdot() for complex numbers

Courtesy of CRANberries, there is also a diffstat report for 0.3.4.1 relative to 0.3.4.0 As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Sat, 08 Sep 2012

RInside 0.2.8

This morning version 0.2.8 of RInside arrived on the CRAN sites. RInside provides a set of convenience classes which facilitate embedding of R inside of C++ applications and programs, using the classes and functions provided by the Rcpp R and C++ integration package.

This release adds no new features but improves the build process a little and should make use on Windows a little easier. All changes since the last release are summarized below in the NEWS file entry:

Changes in RInside version 0.2.8 (2012-09-07)

  • Added CMake build support for armadillo and eigen examples, once again kindly contributed by Peter Aberline

  • Corrected Windows package build to always generate a 64 bit static library too

  • Updated package build to no longer require configire and configure.win to update the two header file supplying compile-time information; tightened build dependencies on headers in Makevars / Makevars.win

  • Improved examples/standard/Makefile.win by detecting architecture

CRANberries also provides a short report with changes from the previous release. More information is on the RInside page. Questions, comments etc should go to the rcpp-devel mailing list off the Rcpp R-Forge page.

/code/rinside | permanent link

Thu, 06 Sep 2012

RcppArmadillo 0.3.4.0

A new major released of Armadillo came out earlier today. I prepared the corresponding RcppArmadillo package 0.3.4.0 which also arrived on CRAN earlier today. This released contains a few performance improvements, the beginnings of support of sparse matrices and more, see below. We also post the NEWS entry for the beta release which was prepared, but not uploaded to CRAN to minimise the upload frequency there. On the RcppArmadillo side, two enhancements were made for the fastLm() function for faster linear model fits.

Changes in RcppArmadillo version 0.3.4.0 (2012-09-06)

  • Upgraded to Armadillo release 3.4.0 (Ku De Ta)

    • added economical QR decomposition: qr_econ()

    • added .each_col() & .each_row() for vector operations repeated on each column or row

    • added preliminary support for sparse matrices, contributed by Ryan Curtin et al. (Georgia Institute of Technology)

    • faster singular value decomposition via divide-and-conquer algorithm

    • faster .randn()

  • NEWS file converted to Rd format

Changes in RcppArmadillo version 0.3.3.91 (2012-08-30)

  • Upgraded to Armadillo release 3.3.91

    • faster singular value decomposition via "divide and conquer" algorithm

    • added economical QR decomposition: qr_econ()

    • added .each_col() & .each_row() for vector operations repeated on each column or row

    • added preliminary support for sparse matrices, contributed by Ryan Curtin, James Cline and Matthew Amidon (Georgia Institute of Technology)

  • Corrected summary method to deal with the no intercept case when using a formula; also display residual summary() statistics

  • Expanded unit tests for fastLm

Courtesy of CRANberries, there is also a diffstat report for 0.3.4.0 relative to 0.3.2.4 As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Sun, 02 Sep 2012

Faster creation of binomial matrices

Scott Chamberlain blogged about faster creation of binomial matrices the other day, and even referred to our RcppArmadillo package as a possible solution (though claiming he didn't get it to work, tst tst -- that is what the rcpp-devel list is here to help with).

The post also fell short of a good aggregated timing comparison for which we love the rbenchmark package. So in order to rectify this, and to see what we can do here with Rcpp, a quick post revisiting the issue.

As preliminaries, we need to load three packages: inline to create compiled code on the fly (which, I should mention, is also used together with Rcpp by the Stan / RStan MCMC sampler which is creating some buzz this week), the compiler package included with R to create byte-compiled code and lastly the aforementioned rbenchmark package to do the timings. We also set row and column dimension, and set them a little higher than the original example to actually have something measurable:

library(inline)
library(compiler)
library(rbenchmark)

n <- 500
k <- 100
The first suggestion was the one by Scott himself. We will wrap this one, and all the following ones, in a function so that all approaches are comparable as being in a function of two dimension arguments:
scott <- function(N, K) {
    mm <- matrix(0, N, K)
    apply(mm, c(1, 2), function(x) sample(c(0, 1), 1))
}
scottComp <- cmpfun(scott)
We also immediatly compute a byte-compiled version (just because we now can) to see if this helps at all with the code. As there are no (explicit !) loops, we do not expect a big pickup. Scott's function works, but sweeps the sample() function across all rows and columns which is probably going to be (relatively) expensive.

Next is the first improvement suggested to Scott which came from Ted Hart.

ted <- function(N, K) {
    matrix(rbinom(N * K, 1, 0.5), ncol = K, nrow = N)
}
This is quite a bit smarter as it vectorises the approach, generating N times K elements at once which are then reshaped into a matrix.

Another suggestion came from David Smith as well as Rafael Maia. We rewrite it slightly to make it a function with two arguments for the desired dimensions:

david <- function(m, n) {
    matrix(sample(0:1, m * n, replace = TRUE), m, n)
}
This is very clever as it uses sample() over zero and one rather than making (expensive) draws from random number generator.

Next we have a version from Luis Apiolaza:

luis <- function(m, n) {
     round(matrix(runif(m * n), m, n))
}
It draws from a random uniform and rounds to zero and one, rather than deploying the binomial.

Then we have the version using RcppArmadillo hinted at by Scott, but with actual arguments and a correction for row/column dimensions. Thanks to inline we can write the C++ code as an R character string; inline takes care of everything and we end up with C++-based solution directly callable from R:

arma <- cxxfunction(signature(ns="integer", ks="integer"), plugin = "RcppArmadillo", body='
   int n = Rcpp::as<int>(ns);
   int k = Rcpp::as<int>(ks);
   return wrap(arma::randu(n, k));
')
This works, and is pretty fast. The only problem is that it answers the wrong question as it returns U(0,1) draws and not binomials. We need to truncate or round. So a corrected version is
armaFloor <- cxxfunction(signature(ns="integer", ks="integer"), plugin = "RcppArmadillo", body='
   int n = Rcpp::as<int>(ns);
   int k = Rcpp::as<int>(ks);
   return wrap(arma::floor(arma::randu(n, k) + 0.5));
')
which uses the the old rounding approximation of adding 1/2 before truncating.

With Armadillo in the picture, we do wonder how Rcpp sugar would do. Rcpp sugar, described in one of the eight vignettes of the Rcpp package, is using template meta-programming to provide R-like expressiveness (aka "syntactic sugar") at the C++ level. In particular, it gives access to R's RNG functions using the exact same RNGs as R making the results directly substitutable (whereas Armadillo uses its own RNG).

sugar <- cxxfunction(signature(ns="integer", ks="integer"), plugin = "Rcpp", body='
   int n = Rcpp::as<int>(ns);
   int k = Rcpp::as<int>(ks);
   Rcpp::RNGScope tmp;
   Rcpp::NumericVector draws = Rcpp::runif(n*k);
   return Rcpp::NumericMatrix(n, k, draws.begin());
')
Here Rcpp::RNGScope deals with setting/resetting the R RNG state. This draws a vector of N time K uniforms similar to Luis' function -- and just like Luis' R function does so without looping -- and then shapes a matrix of dimension N by K from it.

And it does of course have the same problem as the RcppArmadillo approach earlier and we can use the same solution:

sugarFloor <- cxxfunction(signature(ns="integer", ks="integer"), plugin = "Rcpp", body='
   int n = Rcpp::as<int>(ns);
   int k = Rcpp::as<int>(ks);
   Rcpp::RNGScope tmp;
   Rcpp::NumericVector draws = Rcpp::floor(Rcpp::runif(n*k)+0.5);
   return Rcpp::NumericMatrix(n, k, draws.begin());
')

Now that we have all the pieces in place, we can compare:

res <- benchmark(scott(n, k), scottComp(n,k),
                 ted(n, k), david(n, k), luis(n, k),
                 arma(n, k), sugar(n,k),
                 armaFloor(n, k), sugarFloor(n, k),
                 order="relative", replications=100)
print(res[,1:4])
With all the above code example in a small R script we call via littler, we get
edd@max:~/svn/rcpp/pkg$ r /tmp/scott.r 
Loading required package: methods
              test replications elapsed   relative
7      sugar(n, k)          100   0.072   1.000000
9 sugarFloor(n, k)          100   0.088   1.222222
6       arma(n, k)          100   0.126   1.750000
4      david(n, k)          100   0.136   1.888889
8  armaFloor(n, k)          100   0.138   1.916667
3        ted(n, k)          100   0.384   5.333333
5       luis(n, k)          100   0.410   5.694444
1      scott(n, k)          100  33.045 458.958333
2  scottComp(n, k)          100  33.767 468.986111
We can see several takeaways:
  • Rcpp sugar wins, which is something we have seen in previous posts on this blog. One hundred replication take only 72 milliseconds (or 88 in the corrected version) --- less than one millisecond per matrix creation.
  • RcppArmadillo does well too, and I presume that the small difference is due not to code in Armadillo but the fact that we need one more 'mapping' of data types on the way back to R
  • The sample() idea by David and Rafael is very, very fast too. This proves once again that well-written R code can be competitive. It also suggest how to make the C++ solution by foregoing (expensive) RNG draws in favour of sampling
  • The approaches by Ted and Luis are also pretty good. In practice, the are probably good enough.
  • Scott's function is not looking so hot (particularly as we increased the problem dimensions) and byte-compilation does not help at all.
Thanks to Scott and everybody for suggesting this interesting problem. Trying the rbinom() Rcpp sugar function, or implementing sample() at the C++ level is, as the saying goes, left as an exercise to the reader.

/code/snippets | permanent link

Thu, 16 Aug 2012

Follow-up to Counting CRAN Package Depends, Imports and LinkingTo

A few days ago, I blogged about visualizing CRAN dependency ranks which turned out to be a somewhat popular post. David Smith followed-up at the REvo blog suggesting to exclude packages already shipping with R (which is indicated by their 'Recommended' priority). Good idea!

So here is an updated version, where we limit the display to the top twenty packages counted by reverse 'Depends:', and excluding those already shipping with R such as MASS, lattice, survival, Matrix, or nlme.

CRAN package chart of Reverse Depends relations excluding Recommended packages

The mvtnorm package is still out by a wide margin, but we can note that (cough, cough) our Rcpp package for seamless R and C++ is now tied for second with the coda package for MCMC analysis. Also of note is the fact that CRAN keeps growing relentlessly and moved from 3969 packages to 3981 packages in the space of these few days...

Lastly, I have been asked about the code and/or data behind this. It is really pretty simply as the main data.frame can be had from CRAN (where I also found the initial few lines to load it). After that, one only needs a little bit of subsetting as shown below. I look forward to seeing other people riff on this data set.

#!/usr/bin/r
##
## Initial db downloand from http://developer.r-project.org/CRAN/Scripts/depends.R and adapted

require("tools")

## this function is essentially the same as R Core's from the URL
## http://developer.r-project.org/CRAN/Scripts/depends.R
getDB <- function() {
    contrib.url(getOption("repos")["CRAN"], "source") # trigger chooseCRANmirror() if required
    description <- sprintf("%s/web/packages/packages.rds", getOption("repos")["CRAN"])
    con <- if(substring(description, 1L, 7L) == "file://") {
        file(description, "rb")
    } else {
        url(description, "rb")
    }
    on.exit(close(con))
    db <- readRDS(gzcon(con))
    rownames(db) <- db[,"Package"]

    db
}

db <- getDB()

## count packages
getCounts <- function(db, col) {
    foo <- sapply(db[,col],
                  function(s) { if (is.na(s)) NA else length(strsplit(s, ",")[[1]]) } )
}

## build a data.frame with the number of entries for reverse depends, reverse imports,
## reverse linkingto and reverse suggests; also keep Recommended status
ddall <- data.frame(pkg=db[,1],
                    RDepends=getCounts(db, "Reverse depends"),
                    RImports=getCounts(db, "Reverse imports"),
                    RLinkingTo=getCounts(db, "Reverse linking to"),
                    RSuggests=getCounts(db, "Reverse suggests"),
                    Recommended=db[,"Priority"]=="recommended"
                    )

## Subset to non-Recommended packages as in David Smith's follow-up post
dd <- subset(ddall, is.na(ddall[,"Recommended"]) | ddall[,"Recommended"] != TRUE)

labeltxt <- paste("Analysis as of", format(Sys.Date(), "%d %b %Y"),
                  "covering", nrow(db), "total CRAN packages")

cutOff <- 20
doPNG <- TRUE

if (doPNG) png("/tmp/CRAN_ReverseDepends.png", width=600, heigh=600)
z <- dd[head(order(dd[,2], decreasing=TRUE), cutOff),c(1,2)]
dotchart(z[,2], labels=z[,1], cex=1, pch=19,
         main="CRAN Packages sorted by Reverse Depends:",
         sub=paste("Limited to top", cutOff, "packages, excluding 'Recommended' ones shipped with R"),
         xlab=labeltxt)
if (doPNG) dev.off()

if (doPNG) png("/tmp/CRAN_ReverseImports.png", width=600, heigh=600)
z <- dd[head(order(dd[,3], decreasing=TRUE), cutOff),c(1,3)]
dotchart(z[,2], labels=z[,1], cex=1, pch=19,
         main="CRAN Packages sorted by Reverse Imports:",
         sub=paste("Limited to top", cutOff, "packages, excluding 'Recommended' ones shipped with R"),
         xlab=labeltxt)
if (doPNG) dev.off()

# no cutOff but rather a na.omit
if (doPNG) png("/tmp/CRAN_ReverseLinkingTo.png", width=600, heigh=600)
z <- na.omit(dd[head(order(dd[,4], decreasing=TRUE), 30),c(1,4)])
dotchart(z[,2], labels=z[,1], pch=19,
         main="CRAN Packages sorted by Reverse LinkingTo:",
         xlab=labeltxt)
if (doPNG) dev.off()

/code/snippets | permanent link

Mon, 13 Aug 2012

RInside 0.2.7

A new version 0.2.7 of RInside is now available via CRAN. RInside provides a set of convenience classes which facilitate embedding of R inside of C++ applications and programs, using the classes and functions provided by the Rcpp R and C++ integration package.

This release adds two new examples subdirectories demonstrating use of RInside with, respectively, RcppArmadillo and RcppEigen. We extended the 'web application' example using the Wt toolkit by adding CSS and XML support, and added another new example motivated by a StackOverflow question. CMake support has been added for Windows as well thanks to Peter Aberline---he also contributed CMake code for the two new example directories but that contribution made it only into SVN and not this release.

All changes since the last release are summarized below in the NEWS file entry:

Changes in RInside version 0.2.7 (2012-08-12)

  • New fifth examples subdirectory 'armadillo' with two new examples showing how to combine RInside with RcppArmadillo

  • New sixth examples subdirectory 'eigen' with two new examples showing how to combine RInside with RcppEigen

  • Prettified the Wt example 'web application' with CSS use, also added and XML file with simple headers and description text

  • New example rinside_sample12 motivated by StackOverflow question on using sample() from C

  • Added CMake build support on Windows for the examples

CRANberries also provides a short report with changes from the previous release. More information is on the RInside page. Questions, comments etc should go to the rcpp-devel mailing list off the Rcpp R-Forge page.

/code/rinside | permanent link

Fri, 10 Aug 2012

RcppExamples 0.1.4

An updated version 0.1.4 of the RcppExamples package is now on on CRAN. RcppExamples contains a few illustrations of how to use Rcpp.

The NEWS entry is below: a new example was added illustrating use of the (vectorised) random-number generators for three of the different distributions --- and showing how it perfectly reproduces the values one gets in R.

Changes in RcppExamples version 0.1.4 (2012-08-09)

  • Added new example for Rcpp sugar and vectorised draws of RNGs

  • Minor updates to reflect newer CRAN Policy

Thanks to CRANberries, you can also look at a diff to the previous release 0.1.3. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page

/code/rcpp | permanent link

Thu, 09 Aug 2012

RProtoBuf 0.2.5

A new release 0.2.5 of RProtoBuf is now on CRAN. RProtoBuf provides GNU R bindings for the Google Protobuf data encoding library used and released by Google.

This release once again contains a number of patches kindly contributed by Murray Stokely, as well as some updates to conform to CRAN Policy changes.

The NEWS file entry follows below:

Changes in version 0.2.5 (2012-08-08)

  • Applied patches by Murray to

    • correctly deal with nested Protocol Buffer definitions, and also add new unit test for this

    • test a a protocol buffer for missing required fields before serializing it, also add a unit test

    • add a small stylistic fix and examples to the 'add.Rd' manual page

  • Moved inst/doc/ to vignettes/ per newer CRAN Policy

CRANberries also provides a diff to the previous release 0.243. More information is at the RProtoBuf page which has a draft package vignette, a 'quick' overview vignette and a unit test summary vignette. Questions, comments etc should go to the rprotobuf mailing list off the RProtoBuf page at R-Forge.

/code/rprotobuf | permanent link

Wed, 08 Aug 2012

RcppBDT 0.2.1

A new bug-fix release of the RcppBDT package appeared on CRAN earlier today. David Reiner noticed that the functions getEndOfMonth and getEndOfBizWeek were not working right. These are convenience wrappers around the real functionality provided as a member function to the reference class built by Rcpp modules---which works off a reference instance of the class, and these two convenience functions were not updating the date. This is now fixed.

The complete NEWS entry is below:

Changes in version 0.2.1 (2012-08-08)

  • Bug for getEndOfBizWeek() and getEndOfMonth() who were lacking a call to fromDate(date) to actually pass the date for which the functions are computing the end of business week or month.

Courtesy of CRANberries, there is also a diffstat report for 0.2.1 relative to 0.2.0. As always, feedback is welcome and the rcpp-devel mailing list off the R-Forge page for Rcpp is the best place to start a discussion.

/code/rcpp | permanent link

Sun, 05 Aug 2012

Counting CRAN Package Depends, Imports and LinkingTo

The recent update by Søren Højsgaard's to his gRbase package for graphical models made it the 75th package to depend on our Rcpp package for R and C++ integration. So in a lighthearted weekend moment, I tweeted about gRbase being number 75 for Rcpp to which Hadley replied, asking if Rcpp was in fact number one among R package Depends. Far from it, and I immediately replied listing lattice and Matrix as packages with way more other packages depending upon them.

But as the question seemed deserving of a bit more analysis, I spent a few minutes on this and prepared three charts listing package in order of reverse Depends, reverse Imports and reverse LinkingTo.

First off, the reverse Depends:. This is the standard means of declaring a dependence of one package upon another.

CRAN package chart of Reverse Depends relations

Unsurprisingly, the MASS package from the classic Venables and Ripley book comes first, with Deepayan Sarkar's powerful lattice package (also covered in a book) coming second. These are both recommended packages which are commonly distributed with R itself. Next are mvtnorm and survival. Our Rcpp is up there in the top-ten, but not a frontrunner.

With the advent of namespaces a few R releases ago, it became possible to import functions from other packages. So the Imports: statement now provides an alternative to the (older) Depends:. The next chart displays the same relationship for Imports::

CRAN package chart of Reverse Imports relations

Now lattice still leads, but Hadleys's plyr package grabbed the second spot just before MASS and Matrix.

It is interesting to see that the sheer number of Imports: is still not where the Depends: are. On the other hand, we see a number of more recent packages popping up in the second chart. This may reflect more recent coding practices. It will be interesting to see how this stacks up over time when we revisit this chart.

Lastly, we can also look at LinkingTo:, a declaration used to provide a C/C++-level dependency at the source code level. We use this in the Rcpp family to provide automatic resolution of the header files needed to compile against our packages. And unsurprisingly, because packages using Rcpp actually use its API (rather than R functions), the package is a little ahead of others. In the package we find three more packages of the Rcpp family, but only a limited number of other packages as C/C++-level dependencies are still somewhat rare in the R universe. There are also fewer packages overall making use of this mechanism.

CRAN package chart of Reverse LinkingTo relations

One could of course take this one level further and sum up dependencies in a recursive manner, or visualize the relationship differently. But these dotchart graphs provide a first visual description of the magnitude of Depends, Imports and LinkingTo among CRAN packages for R.

/code/snippets | permanent link

Tue, 31 Jul 2012

RcppCNPy 0.2.0

Version 0.2.0 of the recently introduced RcppCNPy package for reading/writing NumPy data in R arrived on CRAN earlier today.

The main change are the added ability to also write gzip-ed npy files, to suppress an automatic transposition as well as the correction of one memory leak on data read.

The NEWS file entry is below:

Changes in version 0.2.0 (2012-07-30)

  • Support for writing of gzip-ed npy files has been added.

  • A new option dotranspose has been added to npyLoad() to support data sets that do not need to be transposed to be used in R.

  • A memory leak in reading files has been corrected.

CRANberries also provides a diffstat report for 0.2.0 relative to 0.1.0. As always, feedback is welcome and the rcpp-devel mailing list off the R-Forge page for Rcpp is the best place to start a discussion.

/code/rcpp | permanent link

Tue, 24 Jul 2012

RcppClassic 0.9.2

Similar to yesterday's post about RcppGSL, we have another pure maintenance release to announce, this time of RcppClassic, the package supporting the deprecated older classic Rcpp API defined in the earlier 2005 to 2006 releases, is now on CRAN. There is no new code, as the only changes were made to accomodate CRAN Policy updates.

Courtesy of CRANberries, here is a link to the changes relative to the previous release 0.9.1.

Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Mon, 23 Jul 2012

RcppGSL 0.2.0

Earlier today, a minor update / maintenance release of RcppGSL---our interface package between R and the GNU GSL using our Rcpp package for seamless R and C++ integration---arrived on on CRAN. It contains a number of minor changes to accomodate changes in the CRAN Policies, as well as a few extension to the main example fastLm, a faster replacement for the standard lm function. The summary methods now provide more information, and we added a number of new regression tests.

The NEWS file entry follows below:

Changes in version 0.2.0 (2012-07-22)
  • summary() for fastLm() now displays more information

  • fastLmPure() now uses same argument order as R's lm.fit()

  • Added more unit tests for fastLm() and related functions

  • Export and document S3 methods in NAMESPACE and manual page as such

  • Vignettes have been moved to the vignettes/ directory

  • Main vignette renamed to RcppGSL-intro.pdf to use a filename different from the package reference manual

  • NEWS file converted to .Rd format

  • inline support function no longer uses assignInNamespace

And courtesy of CRANberries, a summary of the changes to the previous release 0.1.1 is available too.

More information is on the RcppGSL page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Thu, 12 Jul 2012

RcppArmadillo 0.3.2.4

Conrad released version 3.2.4 of Armadillo yesterday. It contains a workaround for g++ 4.7.0 and 4.7.1 which have a regression triggered by the Armadillo codebase for small fixed-sized matrices. The corresponding RcppArmadillo package 0.3.2.4 arrived on CRAN earlier today.

The short NEWS entry follows below.

0.3.2.4  2012-07-11

    o   Upgraded to Armadillo release 3.2.4

          * workaround for a regression (bug) in GCC 4.7.0 and 4.7.1

Courtesy of CRANberries, there is also a diffstat report for 0.3.2.4 relative to 0.3.2.3 As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Tue, 10 Jul 2012

Getting numpy data into R -- Take Two

A couple of days ago, I had posted a short Python script to convert numpy files into a simple binary format which R can read quickly. Nice, but still needing an extra file. Shortly thereafter, I found Carl Rogers cnpy library which makes reading and writing numpy files from C++ a breeze, and I quickly wrapped this up into a new package RcppCNPy which was released a few days ago.

This post will show a quick example, also summarized in the short pdf vignette describing the package, and provided as a demo within the package.

R> library(RcppCNPy)
Loading required package: Rcpp
R> library(rbenchmark)
R> 
R> n <- 1e5
R> k <- 50
R> 
R> M <- matrix(seq(1.0, n*k, by=1.0), n, k)
R> 
R> txtfile <- tempfile(fileext=".txt")
R> write.table(M, file=txtfile)
R> 
R> pyfile <- tempfile(fileext=".py")
R> npySave(pyfile, M)
R> 
R> pygzfile <- tempfile(fileext=".py")
R> npySave(pygzfile, M)
R> system(paste("gzip -9", pygzfile))
R> pygzfile <- paste(pygzfile, ".gz", sep="")
R> 
R> 
We first load the new package (as well as the rbenchmark package used for the benchmarking example) into R. We then create a large matrix of 100,000 rows and 50 columns. Not quite big data by any stretch, but large enough for ascii reading to be painfully slow. We also write two npy files and compress the second one.

Next, we use the benchmark function to time the three approaches:

R> res <- benchmark(read.table(txtfile),
+                  npyLoad(pyfile),
+                  npyLoad(pygzfile),
+                  order="relative",
+                  columns=c("test", "replications", "elapsed", "relative"),
+                  replications=10)
R> print(res)
                 test replications elapsed relative
2     npyLoad(pyfile)           10   1.241  1.00000
3   npyLoad(pygzfile)           10   3.098  2.49637
1 read.table(txtfile)           10  96.744 77.95649
R> 
As shown by this example, loading a numpy file directly beats the pants off reading the data from ascii: it is about 78 times faster. Reading a compressed file is somewhat slower as the data stream has to be passed through the uncompressor provide by the zlib library. So instead of reading a binary blob in one go (once the file header has been parsed) we have to operate piecemeal---which is bound to be slower. It does however save in storage space (and users can make this tradeoff between speed and size) and is still orders of magnitude faster than parsing the ascii file. Finally, and not shown here, we unlink the temporary files.

Summing up, this post demonstrated how the RcppCNPy package can be a useful to access data in numpy files (which may even be compressed). Data can also be written from R to be accessed later by numpy.

/code/snippets | permanent link

Sun, 08 Jul 2012

New package RcppCNPy with release 0.1.0 (and 0.0.1 earlier last week)

A few days ago I had blogged about getting NumPy data in R by using a simple converter script. That works fine, but it is a little annoying to have to write an entire file only to read from it again. So I kept looking around for a better solution---and soon found the cnpy library by Carl Rogers which provides simple C++ functions to read and write NumPy files.

Bringing such a C++ library to R is done very easily via Rcpp modules. The resulting package contains a single R file with a single line: loadModule("cnpy", TRUE). And it relies on the following module declarations in the a C++ file:

RCPP_MODULE(cnpy){

    using namespace Rcpp;

    function("npyLoad",                         // name of the identifier at the R level
             &npyLoad,                          // function pointer to helper function defined above
             List::create( Named("filename"),   // function arguments including default value
                           Named("type") = "numeric"),
             "read an npy file into a numeric or integer vector or matrix");

    function("npySave",                         // name of the identifier at the R level
             &npySave,                          // function pointer to helper function defined above
             List::create( Named("filename"),   // function arguments including default value
                           Named("object"), 
                           Named("mode") = "w"),
             "save an R object (vector or matrix of type integer or numeric) to an npy file");

}
which give us at the R prompt
R> library(RcppCNPy)
Loading required package: Rcpp
R> npyLoad
internal C++ function <0x243af70>
    docstring : read an npy file into a numeric or integer vector or matrix
    signature : Rcpp::RObject npyLoad(std::string, std::string)
R> npySave
internal C++ function <0x23033e0>
    docstring : save an R object (vector or matrix of type integer or numeric) to an npy file
    signature : void npySave(std::string, Rcpp::RObject, std::string)
R> 
these two functions (and their docstrings) defined above. That's all! Well there are about one hundred more lines dealing with whether we have integer or numeric data, and whether we use a vector or a matrix. But all in all pretty simple...

So version 0.1.0 of this new package RcppCNPy completes the initial release 0.0.1 from earlier in the week by adding

  • the ability to load compressing NumPy files ending in .npy.gz
  • a simple regression test suite loading some data sets
  • a demo script with a timing example comparing ascii reads to reading npy and compressed npy
  • a short pdf vignette describing the package

The NEWS entry for this release (as well as the initial one) follow:

News for Package RcppCNPy

Changes in version 0.1.0 (2012-07-07)
  • Added automatic use of transpose to automagically account for Fortran-vs-C major storage defaults between Python and R.

  • Support for integer types in dependent on the int64_t type which is available only when the -std=c++0x switch is used at build-time (and CRAN still discourages use of it)

  • Added support for reading gzip'ed files ending in ".npy.gz"

  • Added regression tests in directory tests/

  • Added a vignette describing the package

  • Added a timing benchmark in demo/timings.R

Changes in version 0.0.1 (2012-07-04)
  • Initial version, as a straightforward Rcpp modules wrap around the cpny library by Carl Rogers (on github under a MIT license).

  • At present, npy files can be read and written for vectors and matrices of either numeric or integer type. Note however that matrices are currently transposed because of the default Fortran ordering done by numpy.

I will follow up with a little usage example later.

CRANberries also provides a diffstat report for 0.1.0 relative to 0.0.1. As always, feedback is welcome and the rcpp-devel mailing list off the R-Forge page for Rcpp is the best place to start a discussion.

/code/rcpp | permanent link

Tue, 03 Jul 2012

RcppBDT 0.2.0

A new release of the RcppBDT package appeared on CRAN earlier today.

RcppBDT uses Rcpp, and in particular the nifty Rcpp modules feature of wrapping C++ code for R just by declaring the (class or function) interfaces. It uses this to bring in some useful functions from Boost Date.Time to R so that one can do things like

R> library(RcppBDT)
R> sapply(2012:2016, function(year)
+         format(getNthDayOfWeek(first, Mon, Sep, year)))
[1] "2012-09-03" "2013-09-02" "2014-09-01" "2015-09-07" "2016-09-05"
R> 
to compute the next five Labor Day dates in the US, given the year and the first Monday of September requirement. More examples are e.g. on the earlier blog post announcing version 0.1.0.

Changes are mostly internal. R 2.15.1 brough a better / easier way to load such 'modules' into R, and Rcpp 0.9.13 allows us to use this. So RcppBDT continues to be useful as an example package for Rcpp modules. I also streamlines the interface a little: identifiers are now directly accessible in the package's NAMESPACE rather than just via the an instantiated object.

I also added a NEWS file, using the .Rd format so that we can import the marked-up text:

News for Package RcppBDT

Changes in version 0.2.0 (2012-07-02)
  • The core module, which wraps what in C++ is boost::gregorian::date, is now exposed as an Rcpp module bdtDate. As all example and demos operated off a (package-global) variable 'bdt', no user visible change was needed outside of the code instantiating at package load.

  • Updated package instantiation to the new facilities offered by the current versions R 2.15.1 and Rcpp 0.9.13 which make Rcpp module initialization easier and more streamlined.

Changes in version 0.1.0 (2011-01-17)
  • First CRAN upload (and see ChangeLog for more granular details) bug fix in svm cross validation

Courtesy of CRANberries, there is also a diffstat report for 0.2.0 relative to 0.1.0. As always, feedback is welcome and the rcpp-devel mailing list off the R-Forge page for Rcpp is the best place to start a discussion.

/code/rcpp | permanent link

Mon, 02 Jul 2012

RcppArmadillo 0.3.2.3

Conrad releaser version 3.2.3 of Armadillo a few days ago, and the corresponding RcppArmadillo package 0.3.2.3 is now CRAN. (For these keeping score 3.2.1 never was a full release, and 3.2.2 containing fixes for a build issue that did not affect the R package build so we skipped it.)

The short NEWS entry follows below. This version fixes some issues related to g++ 4.7; however the current Debian testing version of g++-4.7.1 required that I rolled back three header files to the version from the previous release. Conrad has been in contact with the gcc upstream maintainers and a fix may appear in time for g++-4.7.2.

This release also contains a new introductory pdf vignette based on a paper Conrad and I just submitted. It introduces Armadillo to R programmers and demonstrates with a simple Kalman filtering example how the code can be written in the same concise matrix-oriented style, yet runs orders of magnitude faster.

0.3.2.3  2012-07-01

    o   Upgraded to Armadillo release 3.2.3 

          * minor correction for declaration of fixed size vectors and
            matrices

    o   Reverted three header files {Mat,Row,Col}_bones.hpp back to previous
        release due to compilation failures under g++-4.7

    o   Added new vignette 'RcppArmadillo-intro' based on a just-submitted
        introductory paper (by Eddelbuettel and Sanderson) about RcppArmadillo 

    o   Change from release 3.2.2 which we skipped as it did not really affect
        builds under R:

          * minor fix for compiling without debugging enabled (aka release
            mode)
          * better detection of ATLAS during installation on Fedora and Red
            Hat systems

    o   Small enhancement to fastLm 

Courtesy of CRANberries, there is also a diffstat report for 0.3.2.3 relative to 0.3.2.0 As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Sat, 30 Jun 2012

Getting numpy data into R

The other day, I found myself confronted with a large number of large files. Which were presented in (gzip-)compressed ascii format---which R reads directly via gzfile() connections---as well as (compressed) numpy files.

The numpy can be read very efficiently into Python. We can do the same in R via save() and load(), of course. But the trouble is that you need to read them first. And reading hundreds of megabytes from ascii is slow, no matter which language you use. Concerning R, I poked aound scan(), played with the colClasses argument and looked at the recent LaF package written just for this purpose. And all these solutions were still orders of magnitude slower than reading numpy. Which is no surprise as it is really hard to beat binary formats when you have to parse countless ascii tokens.

So the obvious next idea was to read the numpy file in Python, and to write a simple binary format. One helpful feature with this data set was that it contained only regular (rectangular) matrices of floats. So we could just store two integers for the dimensions, followed by the total data in either one large binary blob, or a sequence of column vectors.

But one minor trouble was that the Intertubes lead to no easy solution to unpack the numpy format. StackOverflow had plenty of question around this topic converned with, say, how to serialize in language-independent way. But no converters. And nobody local knew how to undo the "pickle" format underlying numpy.

But a remote friend did: Laurent, well-known for his Rpy2 package, pointed me towards using the struct module and steered me towards the solution shown below. So a shameless plug: if you need a very experienced Python or R consultant for sciece work, consider his consulting firm.

Finally, to round out this post, let's show the simple solution we crafted so that the next guy searching the Intertubes will have an easier. Let us start with a minimal Python program writing numpy data to disk:

#!/usr/bin/env python
#
# simple example for creating numpy data to demonstrate converter

import numpy as np

# simple float array
a = np.arange(15).reshape(3,5) * 1.1

outfile = "/tmp/data.npy"
np.save(outfile, a)

Next, the simple Python converter to create a binary file containing two integers for row and column dimension, followed by row times columns of floats:

#!/usr/bin/python
#
# read a numpy file, and write a simple binary file containing
#   two integers 'n' and 'k' for rows and columns
#   n times k floats with the actual matrix
# which can be read by any application or language that can read binary
 
import struct
import numpy as np

inputfile = "/tmp/data.npy"
outputfile = "/tmp/data.bin"

# load from the file
mat = np.load(inputfile)

# create a binary file
binfile = file(outputfile, 'wb')
# and write out two integers with the row and column dimension
header = struct.pack('2I', mat.shape[0], mat.shape[1])
binfile.write(header)
# then loop over columns and write each
for i in range(mat.shape[1]):
    data = struct.pack('%id' % mat.shape[0], *mat[:,i])
    binfile.write(data)
binfile.close()

Lastly, a quick littler script showing how R can read the data in a handful of lines:

#!/usr/bin/r

infile <- "/tmp/data.bin"
con <- file(infile, "rb")
dim <- readBin(con, "integer", 2)
Mat <- matrix( readBin(con, "numeric", prod(dim)), dim[1], dim[2])
close(con)

print(Mat)

That did the job---and I already used to converter to read a few weeks worth of data for further analysis in R. This obviously isn't the last word on possible solutions as the additional temporary file can be wasteful (unless it forms a cache for data read multiple times). If someone has nice solutions, please don't hold back and contact me. Thanks again to Laurent for the winning suggestion concerning struct, and help in getting the examples shown here to work.

/code/snippets | permanent link

Fri, 29 Jun 2012

Rcpp 0.9.13

The bug-fix in version 0.9.12 of Rcpp turned out to be incomplete, so a new version 0.9.13 is now on CRAN and will get to Debian shortly.

The Rcpp::Enviroment constructor is now properly fixed (using the global environment as a default value). As well, a new #define was created to detect the clang++ compiler before/after version 3.0.0 as one exceptions header changed location. Unit tests files were also once more updated.

The complete NEWS entry for 0.9.13 is below; more details are in the ChangeLog file in the package and on the Rcpp Changelog page.

0.9.13  2012-06-28

    o   Truly corrected Rcpp::Environment class by having default constructor
        use the global environment, and removing the default argument of
        global environment from the SEXP constructor

    o   Added tests for clang++ version to include bits/exception_defines.h
        for versions 3.0 or higher (similar to g++ 4.6.0 or later), needed to
        include one particular exceptions header

    o   Made more regression tests conditional on the RunAllRcppTests to come
        closer to the CRAN mandate of running tests in sixty seconds

    o   Updated unit test wrapper tests/doRUnit.R as well as unitTests/runTests.R

Thanks to CRANberries, you can also look at a diff to the previous release 0.9.12. As always, even fuller details are on the Rcpp Changelog page and the Rcpp page which also leads to the downloads, the browseable doxygen docs and zip files of doxygen output for the standard formats. A local directory has source and documentation too. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page

/code/rcpp | permanent link

Mon, 25 Jun 2012

Rcpp 0.9.12

A bug-fix release 0.9.12 of Rcpp arrived earlier today on CRAN and is now in Debian too.

This fixes a minor snafu with the Rcpp::Enviroment constructor following a small change made for 0.9.11. It also reduces the number of unit tests running by default as CRAN was complaining that it took too long to run the checks. I am not impressed---tests are for running, not skipping. Lastly, the cleanup needed to be extended to another directory.

The complete NEWS entry for 0.9.12 is below; more details are in the ChangeLog file in the package and on the Rcpp Changelog page.

0.9.12  2012-06-23

    o   Corrected Rcpp::Environment class by removing (empty) ctor following
        rev3592 (on May 2) where default argument for ctor was moved

    o   Unit testing now supports argument --allTests to impose that all
        tests are executed; otherwise some expensive tests are skipped. This
        is arguably not the right thing to do, but CRAN maintainers insist 
        on faster tests.

    o   The cleanup script now also considers inst/unitTests/testRcppClass/src 

Thanks to CRANberries, you can also look at a diff to the previous release 0.9.11. As always, even fuller details are on the Rcpp Changelog page and the Rcpp page which also leads to the downloads, the browseable doxygen docs and zip files of doxygen output for the standard formats. A local directory has source and documentation too. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page

/code/rcpp | permanent link

Fri, 22 Jun 2012

Rcpp 0.9.11

Release 0.9.11 of Rcpp arrived on CRAN this morning and in Debian later today.

This is a somewhat incremental release with a few internal improvements and few new features. One interesting new development has been contributed by John Chambers who is extending 'Rcpp modules' into 'Rcpp classes' which allows R code to modify and extend C++ classes loaded via Rcpp modules; see help(setRcppClass) for more. This also lead to some changes in the code for loading modules which however requires the brand-new R version 2.15.1 released today as well.

The complete NEWS entry for 0.9.11 is below; more details are in the ChangeLog file in the package and on the Rcpp Changelog page.

0.9.11  2012-06-22

    o   New member function for vectors (and lists etc) containsElementNamed() 
        which returns a boolean indicating if the given element name is present

    o   Updated the Rcpp.package.skeleton() support for Rcpp modules by
        carrying functions already present from the corresponding unit test
        which was also slightly expanded; and added more comments to the code 

    o   Rcpp modules can now be loaded via loadRcppModules() from .onLoad(),
        or via loadModule("moduleName") from any R file 

    o   Extended functionality to let R modify C++ clases imported via modules
        documented in help(setRcppClass)

    o   Support compilation in Cygwin thanks to a patch by Dario Buttari

    o   Extensions to the Rcpp-FAQ and the Rcpp-modules vignettes

    o   The minium version of R is now 2.15.1 which is required for some of
        the Rcpp modules support 

Thanks to CRANberries, you can also look at a diff to the previous release 0.9.10. As always, even fuller details are on the Rcpp Changelog page and the Rcpp page which also leads to the downloads, the browseable doxygen docs and zip files of doxygen output for the standard formats. A local directory has source and documentation too. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page

/code/rcpp | permanent link

Mon, 21 May 2012

RcppArmadillo 0.3.2.0

A new stable release 3.2.0 of Armadillo is now available. As usual, we have wrapped this into a new RcppArmadillo package, now at 0.3.0.2; and this version is now available via CRAN.

The short NEWS entry follows below. For those interested in following RcppArmadillo eevn more closely, we generally track Conrad's Armadillo release candidates as well in SVN on R-Forge but do no longer submit these to CRAN (as the CRAN maintainers have enough incoming packages).

0.3.2.0  2012-05-21

    o   Upgraded to Armadillo release 3.2.0 "Creamfields"

          * faster eigen decomposition via "divide and conquer" algorithm
          * faster transpose of vectors and compound expressions
          * faster handling of diagonal views
          * faster handling of tiny fixed size vectors (≤ 4 elements)
          * added unique(), for finding unique elements of a matrix

Courtesy of CRANberries, there is also a diffstat report for 0.3.2.0 relative to 0.3.0.3 As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Wed, 16 May 2012

RProtoBuf 0.2.4

A new release 0.2.4 of RProtoBuf is now on CRAN. RProtoBuf provides GNU R bindings for the Google Protobuf data encoding library used and released by Google.

This release once again contains a number of patches kindly contributed by Murray Stokely, as well as an added header file needed to build with the g++ 4.7 version which has become the build standard on CRAN.

The NEWS file entry follows below:

0.2.4   2012-05-15

    o   Applied several patches kindly supplied by Murray Stokely to
         - properly work with repeated strings 
         - correct C++ function naming in a few instances
         - add an example of ascii export/import of messages

    o   Suppport g++-4.7 and stricter #include file checking by adding unistd

    o   Made small improvements to the startup code

CRANberries also provides a diff to the previous release 0.2.3. More information is at the RProtoBuf page which has a draft package vignette, a 'quick' overview vignette and a unit test summary vignette. Questions, comments etc should go to the rprotobuf mailing list off the RProtoBuf page at R-Forge. Updated to show NEWS rather than ChangeLog

/code/rprotobuf | permanent link

Tue, 15 May 2012

RcppSMC 0.1.1

CRAN now tests packages against g++-4.7 (as this version has become the default on Debian's testing variant. This compiler switch once again triggered a set of build failures, mostly from include files now deemed missing. For RcppSMC, it came down to a five-character patch of explicitly stating one max() call as std::max()

No other changes were made at this point. The NEWS entry is below:

0.1.1   2012-05-14

    o   Version 0.1.1 

    o   Minor g++-4.7 build fix of using std::max() explicitly

Courtesy of CRANberries, there is also a diffstat report for 0.1.1 relative to 0.1.0 As always, more detailed information is on the RcppSMC page,

/code/rcpp | permanent link

Sat, 05 May 2012

RcppArmadillo 0.3.0.3

Two days ago, Conrad Sanderson released another bug-fix version 3.0.3 for the 3.0.0 branch of his excellent Armadillo C++ template library for linear algebra. The new RcppArmadillo release 0.3.0.3 which contains it appeared on CRAN yesterday. Beside Conrad's bugfixes we also added an example script for fastLm and faster linear model fits.

The short NEWS entry follows below.

0.3.0.3 2012-05-03

    o   Upgraded to Armadillo release 3.0.3

          * fixes for inplace transpose of complex number matrices
          * fixes for complex number version of svd_econ()
          * fixes for potential aliasing issues with submatrix views

    o   New example script fastLm 

Courtesy of CRANberries, there is also a diffstat report for 0.3.0.3 relative to 0.3.0.2 As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Tue, 01 May 2012

Boston Marathon 2012

Two weeks ago I ran the 116th running of the Boston Marathon. Weather is often part of the story in Boston as e.g. my first one in 2007 happened after a storm-drenched and very wet weekend. This time, it was the opposite. Temperatures in the 60s in the morning before the start at 10:00am, which tempaeratures climbing well into the 80s for most of the race. The organizers had sent increasingly alarming email to us participants, warning about the weather conditions. And it was unpleasantly hot. Race support was great though: lots of locals had sprinklers pointed to the street; water stops were plenty and amply stocked and, being a nice summery day, lots of spectators made for lots of cheer.

But heat makes for dreadful running conditions. I was pretty well trained, but I ended up running 3:43:55, or a tad slower than in Chicago 2007 (also record heat, with the race stopped around the four-hour mark) and Chicago 2009 (hot, though not that hot---but I was somewhat undertrained). Which makes this a personal worst. Oh well, stuff happens. And it was a pretty rough day---reports claimed that 2100 runners, or about 10%, were seeking some form of medical attention during or after the race which is a much higher proportion than under normal conditions.

The chart below is an updated version of a chart which had appeared on my running page before. I regrouped my prior Bostons (2007, 2009 and 2010) on the left, and the Chicago runs 2007 and 2009 on the right---and each with the current Boston marathon.

(Boston 2012 marathon in comparison to other Boston and Chicago marathons)

So compared to the prior Boston races, I did indeed relatively poorly. Starting slower, and slowing down even more leading to about 1:45 for the first half and 1:58 for the second. Compared to the two poor Chicago races (put of the six times I have ran that race), things are not quite as bad as at least I didn't peak at over 10 min/mile. Weather will always remain a challenge. The previous marathon (which was last fall's Fox Valley 2011) was unseasonally cold and wet---making for a time fast enough to get me back to Boston. I am registered again for this one, so we will see what September brings this year.

/sports/running | permanent link

Thu, 19 Apr 2012

RcppArmadillo 0.3.0.2 released and on CRAN

Earlier today, Conrad Sanderson released another bug-fix version 3.0.2 for the still fairly recent 3.0.0 version of his excellent Armadillo C++ template library for linear algebra. The new RcppArmadillo release 0.3.0.2 also appeared on CRAN this morning.

Beside Conrad's bugfix in Armadillo itself, he also convinced us to unroll a change imposed by R: NDEBUG is now being defined unconditionally when compiling R packages. In Armadillo's case, this suppresses a number of useful things including bounds-checking. So we now undefine this symbol in the initial RcppArmadillo headers. Users can still set it manually, and/or define ARMA_NO_DEBUG.

The short NEWS entry follows below.

0.3.0.2 2012-04-19

    o   Upgraded to Armadillo release 3.0.2

          * fixes for handling diagonal matrices

    o   Undefine NDEBUG if it has been set (as R does) as this prevents a
        number of useful debugging checks. Users can still define it or
        define ARMA_NO_DEBUG if they want a 'non-development' build

Courtesy of CRANberries, there is also a diffstat report for 0.3.0.2 relative to 0.3.0.1 As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Fri, 13 Apr 2012

RDieHarder 0.1.2

RDieHarder is an R package providing access to the DieHarder battery of tests for random number generators developed by Robert G. Brown and others. DieHarder had been updated to version 3.1.1 a while back, and I had been a little behind with updating RDieHarder. Version 0.1.2 rectifies this.

The package still comes with a vignette describing both DieHarder and the RDieHarder package. And because pictures speak louder than a thousand (blogged) words, here is the first chart from the vignette:

RDieHarder test of weak RNG RDieHarder test of stronger RNG
On the left, we have a poor random-number generator (RNG), the older ran0 function. The histogram illustrating the distribution of test scores is somewhat uneven. An ideal (and asymptotic) outcode is a uniform distribution of p-values from the test. The empirical cumulative distribution function (ECDF) below indicates a somewhat pronounced departure from the diagonal. Informally speaking, this is what the (Kuiper-)Kolmogorov-Smirnov test quantifies, and we see (in the text in the chart) that the null of can be rejected an conventional levels. Based on this example (which had a short run-time with few samples) we would indeed mistrust this (known bad) RNG.

On the right, we have a more recent and trusted RNG, the well-known Mersenne Twister. The ten histogram buckets are all closer to the expected value of one-tenth, the estimated density is closer to flat, the ECDF is closer to the diagonal and the tests don't reject---so no reason to mistrust this RNG based on this test alone.

RDieHarder lets you run a battery of such tests against a boatload of known RNGs. Here is a second example, comparing the six RNGs built into R itself:

RDieHarder test of RNGs in GNU R
And these six look fine, as you'd expect. (And yes, the ECDF charts should each be on a square plot. Another time...)

Courtesy of CRANberries, there is also a diffstat report for 0.1.2 relative to the older 0.1.1 release. More detailed information is on the RDieHarder page.

/code/rdieharder | permanent link

Thu, 12 Apr 2012

RcppArmadillo 0.3.0.1 released and on CRAN

Conrad Sanderson released a bug-fix version 3.0.1 following up on the very recent 3.0.0 version of his excellent Armadillo C++ template library for linear algebra. I made a new RcppArmadillo release 0.3.0.1 which just appeared on CRAN.

The short NEWS entry follows below.

0.3.0.1 2012-04-12

    o   Upgraded to Armadillo release 3.0.1

          * fixes for compilation errors
          * fixes for potential aliasing issues

Courtesy of CRANberries, there is also a diffstat report for 0.3.0.1 relative to 0.3.0 As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Wed, 11 Apr 2012

RcppArmadillo 0.3.0 is now on CRAN

Conrad Sanderson has released a new major version 3.0.0 of his excellent Armadillo C++ template library for linear algebra. A corresponding new release 0.3.0 of RcppArmadillo is now on CRAN. This follows four pre-releases of Armadillo which we packaged as before, but for reasons detailed in another post did not send to CRAN.

All that said, here are the NEWS entries for all release since the last non-test release:

0.3.0   2012-04-10

    o   Upgraded to Armadillo release 3.0.0 "Antarctic Chilli Ranch"

          * added non-contiguous submatrix views
          * added shorthand for inverse: .i()
          * added hist() and histc()
          * faster repmat()
          * faster handling of submatrix views with one row or column 
          * faster generation of random numbers
          * faster element access in fixed size matrices
          * better detection of vector expressions by sum(), cumsum(),
            prod(), min(), max(), mean(), median(), stddev(), var() 
          * expressions X=A.i()*B and X=inv(A)*B are automatically converted
            to X=solve(A,B) 

0.2.40  2012-04-04

    o   Upgraded to Armadillo release 2.99.4 "Antarctic Chilli Ranch (Beta 4)"

          * fixes for handling expressions with fixed size matrices

0.2.39  2012-04-02

    o   Upgraded to Armadillo release 2.99.3 "Antarctic Chilli Ranch (Beta 3)"

          * faster repmat()
          * workarounds for braindead compilers (eg. Visual Studio)

0.2.38  2012-03-28

    o   Upgraded to Armadillo release 2.99.2 "Antarctic Chilli Ranch (Beta 2)"

          * added .i()
          * much faster handling of .col() and .row()
          * expressions X=A.i()*B and X=inv(A)*B are automatically converted
            to X=solve(A,B) 

0.2.37  2012-03-19

    o   Upgraded to Armadillo release 2.99.1 "Antarctic Chilli Ranch (Beta 1)"

          * added non-contiguous submatrix views
          * added hist() and histc()
          * faster handling of submatrix views
          * faster generation of random numbers
          * faster element access in fixed size matrices
          * better detection of vector expressions by sum(), cumsum(),
            prod(), min(), max(), mean(), median(), stddev(), var() 

Courtesy of CRANberries, there is also a diffstat report for 0.3.0 relative to 0.2.36 As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Mon, 09 Apr 2012

RcppDE 0.1.1

The RcppDE package required a bug-fix release. One of the tests compared results to those from the DEoptim package this is based upon, and these are no longer equal, leading the tests to fail. While I was at it, I finally committed a few more things that had piled up. Below are the corresponding ChangeLog entries
2012-04-08  Dirk Eddelbuettel  

        * DESCRIPTION: Release 0.1.1

        * tests/compTest.R: With the just-release DEoptim 2.2.0, results are
        no longer identical so we commented-out the stopifnot() comparison to
        not break any automated tests (as requested by the CRAN maintainers)

        * .Rbuildignore: Added a few more files which R CMD check does not
        want to see in the tarball

2011-03-07  Dirk Eddelbuettel  

        * src/evaluate.h: Also reflect '...' argument from R function we pass
        in, with thanks to Josh Ulrich for the one-line patch

        * R/DEoptim.R: No longer pass environment 'env' down
        * man/DEoptim.Rd: No longer document now unused 'env'

        * src/deoptim.cpp: Minor tweak to RcppArmadillo object creation

/code/rcpp | permanent link

Wed, 28 Mar 2012

RcppArmadillo 0.2.37 and 0.2.38 released, but not on CRAN

Conrad is readying his fabulous Armadillo C++ template library for linear algebra for a new 3.0.0 release, and started to make pre-releases. His version 2.99.1 came about a good week ago, it took us a few days to make a small required change in RcppArmadillo (which is our R package providing Armadillo to R via Rcpp) but our release 0.2.37 was ready last weekend.

But then the good folks at CRAN did not want it. That is at the same time disappointing to us as developers / maintainers: work we do will not get pushed to largest number of users immediately. On the other hand, we do understand that CRAN is under a lot of strain. Years ago John Fox demonstrated exponential growth of packages at CRAN, the rate was then around 40% per year. Right now the number of packages exceeds 3600, and there are often several dozen daily updates (which you can follow with the CRANberries service for html and rss I had set up years ago). And the three CRAN maintainers all have full-time jobs as academics, and they also happen to be busy R Core members. Plus the new R version 2.15.0 is to be released at the end of the week, and this usually entails a number of package changes, putting extra strain on CRAN right now. For more on new or updated CRAN policies, see this long thread.

Now, as I wrote on the rcpp-devel list, there is of course always install.packages("RcppArmadillo", repos="http://R-Forge.R-project.org") for a direct installation from R-Forge. But as I just put 0.2.38 there, it may take the running of the batch updates on the site so if you want it now, follow this link to my local Rcpp archive.

All that said, here are the NEWS entries for these two last releases:

0.2.38  2012-03-28

    o   Upgraded to Armadillo release 2.99.2 "Antarctic Chilli Ranch (Beta 2)"

          * added .i()
          * much faster handling of .col() and .row()
          * expressions X=A.i()*B and X=inv(A)*B are automatically converted
            to X=solve(A,B) 

0.2.37  2012-03-19

    o   Upgraded to Armadillo release 2.99.1 "Antarctic Chilli Ranch (Beta 1)"

          * added non-contiguous submatrix views
          * added hist() and histc()
          * faster handling of submatrix views
          * faster generation of random numbers
          * faster element access in fixed size matrices
          * better detection of vector expressions by sum(), cumsum(),
            prod(), min(), max(), mean(), median(), stddev(), var() 

As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Sat, 24 Mar 2012

Initial release 0.1.0 of package RcppSMC

Hm, I realized that I announced this on Google+ (via Rcpp) as well as on Twitter, on the r-packages list, wrote a new and simple web page for it, but had not put it on my blog. So here is some catching up.

Sequential Monte Carlo / Particle Filter is a (to quote the Wikipedia page I just linked to) sophisticated model estimation technique based on simulation. They are related to both Kalman Filters, and Markov Chain Monte Carlo methods.

Adam Johansen has a rather nice set of C++ classes documentated in his 2009 paper in the Journal of Statistical Software (JSS). I started to play with these classes and realized that, once again, this would make perfect sense in an R extension built with the Rcpp package by Romain and myself (and in JSS too). So I put a first prototype onto R-Forge and emailed Adam who, to my pleasant surprise, was quite interested. And a couple of emails, and commits later, we are happy to present a very first release 0.1.0.

I wrote a few words on a RcppSMC page on my website where you can find a few more details. But in short, we already have example functions demonstrating the backend classes by reproducing examples from

Johansen (2009)
and his example 5.1 via pfLineartBS() for a linear bootstrap example;
Doucet, Briers and Senecal (2006)
and their (optimal) block-sampling particle filter for a linear Gaussian model (serving as an illustration as the setup does of course have an analytical solution) via the function blockpfGaussianOpt()
Gordon, Salmond and Smith (1993)
and their ubiqitous nonlinear state space model via the function pfNonlinBS().

And to illustrate just why Rcpp is so cool for this, here is a little animation of a callback from the C++ code when doing the filtering on Adam's example 5.1. By passing a simple plotting function, written in R, to the C++ code, we can get a plot updated on every iteration. Here I cheated a little and used our old plot function with fixed ranges, the package now uses a more general function:

Example of RcppSMC callback to R plot when estimation example 5.1 from Johansen (2009)

The animation is of course due to ImageMagick glueing one hundred files into a single animated gif.

More information about RcppSMC is on its page, and we intend to add more examples and extensions over time.

/code/rcpp | permanent link

Mon, 19 Mar 2012

R / Finance 2012 Open for Registration

The annoucement below just went to the R-SIG-Finance list. More information is as usual the the R / Finance page:

Now open for registrations:

R / Finance 2012: Applied Finance with R
May 11 and 12, 2012
Chicago, IL, USA

The registration for R/Finance 2012 -- which will take place May 11 and 12 in Chicago -- is NOW OPEN!

Building on the success of the three previous conferences in 2009, 2010, and 2011, we expect more than 250 attendees from around the world. R users from industry, academia, and government will join 40+ presenters covering all areas of finance with R.

This year's conference will start earlier in the day on Friday, to accommodate the tremendous line up of speakers for 2012, as well as to provide more time between talks for networking.

We are very excited about the four keynotes by Paul Gilbert, Blair Hull, Rob McCulloch, and Simon Urbanek. The main agenda includes nineteen full presentations and eighteen shorter "lightning talks". We are also excited to offer six optional pre-conference seminars on Friday morning.

Once again, we are hosting the R/Finance conference dinner on Friday evening, where you can continue conversations while dining and drinking atop a West Loop restaurant overlooking the Chicago skyline.

More details of the agenda are available at:

http://www.RinFinance.com/agenda/

Registration information is available at

http://www.RinFinance.com/register/
and can also be directly accessed by going to
http://www.regonline.com/RFinance2012

On behalf of the committee and sponsors, we look forward to seeing you in Chicago!

Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson,
Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich
Our 2012 Sponsors:
International Center for Futures and Derivatives at UIC

Revolution Analytics
Sybase
MS-Computational Finance at University of Washington

Google
lemnica
OpenGamma
OneTick
RStudio
Tick Data

See you in Chicago in May!!

/computers/R | permanent link

Thu, 15 Mar 2012

digest 0.5.2

A new version of the digest package (which generates hash function summaries for arbitrary (and possibly nested) R objects using any of the standard md5, sha-1, sha-256 or crc32 algorithms) is now on CRAN.

Murray Stokely noticed a corner case where a file that was inaccessible lead to a segmentation fault. Thanks to his patch, this is now caught and flagged. He even added a test for it and fixed another one of my idioms. Nice work, so keep the patches coming folks :)

CRANberries provides the usual summary of changes to version 0.5.1.

As usual, our package is available via the R-Forge page leading to svn and tarball access, my digest page and the local directory here.

/code/digest | permanent link

Tue, 06 Mar 2012

RcppArmadillo 0.2.36

RcppArmadillo release 0..2.36 is now on CRAN. It contains just the changes from the new Armadillo release 2.4.4. The NEWS entry below summarises the changes.

0.2.36  2012-03-05

    o   Upgraded to Armadillo release 2.4.4

          * fixes for qr() and syl()
          * more portable wall_clock class
          * faster relational operators on submatrices

Courtesy of CRANberries, there is also a diffstat reports for 0.2.36 relative to 0.2.35 As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Sat, 18 Feb 2012

Using Rcout with Rcpp / RcppArmadillo to coordinate output with R

The new RcppArmadillo release 0.2.35 now supports the Rcpp::Rcout output stream device. Based on a contributed Rcpp patch by Jelper Ypma, the Rcpp::Rcout output stream gets redirected to R's buffered output. In other words, R's own output and that eminating from C++ code using Rcpp::Rcout are now both in sync. This avoids a stern warning from Section 5.6 in the Writing R Extensions manual:
Using C++ iostreams, as in this example, is best avoided. There is no guarantee that the output will appear in the R console, and indeed it will not on the R for Windows console. Use R code or the C entry points (*note Printing) for all I/O if at all possible.
and does in fact provide exactly what is recommended: the same entry points R itself uses.

Below is a sample program, once again using the wonderful inline package to compile, load and link C++ code into R from a simple text variable submitted to the cxxfunction. What is shown in R code to load the package, the definition of the C++ code as assigned to a variable src and the creation of the dynamically-loadaded R function called fun which contains the code from we compiled, link and load via a single call to cxxfunction() given src.

library
library(inline)

src <- '

  Rcpp::Rcout << "Armadillo version: " << arma::arma_version::as_string() << std::endl;

  // directly specify the matrix size (elements are uninitialised)
  arma::mat A(2,3);

  // .n_rows = number of rows    (read only)
  // .n_cols = number of columns (read only)
  Rcpp::Rcout << "A.n_rows = " << A.n_rows << std::endl;
  Rcpp::Rcout << "A.n_cols = " << A.n_cols << std::endl;

  // directly access an element (indexing starts at 0)
  A(1,2) = 456.0;

  A.print("A:");

  // scalars are treated as a 1x1 matrix,
  // hence the code below will set A to have a size of 1x1
  A = 5.0;
  A.print("A:");

  // if you want a matrix with all elements set to a particular value
  // the .fill() member function can be used
  A.set_size(3,3);
  A.fill(5.0);
  A.print("A:");


  arma::mat B;

  // endr indicates "end of row"
  B << 0.555950 << 0.274690 << 0.540605 << 0.798938 << arma::endr
    << 0.108929 << 0.830123 << 0.891726 << 0.895283 << arma::endr
    << 0.948014 << 0.973234 << 0.216504 << 0.883152 << arma::endr
    << 0.023787 << 0.675382 << 0.231751 << 0.450332 << arma::endr;

  // print to the cout stream
  // with an optional string before the contents of the matrix
  B.print("B:");

  // the << operator can also be used to print the matrix
  // to an arbitrary stream (cout in this case)
  Rcpp::Rcout << "B:" << std::endl << B << std::endl;

  // save to disk
  B.save("B.txt", arma::raw_ascii);

  // load from disk
  arma::mat C;
  C.load("B.txt");

  C += 2.0 * B;
  C.print("C:");


  // submatrix types:
  //
  // .submat(first_row, first_column, last_row, last_column)
  // .row(row_number)
  // .col(column_number)
  // .cols(first_column, last_column)
  // .rows(first_row, last_row)

  Rcpp::Rcout << "C.submat(0,0,3,1) =" << std::endl;
  Rcpp::Rcout << C.submat(0,0,3,1) << std::endl;

  // generate the identity matrix
  arma::mat D = arma::eye<arma::mat>(4,4);

  D.submat(0,0,3,1) = C.cols(1,2);
  D.print("D:");

  // transpose
  Rcpp::Rcout << "trans(B) =" << std::endl;
  Rcpp::Rcout << trans(B) << std::endl;

  // maximum from each column (traverse along rows)
  Rcpp::Rcout << "max(B) =" << std::endl;
  Rcpp::Rcout << max(B) << std::endl;

  // maximum from each row (traverse along columns)
  Rcpp::Rcout << "max(B,1) =" << std::endl;
  Rcpp::Rcout << max(B,1) << std::endl;

  // maximum value in B
  Rcpp::Rcout << "max(max(B)) = " << max(max(B)) << std::endl;

  // sum of each column (traverse along rows)
  Rcpp::Rcout << "sum(B) =" << std::endl;
  Rcpp::Rcout << sum(B) << std::endl;

  // sum of each row (traverse along columns)
  Rcpp::Rcout << "sum(B,1) =" << std::endl;
  Rcpp::Rcout << sum(B,1) << std::endl;

  // sum of all elements
  Rcpp::Rcout << "sum(sum(B)) = " << sum(sum(B)) << std::endl;
  Rcpp::Rcout << "accu(B)     = " << accu(B) << std::endl;

  // trace = sum along diagonal
  Rcpp::Rcout << "trace(B)    = " << trace(B) << std::endl;

  Rcpp::Rcout << std::endl;
'

fun <- cxxfunction(signature(), body=src, plugin="RcppArmadillo")

setwd("/tmp")                           # adjust on other OSs

fun()                                   # output to stdout

sink("rcpparma.log.txt")                # start 'sink' to output to file
fun()                                   # no output to screen
sink()                                  # stop 'sink'
We then switch to a temporary directory (as the example code, taken from one of the two examples in Conrad's Armadillo sources, creates a temporary file) and run the new function. To demontrate how it does in fact now mesh perfectly with R, we create an output 'sink' (which catches all output) and re-run.

This simple example demonstrated how we can use the new Rcout output stream from Rcpp to have dynamically-loaded C++ code cooperate more cleanly with the (buffered) R output. It also demontrated some of the nice features in Armadillo which we bring to R via RcppArmadillo.

/code/snippets | permanent link

Fri, 17 Feb 2012

RcppArmadillo 0.2.35

Now that Rcpp 0.9.10 is released and on CRAN, other packages can take advantage of a small change needed to make use of the quasi-output stream Rcpp::Rcout. So the new release 0.2.35 of RcppArmadillo does just that---and input/output from Armadillo, the wonderful linear algebra / math library for C++ written chiefly by Conrad Sanderson now comes through in a coordinated buffered fashion. As as std::cout is no longer used by default, the R CMD check no longer flags this.

There are two more bugfixes in the releases for issues noticed by attentive RcppArmadillo users. Teo Guo Ci spotted a missing semicolon when C++0x code was activated, and Gershon Bialer suggested a fix for an issue plagueing users of the jurassic g++ 4.2.1 compiler which Apple forces on users of OS X. We thank Martin Renner for testing these fixes. The NEWS entry below summarises the changes.

0.2.35  2012-02-17

    o   Upgraded to Armadillo release 2.4.3

          * Support for ARMA_DEFAULT_OSTREAM using Rcpp::Rcout added

    o   Minor bug fix release improving corner cases affecting builds:

          * Missing semicolon added in Mat_meat (when in C++0x mode), with
            thanks to Teo Guo Ci 
          * Armadillo version vars now instantiated in RcppArmadillo.cpp
            which helps older g++ versions, with thanks to Gershon Bialer
          * Thanks also to Martin Renner for testing these changes

          * Unit tests output fallback directory changed per Brian Ripley's
            request to not ever use /tmp

          * Minor update to version numbers in RcppArmadillo-package.Rd

Courtesy of CRANberries, there is also a diffstat reports for 0.2.35 relative to 0.2.34 As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

/code/rcpp | permanent link

Rcpp 0.9.10

A new release 0.9.10 of Rcpp is now on CRAN and in Debian. This is mostly internal release with a little bit of code reorgination (some of which will be used by a forthcoming RcppArmadillo release), some changes to make R CMD check happy, and one or two new features.

The complete NEWS entry for 0.9.10 is below; more details are in the ChangeLog file in the package and on the Rcpp Changelog page.

0.9.10  2012-02-16

    o   Rearrange headers so that Rcpp::Rcout can be used by RcppArmadillo et al

    o   New Rcpp sugar function mapply (for limited to two or three input vectors)

    o   Added custom version of the Rcpp sugar diff function for numeric vectors
        skipping unncesserry checks for NA

    o   Some internal code changes to reflect changes and stricter requirements
        in R CMD check in the current R-devel versions
    
    o   Corrected fixed-value initialization for IntegerVector (with thanks to
        Gregor Kastner for spotting this)

    o   New Rcpp-FAQ entry on simple way to a set compiler option for cxxfunction

Thanks to CRANberries, you can also look at a diff to the previous release 0.9.9. As always, even fuller details are on the Rcpp Changelog page and the Rcpp page which also leads to the downloads, the browseable doxygen docs and zip files of doxygen output for the standard formats. A local directory has source and documentation too. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page

/code/rcpp | permanent link

Wed, 15 Feb 2012

Kurt Elling, Regina Carter and Anat Cohen at the CSO

Yesterday was Valentine's Day, and Kurt Elling came back to Chicago for a show at Symphony Center. He brought along guest stars Regina Carter (violin) and Anat Cohen (clarinet), as well as a small band led by Elling's long-time collaborator Laurence Hobgood on piano.

The program was titled Passion World and featured songs from around the world. Quite impressively, Elling actually sang in French, Spanish, Portuguese, German and of cause English. The program lacked a little bit of cohesion at times, but it was still a most enjoyable and solid two-hour set. A similar concert will take place in Minneapolis on Saturday, so for those near the Twin Cities I would recommended looking for tickets.

/music/jazz/live | permanent link

Thu, 12 Jan 2012

RInside 0.2.6

A new version of RInside, now at 0.2.6, is now available via CRAN. RInside provides a set of convenience classes which facilitate embedding of R inside of C++ applications and programs, using the classes and functions provided by the Rcpp R and C++ integration package.

This release has an additional fix for the Windows use case, adds cmake support files for the examples, switches to using the same RNG initialization of time and process id that R uses, and other minor fixes.

All changes since the last release are summarized below:

2012-01-11  Dirk Eddelbuettel  

        * DESCRIPTION: Release 0.2.6

        * DESCRIPTION: Updated Description: text 

2012-01-08  Dirk Eddelbuettel  

        * src/RInside.cpp: Correct console writer on Windows to not use
        Rprintf (with thanks to both James Bates and John Brzustowski)

        * src/RInside.cpp: Update RNG seeding to same scheme now used by
        R which blends both (millisecond) time and process id

        * src/RInside.cpp: Replace fprintf(stderr,...) with Rf_error(...)
        * src/setenv/setenv.c: Idem
        
        * inst/examples/*: Added CMake build support for all example
        directories kindly provided by Peter Aberline; this helps when coding
        inside of IDEs such as Eclipse, KDevelop or Code::Blocks

        * inst/examples/standard/Makefile.win: Allow for an R_ARCH variable
        to be set to enforce either i386 or x64 builds

CRANberries also provides a short report with changes from the previous release. More information is on the RInside page. Questions, comments etc should go to the rcpp-devel mailing list off the Rcpp R-Forge page.

/code/rinside | permanent link