Sun, 08 Jul 2012

New package RcppCNPy with release 0.1.0 (and 0.0.1 earlier last week)

A few days ago I blogged about getting NumPy data into R by using a simple converter script. That works fine, but it is a little annoying to have to write an entire file only to read from it again. So I kept looking around for a better solution, and soon found the cnpy library by Carl Rogers, which provides simple C++ functions to read and write NumPy files.

Bringing such a C++ library to R is done very easily via Rcpp modules. The resulting package contains a single R file with a single line: loadModule("cnpy", TRUE). And it relies on the following module declarations in a C++ file:

RCPP_MODULE(cnpy){

    using namespace Rcpp;

    function("npyLoad",                         // name of the identifier at the R level
             &npyLoad,                          // function pointer to helper function defined above
             List::create( Named("filename"),   // function arguments including default value
                           Named("type") = "numeric"),
             "read an npy file into a numeric or integer vector or matrix");

    function("npySave",                         // name of the identifier at the R level
             &npySave,                          // function pointer to helper function defined above
             List::create( Named("filename"),   // function arguments including default value
                           Named("object"), 
                           Named("mode") = "w"),
             "save an R object (vector or matrix of type integer or numeric) to an npy file");

}
which give us at the R prompt
R> library(RcppCNPy)
Loading required package: Rcpp
R> npyLoad
internal C++ function <0x243af70>
    docstring : read an npy file into a numeric or integer vector or matrix
    signature : Rcpp::RObject npyLoad(std::string, std::string)
R> npySave
internal C++ function <0x23033e0>
    docstring : save an R object (vector or matrix of type integer or numeric) to an npy file
    signature : void npySave(std::string, Rcpp::RObject, std::string)
R> 
these two functions (and their docstrings) defined above. That's all! Well, there are about one hundred more lines dealing with whether we have integer or numeric data, and whether we use a vector or a matrix. But all in all, pretty simple...
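To give a feel for what those extra lines have to decide, here is a minimal sketch in plain C++ (no Rcpp, so it compiles on its own). It is not the package's actual code: the `Kind` enum and the `classify` helper are hypothetical names made up for illustration, and the `"integer"` type string is assumed by analogy with the `"numeric"` default shown in the module declaration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical tag for the four cases a loader has to distinguish.
enum class Kind { NumericVector, NumericMatrix, IntegerVector, IntegerMatrix };

// Choose the result kind from the requested element type and from the
// array shape (one dimension means vector, two dimensions mean matrix).
Kind classify(const std::string& type, const std::vector<std::size_t>& shape) {
    const bool isNumeric = (type == "numeric");
    const bool isInteger = (type == "integer");   // assumed type string
    if (!isNumeric && !isInteger)
        throw std::invalid_argument("unsupported type: " + type);
    if (shape.size() == 1)
        return isNumeric ? Kind::NumericVector : Kind::IntegerVector;
    if (shape.size() == 2)
        return isNumeric ? Kind::NumericMatrix : Kind::IntegerMatrix;
    throw std::invalid_argument("only vectors and matrices are supported");
}
```

A real helper would then branch on the result to build the matching R object. The point here is only that the two questions (integer or numeric, vector or matrix) multiply out to four cases.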

So version 0.1.0 of this new package RcppCNPy completes the initial release 0.0.1 from earlier in the week by adding

  • the ability to load compressed NumPy files ending in .npy.gz
  • a simple regression test suite loading some data sets
  • a demo script with a timing example comparing ascii reads to reading npy and compressed npy
  • a short pdf vignette describing the package

The NEWS entries for this release (as well as the initial one) follow:

News for Package RcppCNPy

Changes in version 0.1.0 (2012-07-07)
  • Added automatic use of transpose to automagically account for Fortran-vs-C major storage defaults between Python and R.

  • Support for integer types is dependent on the int64_t type which is available only when the -std=c++0x switch is used at build-time (and CRAN still discourages use of it)

  • Added support for reading gzip'ed files ending in ".npy.gz"

  • Added regression tests in directory tests/

  • Added a vignette describing the package

  • Added a timing benchmark in demo/timings.R

Changes in version 0.0.1 (2012-07-04)
  • Initial version, as a straightforward Rcpp modules wrap around the cnpy library by Carl Rogers (on github under an MIT license).

  • At present, npy files can be read and written for vectors and matrices of either numeric or integer type. Note however that matrices are currently transposed because numpy defaults to C (row-major) ordering whereas R uses Fortran (column-major) ordering.
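The transposition that both NEWS entries mention comes down to re-indexing between row-major (C) and column-major (Fortran) layouts. Here is a minimal sketch in plain C++, assuming the data read from the file is row-major and that R, which stores matrices column-major, is the destination. The function name `rowMajorToColMajor` is hypothetical and this is not the package's actual code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Re-index an nrow x ncol matrix from row-major (C order) storage
// to column-major (Fortran order) storage.
std::vector<double> rowMajorToColMajor(const std::vector<double>& in,
                                       std::size_t nrow, std::size_t ncol) {
    if (in.size() != nrow * ncol)
        throw std::invalid_argument("buffer size does not match dimensions");
    std::vector<double> out(in.size());
    for (std::size_t i = 0; i < nrow; ++i)        // row index
        for (std::size_t j = 0; j < ncol; ++j)    // column index
            out[j * nrow + i] = in[i * ncol + j]; // element (i, j) in both layouts
    return out;
}
```

For the 2 x 3 matrix with rows (1, 2, 3) and (4, 5, 6), the row-major buffer 1, 2, 3, 4, 5, 6 becomes 1, 4, 2, 5, 3, 6. Skipping this step and reading the buffer as if it were already column-major is what makes a matrix come out transposed.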

I will follow up with a little usage example later.

CRANberries also provides a diffstat report for 0.1.0 relative to 0.0.1. As always, feedback is welcome and the rcpp-devel mailing list off the R-Forge page for Rcpp is the best place to start a discussion.
