Dirk Eddelbuettel Thinking inside the box
 
Sat, 18 Feb 2012

Using Rcout with Rcpp / RcppArmadillo to coordinate output with R
The new RcppArmadillo release 0.2.35 now supports the Rcpp::Rcout output stream device. Based on a contributed Rcpp patch by Jelper Ypma, the Rcpp::Rcout output stream gets redirected to R's buffered output. In other words, R's own output and that eminating from C++ code using Rcpp::Rcout are now both in sync. This avoids a stern warning from Section 5.6 in the Writing R Extensions manual:

Using C++ iostreams, as in this example, is best avoided. There is no guarantee that the output will appear in the R console, and indeed it will not on the R for Windows console. Use R code or the C entry points (*note Printing) for all I/O if at all possible.
and does in fact provide exactly what is recommended: the same entry points R itself uses.

Below is a sample program, once again using the wonderful inline package to compile, load and link C++ code into R from a simple text variable submitted to the cxxfunction. What is shown in R code to load the package, the definition of the C++ code as assigned to a variable src and the creation of the dynamically-loadaded R function called fun which contains the code from we compiled, link and load via a single call to cxxfunction() given src.

library
library(inline)

src <- '

  Rcpp::Rcout << "Armadillo version: " << arma::arma_version::as_string() << std::endl;

  // directly specify the matrix size (elements are uninitialised)
  arma::mat A(2,3);

  // .n_rows = number of rows    (read only)
  // .n_cols = number of columns (read only)
  Rcpp::Rcout << "A.n_rows = " << A.n_rows << std::endl;
  Rcpp::Rcout << "A.n_cols = " << A.n_cols << std::endl;

  // directly access an element (indexing starts at 0)
  A(1,2) = 456.0;

  A.print("A:");

  // scalars are treated as a 1x1 matrix,
  // hence the code below will set A to have a size of 1x1
  A = 5.0;
  A.print("A:");

  // if you want a matrix with all elements set to a particular value
  // the .fill() member function can be used
  A.set_size(3,3);
  A.fill(5.0);
  A.print("A:");


  arma::mat B;

  // endr indicates "end of row"
  B << 0.555950 << 0.274690 << 0.540605 << 0.798938 << arma::endr
    << 0.108929 << 0.830123 << 0.891726 << 0.895283 << arma::endr
    << 0.948014 << 0.973234 << 0.216504 << 0.883152 << arma::endr
    << 0.023787 << 0.675382 << 0.231751 << 0.450332 << arma::endr;

  // print to the cout stream
  // with an optional string before the contents of the matrix
  B.print("B:");

  // the << operator can also be used to print the matrix
  // to an arbitrary stream (cout in this case)
  Rcpp::Rcout << "B:" << std::endl << B << std::endl;

  // save to disk
  B.save("B.txt", arma::raw_ascii);

  // load from disk
  arma::mat C;
  C.load("B.txt");

  C += 2.0 * B;
  C.print("C:");


  // submatrix types:
  //
  // .submat(first_row, first_column, last_row, last_column)
  // .row(row_number)
  // .col(column_number)
  // .cols(first_column, last_column)
  // .rows(first_row, last_row)

  Rcpp::Rcout << "C.submat(0,0,3,1) =" << std::endl;
  Rcpp::Rcout << C.submat(0,0,3,1) << std::endl;

  // generate the identity matrix
  arma::mat D = arma::eye<arma::mat>(4,4);

  D.submat(0,0,3,1) = C.cols(1,2);
  D.print("D:");

  // transpose
  Rcpp::Rcout << "trans(B) =" << std::endl;
  Rcpp::Rcout << trans(B) << std::endl;

  // maximum from each column (traverse along rows)
  Rcpp::Rcout << "max(B) =" << std::endl;
  Rcpp::Rcout << max(B) << std::endl;

  // maximum from each row (traverse along columns)
  Rcpp::Rcout << "max(B,1) =" << std::endl;
  Rcpp::Rcout << max(B,1) << std::endl;

  // maximum value in B
  Rcpp::Rcout << "max(max(B)) = " << max(max(B)) << std::endl;

  // sum of each column (traverse along rows)
  Rcpp::Rcout << "sum(B) =" << std::endl;
  Rcpp::Rcout << sum(B) << std::endl;

  // sum of each row (traverse along columns)
  Rcpp::Rcout << "sum(B,1) =" << std::endl;
  Rcpp::Rcout << sum(B,1) << std::endl;

  // sum of all elements
  Rcpp::Rcout << "sum(sum(B)) = " << sum(sum(B)) << std::endl;
  Rcpp::Rcout << "accu(B)     = " << accu(B) << std::endl;

  // trace = sum along diagonal
  Rcpp::Rcout << "trace(B)    = " << trace(B) << std::endl;

  Rcpp::Rcout << std::endl;
'

fun <- cxxfunction(signature(), body=src, plugin="RcppArmadillo")

setwd("/tmp")                           # adjust on other OSs

fun()                                   # output to stdout

sink("rcpparma.log.txt")                # start 'sink' to output to file
fun()                                   # no output to screen
sink()                                  # stop 'sink'
We then switch to a temporary directory (as the example code, taken from one of the two examples in Conrad's Armadillo sources, creates a temporary file) and run the new function. To demontrate how it does in fact now mesh perfectly with R, we create an output 'sink' (which catches all output) and re-run.

This simple example demonstrated how we can use the new Rcout output stream from Rcpp to have dynamically-loaded C++ code cooperate more cleanly with the (buffered) R output. It also demontrated some of the nice features in Armadillo which we bring to R via RcppArmadillo.

/code/snippets | permanent link

Wed, 30 Nov 2011

Wicked Webapps with R, err, Wt
A few months ago, I had blogged about using R inside of Qt. This used our RInside package for embedding the statistical programming environment and language R inside of a C++ application, and further relies on our Rcpp package for R and C++ integration and object mapping.

The example was simple yet powerful: a reimplementation of the standard GUI application of a standard density estimate. Here the user can pick a kernel density function from a selection, and also slide a bandwidth parameter. One nice addition was an entry field already populated with a simple expression for a mixture of Normals, allowing for arbitrary random distributions over which to estimate. The example is pretty (thanks to Qt), and was added to RInside with the last CRAN release 0.2.4. The blog post has a nice screenshot.

I had long wondered how to do something similar 'on the web'. Web integration and application frameworks are of course a dime a dozen: Just about any language offers this, with more or less ease. But I wanted something simple yet powerful and fast. And I did not like the idea of a multi-tier app, or of a multi-language mix. I remember having seen something about a web-application framework not unlike Qt, and studying the very useful Wikipedia web application framework comparison I re-discovered Wt (pronounced "Witty"). So there it is, using C++ which brings us ample performance, the ability to connect to a number of libraries and applications (which is important in my quantitatively-minded workd) and avoid the whole multi-tier, multi-language combination. The Wt website has a few more good reason why this may be a suitable idea; the toolkit also offers a very decent amount of features and is amply documented with a fair number of examples.

And after just a little bit poking around during two weekends, I now have the following webapp committed in the SVN repository of RInside, and it is all implemented in in less than two hundred (generously commented) lines of code.

Example of embedding R via RInside into a Wt C++ application: density estimation for a mixture

It is currently up and running at the address shown in the screenshot, so give it a go (though I may take it down at another point in time). I quite like it: The application is responsive: changes in the radio buttons (for the density), or the bandwidth, each trigger reestimation of the density, and a new and updated chart is displayed immediately with no noticeable delay---just like the desktop application did.

Best of all, the code logic is essentially unchanged from the Qt-based app. Signals and slots related events to actions, the layout is in terms of standard GUI boxen and containers. And best of all, I did not have to write a line of html, javascript, css or ajax: it is all handled by the Wt toolkit. I was able to drive the app from an Android phone, my tablet, various computers around the house, and had a few friends poke a stick at it from afar.

There is at one open issue. Wt launches new instances of the application object with each connection, which is a very clean model. That doesn't map perfectly with R (which is single-threaded) and RInside (which runs as a singleton for the same reason). So right now, each action sends its state back to the client. In other words, each clients own its parameters and well as vector of random numbers. Each new action sends these back to the app which passes it to R, launches the re-estimation and gets an updated chart back which is then shown by the the client. That is not perfect, and maybe a forking model as used by Simon's RServe would be better, though it would require a rewrite of RInside. Not sure if we get there anytime soon. And for simple applications not facing legions of concurrent users, the singleton should still work. It's a proof of concept at this point. Feedback welcome, and RInside and Rcpp questions should go to the rcpp-devel list as usual.

/code/snippets | permanent link

Sat, 23 Apr 2011

Another nice Rcpp example
While preparing my slides for the Rcpp workshop this Thursday, I had wondered about more nice examples motivating Rcpp. So I posed a quick question on the rcpp-devel list.

And I received a few friendly answers. My favourite, so far, was a suggestion by Lance Bachmeier who sent me a short script which used both R and C++ (via Rcpp) to simulate a first-order vector autoregressive process (and he ensured me that it worked well enough on his graduate students). It is indeed a great example as it involves (simple) matrix multiplication in an iterative fashion. Which makes it a great example not only for Rcpp but also for our RcppArmadillo package (which wraps Conrad Sanderson's wonderful Armadillo C++ templated library for linear algebra and more). And at the same time, we can also add another look at the new and shiny R compiler I also blogged about recently.

So Lance and I iterated over this a little more over email, and I now added this as a new (and initial) example file in the RcppArmadillo SVN repo. (As an aside: The newest version 0.2.19 of RcppArmadillo has been sitting in incoming at CRAN since earlier in the week while the archive maintainer takes a well-deserved vacation. It should hit the public archive within a few days, and is otherwise available too from my site.)

So let's walk through the example:

R> ## parameter and error terms used throughout
R> a <- matrix(c(0.5,0.1,0.1,0.5),nrow=2)
R> e <- matrix(rnorm(10000),ncol=2)
R> ## Let's start with the R version
R> rSim <- function(coeff, errors) {
+   simdata <- matrix(0, nrow(errors), ncol(errors))
+   for (row in 2:nrow(errors)) {
+     simdata[row,] = coeff %*% simdata[(row-1),] + errors[row,]
+   }
+   return(simdata)
+ }
R> rData <- rSim(a, e)                     # generated by R
This starts with a simple enough loop. After skipping the first row, each iteration multiplies the previous row with the parameters and adds error terms.

We can then turn to the R compiler:

R> ## Now let's load the R compiler (requires R 2.13 or later)
R> suppressMessages(require(compiler))
R> compRsim <- cmpfun(rSim)
R> compRData <- compRsim(a,e)              # generated by R 'compiled'
R> stopifnot(all.equal(rData, compRData))  # checking results
Nice and easy: We load the compiler package, create a compiled function and use it. We check the results and surely enough find them to be identical.

With that, time to turn to C++ using Armadillo via RcppArmadillo:

R> ## Now load 'inline' to compile C++ code on the fly
R> suppressMessages(require(inline))
R> code <- '
+   arma::mat coeff = Rcpp::as<arma::mat>(a);
+   arma::mat errors = Rcpp::as<arma::mat>(e);
+   int m = errors.n_rows; int n = errors.n_cols;
+   arma::mat simdata(m,n);
+   simdata.row(0) = arma::zeros<arma::mat>(1,n);
+   for (int row=1; row<m; row++) {
+     simdata.row(row) = simdata.row(row-1)*trans(coeff)+errors.row(row);
+   }
+   return Rcpp::wrap(simdata);
+ '
R> ## create the compiled function
R> rcppSim <- cxxfunction(signature(a="numeric",e="numeric"),
+                        code,plugin="RcppArmadillo")
R> rcppData <- rcppSim(a,e)                # generated by C++ code
R> stopifnot(all.equal(rData, rcppData))   # checking results
Here we load the inline package to compile, link and load C++ snippets. We define a short C++ function in the code variable, declare a signature taking a and e as before and ask cxxfunction() to deploy the plugin for RcppArmadillo so that it and Rcpp are found during build. With that, we have a compiled function to generate data, and we once again check the result. The C++ code is pretty straightforward as well. We can instatiate Armadillo matrices directly from the R objects we pass down; we then run a similar loop building the result row by row.

Now, with all the build-up, here is the final timing comparison, using the rbenchmark package:

R> ## now load the rbenchmark package and compare all three
R> suppressMessages(library(rbenchmark))
R> res <- benchmark(rcppSim(a,e),
+                  rSim(a,e),
+                  compRsim(a,e),
+                  columns=c("test", "replications", "elapsed",
+                            "relative", "user.self", "sys.self"),
+                  order="relative")
R> print(res)
            test replications elapsed relative user.self sys.self
1  rcppSim(a, e)          100   0.038   1.0000      0.04        0
3 compRsim(a, e)          100   2.011  52.9211      2.01        0
2     rSim(a, e)          100   4.148 109.1579      4.14        0
So in a real-world example involving looping and some algebra (which is of course already done by BLAS and LAPACK libraries), the new R compiler improves by more than a factor of two, cutting time from 4.14 seconds down to about 2 seconds. Yet, this still leaves the C++ solution, clocking in at a mere 38 milliseconds, ahead by a factor of over fifty relative to the new R compiler

And compared to just R itself, the simple solution involving Rcpp and RcppArmadillo is almost 110 times faster. As I mentioned, I quite like this example ;-).

/code/snippets | permanent link

Fri, 25 Mar 2011

R inside Qt: A simple RInside application
The RInside package makes it pretty simple and straightforward to embed R, the wonderful statistical programming environment and language, inside of a C++ application. This uses both the robust embedding API provided by R itself, and the higher-level abstractions from our Rcpp package. A number of examples are shown on this blog both here and here as well as on the RInside page; and the source package actually contains well over a dozen complete examples which cover anything from simple examples to parallel use via MPI for parallel computing.

Beginning users sometimes ask about how to use RInside inside larger projects. And as I had meant to experiment with embedding inside of the powerful Qt framework anyway, I started to dabble a little. A first result is now in the SVN sources of RInside.

My starting point was the classic tkdensity demo that comes with R itself. It is a good point of departure as Tcl/Tk makes it very portable---in fact it should run on every platform that runs R---and quite expressive. And having followed some of the GUI experiments around R over the years, I have also seen various re-implementations using different GUI frameworks. And so I am adding mine to this body of work:

Example of embedding R via RInside into a Qt C++ application: density estimation for a mixture

The problem I addressed first was actual buildability. For the RInside examples, Romain and I provide a Makefile that just works by making calls to R itself to learn about flags for R, Rcpp and RInside such that all required headers and libraries are found. That is actually relatively straightforward (and documented in our vignettes) but a little intimidating at first---which is why a ready-made Makefile is a good thing.

Qt of course uses qmake and the .pro files to encode / resolve dependencies. So task one was to map what our Makefile does into its variables. Turns out that wasn't all that hard:

## -*- mode: Makefile; c-indent-level: 4; c-basic-offset: 4;  tab-width: 8; -*-
##
## Qt usage example for RInside, inspired by the standard 'density
## sliders' example for other GUI toolkits
##
## Copyright (C) 2011  Dirk Eddelbuettel and Romain Francois

TEMPLATE =              app
HEADERS =               qtdensity.h 
SOURCES =               qtdensity.cpp main.cpp

QT +=                   svg

## comment this out if you need a different version of R, 
## and set set R_HOME accordingly as an environment variable
R_HOME =                $$system(R RHOME)

## include headers and libraries for R 
RCPPFLAGS =             $$system($$R_HOME/bin/R CMD config --cppflags)
RLDFLAGS =              $$system($$R_HOME/bin/R CMD config --ldflags)
RBLAS =                 $$system($$R_HOME/bin/R CMD config BLAS_LIBS)
RLAPACK =               $$system($$R_HOME/bin/R CMD config LAPACK_LIBS)

## if you need to set an rpath to R itself, also uncomment
#RRPATH =               -Wl,-rpath,$$R_HOME/lib

## include headers and libraries for Rcpp interface classes
RCPPINCL =              $$system($$R_HOME/bin/Rscript -e \'Rcpp:::CxxFlags\(\)\')
RCPPLIBS =              $$system($$R_HOME/bin/Rscript -e \'Rcpp:::LdFlags\(\)\')

## for some reason when building with Qt we get this each time
## so we turn unused parameter warnings off
RCPPWARNING =           -Wno-unused-parameter 
## include headers and libraries for RInside embedding classes
RINSIDEINCL =           $$system($$R_HOME/bin/Rscript -e \'RInside:::CxxFlags\(\)\')
RINSIDELIBS =           $$system($$R_HOME/bin/Rscript -e \'RInside:::LdFlags\(\)\')

## compiler etc settings used in default make rules
QMAKE_CXXFLAGS +=       $$RCPPWARNING $$RCPPFLAGS $$RCPPINCL $$RINSIDEINCL
QMAKE_LFLAGS +=         $$RLDFLAGS $$RBLAS $$RLAPACK $$RCPPLIBS $$RINSIDELIBS

## addition clean targets
QMAKE_CLEAN +=          qtdensity Makefile
The double dollar signs and escaping of parentheses are a little tedious, but hey it works and expands the compiler and linker flags such that everything .

The code itself is pretty straightforward too. We instantiate the RInside object as well as the main Qt application object. We then instantiate a new object of class QtDensity that will launch the main widget; it is given a reference to the RInside object.

// -*- mode: C++; c-indent-level: 4; c-basic-offset: 4;  tab-width: 8; -*-
//
// Qt usage example for RInside, inspired by the standard 'density
// sliders' example for other GUI toolkits
//
// Copyright (C) 2011  Dirk Eddelbuettel and Romain Francois


#include <QApplication>

#include "qtdensity.h"

int main(int argc, char *argv[])
{
    RInside R(argc, argv);  		// create an embedded R instance

    QApplication app(argc, argv);
    QtDensity qtdensity(R);
    return app.exec();
}

The definition of the main object is pretty simple: a few private variables, and a few functions to interact with the GUI and get values from the radio buttons, slider or input field---as well as functions to update the chart or re-draw the random variables.

// -*- mode: C++; c-indent-level: 4; c-basic-offset: 4;  tab-width: 8; -*-
//
// Qt usage example for RInside, inspired by the standard 'density
// sliders' example for other GUI toolkits
//
// Copyright (C) 2011  Dirk Eddelbuettel and Romain Francois

#ifndef QTDENSITY_H
#define QTDENSITY_H

#include <RInside.h>

#include <QMainWindow>
#include <QHBoxLayout>
#include <QSlider>
#include <QSpinBox>
#include <QLabel>
#include <QTemporaryFile>
#include <QSvgWidget>

class QtDensity : public QMainWindow
{
    Q_OBJECT

public:
    QtDensity(RInside & R);

private slots:
    void getBandwidth(int bw);
    void getKernel(int kernel);
    void getRandomDataCmd(QString txt);
    void runRandomDataCmd(void);

private:
    void setupDisplay(void);    // standard GUI boilderplate of arranging things
    void plot(void);            // run a density plot in R and update the
    void filterFile(void);      // modify the richer SVG produced by R

    QSvgWidget *m_svg;          // the SVG device

    RInside & m_R;              // reference to the R instance passed to constructor
    QString m_tempfile;         // name of file used by R for plots
    QString m_svgfile;          // another temp file, this time from Qt
    int m_bw, m_kernel;         // parameters used to estimate the density
    QString m_cmd;              // random draw command string
};

#endif

Lastly, no big magic in the code either (apart from the standard magic provided by RInside). A bit of standard GUI layouting, and then some functions to pick values from the inputs as well as to compute / update the output. One issue is worth mentioning. The screenshot and code show the second version of this little application. I built a first one using a standard portable network graphics (png) file. That was fine, but not crisp as png is a pixel format so I went back and experimented with scalable vector graphics (svg) instead. One can create svg output with R in a number of ways, one of which is the cairoDevice package by Michael Lawrence (who also wrote RGtk2 and good chunks of Ggobi). Now, it turns out that Qt displays the so-called SVG tiny standard whereas R creates a fuller SVG format. Some discussion with Michael reveals that one can modify the svg file suitably (which is what the function filterFile below does) and it all works. Well: almost. There is a bug (and Michael thinks it is the SVG rendering) in which the density estimate does not get clipped to the plotting region.

// -*- mode: C++; c-indent-level: 4; c-basic-offset: 4;  tab-width: 8; -*-
//
// Qt usage example for RInside, inspired by the standard 'density
// sliders' example for other GUI toolkits -- this time with SVG
//
// Copyright (C) 2011  Dirk Eddelbuettel and Romain Francois

#include <QtGui>

#include "qtdensity.h"

QtDensity::QtDensity(RInside & R) : m_R(R)
{
    m_bw = 100;                 // initial bandwidth, will be scaled by 100 so 1.0
    m_kernel = 0;               // initial kernel: gaussian
    m_cmd = "c(rnorm(100,0,1), rnorm(50,5,1))"; // simple mixture
    m_R["bw"] = m_bw;           // pass bandwidth to R, and have R compute a temp.file name
    m_tempfile = QString::fromStdString(Rcpp::as<std::string>(m_R.parseEval("tfile <- tempfile()")));
    m_svgfile = QString::fromStdString(Rcpp::as<std::string>(m_R.parseEval("sfile <- tempfile()")));
    m_R.parseEvalQ("library(cairoDevice)");

    setupDisplay();
}

void QtDensity::setupDisplay(void)  {
    QWidget *window = new QWidget;
    window->setWindowTitle("Qt and RInside demo: density estimation");

    QSpinBox *spinBox = new QSpinBox;
    QSlider *slider = new QSlider(Qt::Horizontal);
    spinBox->setRange(5, 200);
    slider->setRange(5, 200);
    QObject::connect(spinBox, SIGNAL(valueChanged(int)), slider, SLOT(setValue(int)));
    QObject::connect(slider, SIGNAL(valueChanged(int)), spinBox, SLOT(setValue(int)));
    spinBox->setValue(m_bw);
    QObject::connect(spinBox, SIGNAL(valueChanged(int)), this, SLOT(getBandwidth(int)));

    QLabel *cmdLabel = new QLabel("R command for random data creation");
    QLineEdit *cmdEntry = new QLineEdit(m_cmd);
    QObject::connect(cmdEntry,  SIGNAL(textEdited(QString)), this, SLOT(getRandomDataCmd(QString)));
    QObject::connect(cmdEntry,  SIGNAL(editingFinished()), this, SLOT(runRandomDataCmd()));

    QGroupBox *kernelRadioBox = new QGroupBox("Density Estimation kernel");
    QRadioButton *radio1 = new QRadioButton("&Gaussian");
    QRadioButton *radio2 = new QRadioButton("&Epanechnikov");
    QRadioButton *radio3 = new QRadioButton("&Rectangular");
    QRadioButton *radio4 = new QRadioButton("&Triangular");
    QRadioButton *radio5 = new QRadioButton("&Cosine");
    radio1->setChecked(true);
    QVBoxLayout *vbox = new QVBoxLayout;
    vbox->addWidget(radio1);
    vbox->addWidget(radio2);
    vbox->addWidget(radio3);
    vbox->addWidget(radio4);
    vbox->addWidget(radio5);
    kernelRadioBox->setMinimumSize(260,140);
    kernelRadioBox->setMaximumSize(260,140);
    kernelRadioBox->setSizePolicy(QSizePolicy::Fixed, QSizePolicy::Fixed);
    kernelRadioBox->setLayout(vbox);

    QButtonGroup *kernelGroup = new QButtonGroup;
    kernelGroup->addButton(radio1, 0);
    kernelGroup->addButton(radio2, 1);
    kernelGroup->addButton(radio3, 2);
    kernelGroup->addButton(radio4, 3);
    kernelGroup->addButton(radio5, 4);
    QObject::connect(kernelGroup, SIGNAL(buttonClicked(int)), this, SLOT(getKernel(int)));

    m_svg = new QSvgWidget();
    runRandomDataCmd();         // also calls plot()

    QGroupBox *estimationBox = new QGroupBox("Density estimation bandwidth (scaled by 100)");
    QHBoxLayout *spinners = new QHBoxLayout;
    spinners->addWidget(spinBox);
    spinners->addWidget(slider);
    QVBoxLayout *topright = new QVBoxLayout;
    topright->addLayout(spinners);
    topright->addWidget(cmdLabel);
    topright->addWidget(cmdEntry);
    estimationBox->setMinimumSize(360,140);
    estimationBox->setMaximumSize(360,140);
    estimationBox->setSizePolicy(QSizePolicy::Fixed, QSizePolicy::Fixed);
    estimationBox->setLayout(topright);
    QHBoxLayout *upperlayout = new QHBoxLayout;
    upperlayout->addWidget(kernelRadioBox);
    upperlayout->addWidget(estimationBox);

    QHBoxLayout *svglayout = new QHBoxLayout;
    svglayout->addWidget(m_svg);

    QVBoxLayout *outer = new QVBoxLayout;
    outer->addLayout(upperlayout);
    outer->addLayout(svglayout);
    window->setLayout(outer);
    window->show();
}

void QtDensity::plot(void) {
    const char *kernelstrings[] = { "gaussian", "epanechnikov", "rectangular", "triangular", "cosine" };
    m_R["bw"] = m_bw;
    m_R["kernel"] = kernelstrings[m_kernel]; // that passes the string to R
    std::string cmd1 = "Cairo(width=6,height=6,pointsize=10,surface='svg',filename=tfile); "
                       "plot(density(y, bw=bw/100, kernel=kernel), xlim=range(y)+c(-2,2), main=\"Kernel: ";
    std::string cmd2 = "\"); points(y, rep(0, length(y)), pch=16, col=rgb(0,0,0,1/4));  dev.off()";
    std::string cmd = cmd1 + kernelstrings[m_kernel] + cmd2; // stick the selected kernel in the middle
    m_R.parseEvalQ(cmd);
    filterFile();               // we need to simplify the svg file for display by Qt 
    m_svg->load(m_svgfile);
}

void QtDensity::getBandwidth(int bw) {
    if (bw != m_bw) {
        m_bw = bw;
        plot();
    }
}

void QtDensity::getKernel(int kernel) {
    if (kernel != m_kernel) {
        m_kernel = kernel;
        plot();
    }
}

void QtDensity::getRandomDataCmd(QString txt) {
    m_cmd = txt;
}

void QtDensity::runRandomDataCmd(void) {
    std::string cmd = "y <- " + m_cmd.toStdString();
    m_R.parseEvalQ(cmd);
    plot();                     // after each random draw, update plot with estimate
}

void QtDensity::filterFile() {
    // cairoDevice creates richer SVG than Qt can display
    // but per Michaele Lawrence, a simple trick is to s/symbol/g/ which we do here

    QFile infile(m_tempfile);
    infile.open(QFile::ReadOnly);
    QFile outfile(m_svgfile);
    outfile.open(QFile::WriteOnly | QFile::Truncate);
    
    QTextStream in(&infile);
    QTextStream out(&outfile);
    QRegExp rx1("<symbol"); 
    QRegExp rx2("</symbol");    
    while (!in.atEnd()) {
        QString line = in.readLine();
        line.replace(rx1, "<g"); // so '<symbol' becomes '<g ...'
        line.replace(rx2, "</g");// and '</symbol becomes '</g'
        out << line << "\n";
    }
    infile.close();
    outfile.close();
}

What the little application does is actually somewhat neat for the few lines. One key features is that the generated data can be specified directly by an R expression which allows for mixtures (as shown, and as is the default). With that it easy to see how many points are needed in the second hump to make the estimate multi-modal, and how much of a distance between both centers is needed and so on. Obviously, the effect of the chosen kernel and bandwidth can also be visualized. And with the chart the being a support vector graphics display, we can resize and scale at will and it still looks crisp.

The code (for both the simpler png variant and the svg version shown here) is in the SVN repository for RInside and will be in the next release. Special thanks to Michael Lawrence for patiently working through some svg woes with me over a few emails.

Update: Some typos fixed.

Update 2: Two URLs corrected.

/code/snippets | permanent link

Sun, 23 Jan 2011

CRANberries is now tweeting
The CRANberries service (which reports on new and updated CRAN packages for the R language and environment) is now tweeting about new packages. Simply follow @CRANberriesFeed to receive theses messages.

For the technically minded, adding this to the existing 200-line program which runs all of CRANberries was very easy. CRANberries relies only on R itself and a few Unix tools like diffstat as well as the simple blosxom txt-to-html/rss 'blog compiler'. The tweeting itself is now done by this new function

tweetNewBlogEntry <- function(curPkg, curVer, reposurl) {
    ## tests reveal that pipe(), cat(), close() is easiest
    ## to send multiple messages, may need --background option
    con <- pipe("bti --config bti.conf", "w")
    cat("New CRAN package", curPkg, "with initial version", curVer, " http://goo.gl/pgljT\n", file=con)
    close(con)
}
which simply pipes the message in Greg KH's bti program (which is now a new dependency). Special thanks to its Debian maintainer Gregor Herrmann for some helpful emails; I am using the newest release 0.29 which itself needs liboauth0. Once OAuth tokens are set-up (and see here for how to do that) all we need is the three-liner above.

At this point I am not too sure what to do about updated packages. One message per updated package seems too noisy. To be seen---comments or suggestions welcome.

/code/snippets | permanent link

Mon, 17 Jan 2011

Keeping simple things simple
My friend Jeff deserves a sincere congratulation for finally unveiling his rebranded R consultancy Lemnica. One notable feature of the new website is a section called esoteric R which discusses less frequently-visited corners of the R world. It even boasts its own CRAN package esotericR with the example sources. esoteric R currently holds two articles. Jeff had sent me the one about introducing closures a while back, and I like it and may comment at another time. What caught me by surprise when Lemnica finally opened was the other article: R calling C.

It is a fine article motivated by all the usual reasons that are e.g. mentioned in the Google Tech Talk which Romain and I gave last October about our work around Rcpp. But it is just not simple.

Allow me to explain. When Jeff showed this C language file

#include <R.h>
#include <Rinternals.h>

SEXP esoteric_rev (SEXP x) {
  SEXP res;
  int i, r, P=0;
  PROTECT(res = allocVector(REALSXP, length(x))); P++;

  for(i=length(x), r=0; i>0; i--, r++) {
     REAL(res)[r] = REAL(x)[i-1];
  }

  copyMostAttrib(x, res);
  UNPROTECT(P);
  return res;
}
and then needs several paragraphs to explain what is going on, what is needed to compile and then how to load it --- I simply could not resist. Almost immediately, I emailed back to him something as simple as this using both our Rcpp package as well as the wonderful inline package by Oleg which Romain and I more or less adopted:
library(inline)  ## for cxxfunction()
src <- 'Rcpp::NumericVector x = Rcpp::NumericVector(xs);
        std::reverse(x.begin(), x.end());
        return(x);'
fun <- cxxfunction(signature(xs="numeric"), body=src, plugin="Rcpp")
fun( seq(0, 1, 0.1) )
Here we load inline, and then define a three-line C++ program using facilities from our Rcpp package. All we need to revert a vector is to first access its R object in C++ by instantiating the R vector as a NumericVector. These C++ classes then provide iterators which are compatible with the Standard Template Library (STL). So we simply call the STL function reverse pointing the beginning and end of the vector, and are done! Rcpp then allows us the return the C++ vector which it turns into an R vector. Efficient in-place reversal, just like Jeff had motivated, in three lines. Best of all, we can execute this from within R itself:
R> library(inline)  ## for cxxfunction()
R> src <- 'Rcpp::NumericVector x = Rcpp::NumericVector(xs);
+         std::reverse(x.begin(), x.end());
+         return(x);'
R> fun <- cxxfunction(signature(xs="numeric"), body=src, plugin="Rcpp")
R> fun( seq(0, 1, 0.1) )
 [1] 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0
R>

Lastly, Jeff shows a more complete example wherein a new vector is created, and any potential attributes are copied as well. Naturally, we can do that too. First, we used clone() to make a deep copy (ie forcing creation of a new object rather than a mere proxy) and use the same R API function he accessed---but it our case both prefixed with ::Rf_ for R remapping (to protect clashed with other functions with identical names) and a global namespace identifier (as it is a global C function from R).

R> library(inline)
R> src <- 'Rcpp::NumericVector x = Rcpp::clone<Rcpp::NumericVector>(xs);
+         std::reverse(x.begin(), x.end());
+         ::Rf_copyMostAttrib(xs, x);
+         return(x);'
R> fun <- cxxfunction(signature(xs="numeric"), body=src, plugin="Rcpp")
R> obj <- structure(seq(0, 1, 0.1), obligatory="hello, world!")
R> fun(obj)
 [1] 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0
attr(,"obligatory")
[1] "hello, world!"
R> obj
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
attr(,"obligatory")
[1] "hello, world!"
R>
Both the obj variable and the new copy contain the desired data attribute, the new copy is reversed, the original is untouched---and all in four lines of C++ called via one inline call. I have now been going on for over one hundred lines yet I never had to mention memory management, pointers, PROTECT or other components of the R API for C. Hopefully, this short writeup provided an idea of why Romain and I think Rcpp is the way to go for creating C/C++ functions for extending and enhancing R.

/code/snippets | permanent link

Sun, 16 Jan 2011

Plotting overbought / oversold regions in R
The good folks at Bespoke Investment Group frequently show charts of so-called overbought or oversold levels; see e.g. here for the most recent global markets snapshot.

Classifying markets as overbought or oversold is a popular heuristic. It starts from computing a rolling smoothed estimate of the prices, usually via a (exponential or standard) moving average over a suitable number of days (where Bespoke uses 50 days, see here). This is typically coupled with a (simple) rolling standard deviation. Overbought and oversold regions are then constructed by taking the smoothed mean plus/minus one and two standard deviations.

Doing this is in R is pretty easy thanks to the combination of R's rich base functions and its add-on packages from CRAN. Below is a simply function I wrote a couple of months ago---and I figured I might as well release. It relies on the powerful packages quantmod and TTR by my pals Jeff Ryan and Josh Ulrich, respectively.

## plotOBOS -- displaying overbough/oversold as eg in Bespoke's plots
##
## Copyright (C) 2010 - 2011  Dirk Eddelbuettel
##
## This is free software: you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 2 of the License, or
## (at your option) any later version.

suppressMessages(library(quantmod))     # for getSymbols(), brings in xts too
suppressMessages(library(TTR))          # for various moving averages

plotOBOS <- function(symbol, n=50, type=c("sma", "ema", "zlema"), years=1, blue=TRUE) {

    today <- Sys.Date()
    X <- getSymbols(symbol, src="yahoo", from=format(today-365*years-2*n), auto.assign=FALSE)
    x <- X[,6]                          # use Adjusted

    type <- match.arg(type)
    xd <- switch(type,                  # compute xd as the central location via selected MA smoother
                 sma = SMA(x,n),
                 ema = EMA(x,n),
                 zlema = ZLEMA(x,n))
    xv <- runSD(x, n)                   # compute xv as the rolling volatility

    strt <- paste(format(today-365*years), "::", sep="")
    x  <- x[strt]                       # subset plotting range using xts' nice functionality
    xd <- xd[strt]
    xv <- xv[strt]

    xyd <- xy.coords(.index(xd),xd[,1]) # xy coordinates for direct plot commands
    xyv <- xy.coords(.index(xv),xv[,1])

    n <- length(xyd$x)
    xx <- xyd$x[c(1,1:n,n:1)]           # for polygon(): from first point to last and back

    if (blue) {
        blues5 <- c("#EFF3FF", "#BDD7E7", "#6BAED6", "#3182BD", "#08519C") # cf brewer.pal(5, "Blues")
        fairlylight <- rgb(189/255, 215/255, 231/255, alpha=0.625) # aka blues5[2]
        verylight <- rgb(239/255, 243/255, 255/255, alpha=0.625) # aka blues5[1]
        dark <- rgb(8/255, 81/255, 156/255, alpha=0.625) # aka blues5[5]
    } else {
        fairlylight <- rgb(204/255, 204/255, 204/255, alpha=0.5)         # grays with alpha-blending at 50%
        verylight <- rgb(242/255, 242/255, 242/255, alpha=0.5)
        dark <- 'black'
    }

    plot(x, ylim=range(range(xd+2*xv, xd-2*xv, na.rm=TRUE)), main=symbol, col=fairlylight) 		# basic xts plot
    polygon(x=xx, y=c(xyd$y[1]+xyv$y[1], xyd$y+2*xyv$y, rev(xyd$y+xyv$y)), border=NA, col=fairlylight) 	# upper
    polygon(x=xx, y=c(xyd$y[1]-1*xyv$y[1], xyd$y+1*xyv$y, rev(xyd$y-1*xyv$y)), border=NA, col=verylight)# center
    polygon(x=xx, y=c(xyd$y[1]-xyv$y[1], xyd$y-2*xyv$y, rev(xyd$y-xyv$y)), border=NA, col=fairlylight) 	# lower
    lines(xd, lwd=2, col=fairlylight)   # central smooted location
    lines(x, lwd=3, col=dark)           # actual price, thicker
    invisible(NULL)
}

After downloading data and computing the rolling smoothed mean and standard deviation, it really is just a matter of plotting (appropriate) filled polygons. Here I used colors from the neat RColorBrewer package with some alpha blending. Colors can be turned off via an option to the function; ranges, data length and type of smoother can also be picked.

To call this in R, simply source the file and the call, say, plotOBOS("^GSPC", years=2) which creates a two-year plot of the SP500 as shown here:

Example chart of overbought/oversold levels from plotOBOS() function

This shows the market did indeed bounce off the oversold lows nicely on a few occassions in 2009 and 2010 --- but also continued to slide after hitting the condition. Nothing is foolproof, and certainly nothing as simple as this is, so buyer beware. But it may prove useful in conjunction with other tools.

The code for the script is here and of course available under GPL 2 or later. I'd be happy to help incorporate it into some other finance package. Lastly, if you read this post this far, also consider our R / Finance conference coming at the end of April.

Edit: Corrected several typos with thanks to Josh.

/code/snippets | permanent link