Wed, 11 Jan 2017

R / Finance 2017 Call for Papers

Last week, Josh sent the call for papers to the R-SIG-Finance list making everyone aware that we will have our nineth annual R/Finance conference in Chicago in May. Please see the call for paper (at the link, below, or at the website) and consider submitting a paper.

We are once again very excited about our conference, thrilled about upcoming keynotes and hope that many R / Finance users will not only join us in Chicago in May 2017 -- but also submit an exciting proposal.

We also overhauled the website, so please see R/Finance. It should render well and fast on devices of all sizes: phones, tablets, desktops with browsers in different resolutions. The program and registration details still correspond to last year's conference and will be updated in due course.

So read on below, and see you in Chicago in May!

Call for Papers

R/Finance 2017: Applied Finance with R
May 19 and 20, 2017
University of Illinois at Chicago, IL, USA

The ninth annual R/Finance conference for applied finance using R will be held on May 19 and 20, 2017 in Chicago, IL, USA at the University of Illinois at Chicago. The conference will cover topics including portfolio management, time series analysis, advanced risk tools, high-performance computing, market microstructure, and econometrics. All will be discussed within the context of using R as a primary tool for financial risk management, portfolio construction, and trading.

Over the past eight years, R/Finance has included attendees from around the world. It has featured presentations from prominent academics and practitioners, and we anticipate another exciting line-up for 2017.

We invite you to submit complete papers in pdf format for consideration. We will also consider one-page abstracts (in txt or pdf format) although more complete papers are preferred. We welcome submissions for both full talks and abbreviated "lightning talks." Both academic and practitioner proposals related to R are encouraged.

All slides will be made publicly available at conference time. Presenters are strongly encouraged to provide working R code to accompany the slides. Data sets should also be made public for the purposes of reproducibility (though we realize this may be limited due to contracts with data vendors). Preference may be given to presenters who have released R packages.

Financial assistance for travel and accommodation may be available to presenters, however requests must be made at the time of submission. Assistance will be granted at the discretion of the conference committee.

Please submit proposals online at http://go.uic.edu/rfinsubmit.

Submissions will be reviewed and accepted on a rolling basis with a final deadline of February 28, 2017. Submitters will be notified via email by March 31, 2017 of acceptance, presentation length, and financial assistance (if requested).

Additional details will be announced via the conference website as they become available. Information on previous years' presenters and their presentations are also at the conference website. We will make a separate announcement when registration opens.

For the program committee:

Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson,
Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

/computers/R | permanent link

Thu, 15 Oct 2015

R / Finance 2016 Call for Papers

Earlier today, Josh sent the text below in this message to the R-SIG-Finance list as the very first heads-up concerning the 2016 edition of our successful R/Finance series.

We are once again very excited about our conference, thrilled about upcoming keynotes (some of which are confirmed and some of which are in the works), and hope that many R / Finance users will not only join us in Chicago in May 2016 -- but also submit an exciting proposal.

So read on below, and see you in Chicago in May!

Call for Papers

R/Finance 2016: Applied Finance with R
May 20 and 21, 2016
University of Illinois at Chicago, IL, USA

The eight annual R/Finance conference for applied finance using R will be held on May 20 and 21, 2016, in Chicago, IL, USA at the University of Illinois at Chicago. The conference will cover topics including portfolio management, time series analysis, advanced risk tools, high-performance computing, market microstructure, and econometrics. All will be discussed within the context of using R as a primary tool for financial risk management, portfolio construction, and trading.

Over the past seven years, R/Finance has included attendees from around the world. It has featured presentations from prominent academics and practitioners, and we anticipate another exciting line-up for 2016.

We invite you to submit complete papers in pdf format for consideration. We will also consider one-page abstracts (in txt or pdf format) although more complete papers are preferred. We welcome submissions for both full talks and abbreviated "lightning talks." Both academic and practitioner proposals related to R are encouraged.

All slides will be made publicly available at conference time. Presenters are strongly encouraged to provide working R code to accompany the slides. Data sets should also be made public for the purposes of reproducibility (though we realize this may be limited due to contracts with data vendors). Preference may be given to presenters who have released R packages.

The conference will award two (or more) $1000 prizes for best papers. A submission must be a full paper to be eligible for a best paper award. Extended abstracts, even if a full paper is provided by conference time, are not eligible for a best paper award. Financial assistance for travel and accommodation may be available to presenters, however requests must be made at the time of submission. Assistance will be granted at the discretion of the conference committee.

Please make your submission online at this link. The submission deadline is January 29, 2016. Submitters will be notified via email by February 29, 2016 of acceptance, presentation length, and financial assistance (if requested).

Additional details will be announced via the R/Finance conference website as they become available. Information on previous years' presenters and their presentations are also at the conference website.

For the program committee:

Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson,
Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

/computers/R | permanent link

Tue, 31 Mar 2015

R / Finance 2015 Open for Registration

The annoucement below just went to the R-SIG-Finance list. More information is as usual at the R / Finance page.

Registration for R/Finance 2015 is now open!

The conference will take place on May 29 and 30, at UIC in Chicago. Building on the success of the previous conferences in 2009-2014, we expect more than 250 attendees from around the world. R users from industry, academia, and government will joining 30+ presenters covering all areas of finance with R.

We are very excited about the four keynote presentations given by Emanuel Derman, Louis Marascio, Alexander McNeil, and Rishi Narang.
The conference agenda (currently) includes 18 full presentations and 19 shorter "lightning talks". As in previous years, several (optional) pre-conference seminars are offered on Friday morning.

There is also an (optional) conference dinner at The Terrace at Trump Hotel. Overlooking the Chicago river and skyline, it is a perfect venue to continue conversations while dining and drinking.

Registration information and agenda details can be found on the conference website as they are being finalized.
Registration is also available directly at the registration page.

We would to thank our 2015 sponsors for the continued support enabling us to host such an exciting conference:

International Center for Futures and Derivatives at UIC

Revolution Analytics
MS-Computational Finance and Risk Management at University of Washington

Ketchum Trading
OneMarketData
RStudio
SYMMS

On behalf of the committee and sponsors, we look forward to seeing you in Chicago!

For the program committee:
Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson,
Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

See you in Chicago in May!

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/computers/R | permanent link

Mon, 24 Nov 2014

YATORP -- Yet Another Tutorial on R Packaging

What the world needs right now is yet another tutorial on R packages and their creation. Luckily, this last Friday and Saturday, I had the opportunity to present in a workshop organized by Frank DiTraglia at Penn's shiny new Warren Center, and held at Wharton.

Given the Warren Center's focus, the workshop centered around Big Data and Open Science with R. Yihui Xie and myself alternated on delivering four units on an Introduction to R, Writing R packages, Dynamic Documents with R, and HPC with Rcpp and RcppArmadillo.

So I had to come up with a plan for teaching creating R packages -- and decided to do it from the very bottom up, clearly introducing the underlying R CMD ... commands and only then switching to taking advantage of an environment such as the RStudio IDE.

The resulting slides are now available on my presentations page. The code examples are in a repo subdirectory on GitHub as well. While both were designed to support the parallel live instruction offered in the workshop, I would be interested in feedback (preferably via email) about how useful the slides are by themselves.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/computers/R | permanent link

Tue, 18 Nov 2014

R / Finance 2015 Call for Papers

Earlier today, Josh send the text below to the R-SIG-Finance list, and I updated the R/Finance website, including its Call for Papers page, accordingly.

We are once again very excited about our conference, thrilled about the four confirmed keynotes, and hope that many R / Finance users will not only join us in Chicago in May 2015 -- but also submit an exciting proposal.

So read on below, and see you in Chicago in May!

Call for Papers:

R/Finance 2015: Applied Finance with R
May 29 and 30, 2015
University of Illinois at Chicago, IL, USA

The seventh annual R/Finance conference for applied finance using R will be held on May 29 and 30, 2015 in Chicago, IL, USA at the University of Illinois at Chicago. The conference will cover topics including portfolio management, time series analysis, advanced risk tools, high-performance computing, market microstructure, and econometrics. All will be discussed within the context of using R as a primary tool for financial risk management, portfolio construction, and trading.

Over the past six years, R/Finance has included attendees from around the world. It has featured presentations from prominent academics and practitioners, and we anticipate another exciting line-up for 2015. This year will include invited keynote presentations by Emanuel Derman, Louis Marascio, Alexander McNeil, and Rishi Narang.

We invite you to submit complete papers in pdf format for consideration. We will also consider one-page abstracts (in txt or pdf format) although more complete papers are preferred. We welcome submissions for both full talks and abbreviated "lightning talks." Both academic and practitioner proposals related to R are encouraged.

All slides will be made publicly available at conference time. Presenters are strongly encouraged to provide working R code to accompany the slides. Data sets should also be made public for the purposes of reproducibility (though we realize this may be limited due to contracts with data vendors). Preference may be given to presenters who have released R packages.

The conference will award two (or more) $1000 prizes for best papers. A submission must be a full paper to be eligible for a best paper award. Extended abstracts, even if a full paper is provided by conference time, are not eligible for a best paper award. Financial assistance for travel and accommodation may be available to presenters, however requests must be made at the time of submission. Assistance will be granted at the discretion of the conference committee.

Please make your submission online at this link. The submission deadline is January 31, 2015. Submitters will be notified via email by February 28, 2015 of acceptance, presentation length, and financial assistance (if requested).

Additional details will be announced via the R/Finance conference website as they become available. Information on previous years' presenters and their presentations are also at the conference website.

For the program committee:

Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson, Dale Rosenthal,
Jeffrey Ryan, Joshua Ulrich

/computers/R | permanent link

Thu, 25 Sep 2014

R and Docker

r and docker talk picture by @mediafly

Earlier this evening I gave a short talk about R and Docker at the September Meetup of the Docker Chicago group.

Thanks to Karl Grzeszczak for setting the meeting, and for providing a pretty thorough intro talk regarding CoreOS and Docker.

My slides are now up on my presentations page.

/computers/R | permanent link

Sat, 29 Mar 2014

R / Finance 2014 Open for Registration

The annoucement below just went to the R-SIG-Finance list. More information is as usual at the R / Finance page:

Now open for registrations:

R / Finance 2014: Applied Finance with R
May 16 and 17, 2014
Chicago, IL, USA

The registration for R/Finance 2014 -- which will take place May 16 and 17 in Chicago -- is now open!

Building on the success of the previous conferences in 2009, 2010, 2011, 2012 and 2013, we expect around 300 attendees from around the world. R users from industry, academia, and government will joining 30+ presenters covering all areas of finance with R.

We are very excited about the four keynotes by Bill Cleveland, Alexios Ghalanos, Bob McDonald and Luke Tierney. The main agenda (currently) includes sixteen full presentations and twenty-one shorter "lightning talks". We are also excited to offer four optional pre-conference seminars on Friday morning.

To celebrate the sixth year of the conference in style, the dinner will be returning to The Terrace of the Trump Hotel. Overlooking the Chicago River and skyline, it is a perfect venue to continue conversations while dining and drinking.

More details of the agenda are available at:
http://www.RinFinance.com/agenda/
Registration information is available at
http://www.RinFinance.com/register/
and can also be directly accessed by going to
http://www.regonline.com/RFinance2014
We would to thank our 2014 Sponsors for the continued support enabling us to host such an exciting conference:

International Center for Futures and Derivatives at UIC

Revolution Analytics
MS-Computational Finance at University of Washington

OneMarketData
RStudio

On behalf of the committee and sponsors, we look forward to seeing you in Chicago!
Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson,
Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

See you in Chicago in May!

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/computers/R | permanent link

Mon, 27 May 2013

R / Finance 2013 Recap -- and Presentation Slides

The fifth internation R/Finance conference was held last weekend. As one of the founding co-organizers, I may well be accussed of a little bias, but we think we once again pulled off a very nice and successful weekend-long event. Participants had kind words to say during the conference, and a few first posts have appeared such as Joe Rickert's post over at the REvo blog.

Some participants, myself included, had already posted on their personal websites (though had forgotten to mention it here). In any event, I just updated the website with links to the pdf (or ppt) slides of all presenters who shared their material with us. Supplemental material may be made available too at a later date.

We hope you find these slides useful. Please do spread the word about the R/Finance conference as we expect to have a sixth edition in May 2014---and we do look forward to receiving even more outstanding submissions. Dates, details, call for papers, etc will be forthcoming over the next few months.

/computers/R | permanent link

Fri, 29 Mar 2013

R / Finance 2013 Open for Registration

The annoucement below just went to the R-SIG-Finance list. More information is as usual at the R / Finance page:

Now open for registrations:

R / Finance 2013: Applied Finance with R
May 17 and 18, 2013
Chicago, IL, USA

The registration for R/Finance 2013 -- which will take place May 17 and 18 in Chicago -- is NOW OPEN!

Building on the success of the previous conferences in 2009, 2010, 2011 and 2012, we expect more than 250 attendees from around the world. R users from industry, academia, and government will joining 30+ presenters covering all areas of finance with R.

We are very excited about the four keynotes by Sanjiv Das, Attilio Meucci, Ryan Sheftel and Ruey Tsay. The main agenda (currently) includes seventeen full presentations and fifteen shorter "lightning talks". We are also excited to offer five optional pre-conference seminars on Friday morning.

To celebrate the fifth year of the conference in style, the dinner will be held at The Terrace of the Trump Hotel. Overlooking the Chicago river and skyline, it is a perfect venue to continue conversations while dining and drinking.

More details of the agenda are available at:

http://www.RinFinance.com/agenda/

Registration information is available at

http://www.RinFinance.com/register/
and can also be directly accessed by going to
http://www.regonline.com/RFinance2013
We would to thank our 2013 Sponsors for the continued support enabling us to host such an exciting conference:
International Center for Futures and Derivatives at UIC

Revolution Analytics
MS-Computational Finance at University of Washington

Google
lemnica
OpenGamma
OneMarketData
RStudio

On behalf of the committee and sponsors, we look forward to seeing you in Chicago!

Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson,
Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

See you in Chicago in May!!

/computers/R | permanent link

Mon, 17 Dec 2012

R / Finance 2013 Call for Papers

The text below just went out to r-sig-finance along with updates to the R/Finance website and its Call for Papers page.

Call for Papers:

R/Finance 2013: Applied Finance with R
May 17 and 18, 2013
University of Illinois, Chicago, IL, USA

The fifth annual R/Finance conference for applied finance using R will be held on May 17 and 18, 2013 in Chicago, IL, USA at the University of Illinois at Chicago. The conference is expected to cover topics including portfolio management, time series analysis, advanced risk tools, high-performance computing, market microstructure, and econometrics. All will be discussed within the context of using R as a primary tool for financial risk management, portfolio construction, and trading.

Over the past four years, R/Finance has included attendees from around the world. It featured presentations from prominent academics and practitioners, and we anticipate another exciting line-up for 2013.

We invite you to submit complete papers in pdf format for consideration. We will also consider one-page abstracts (in txt or pdf format) although more complete papers are preferred. We welcome submissions for full talks and abbreviated "lightning talks". Both academic and practitioner proposals related to R are encouraged.

Presenters are strongly encouraged to provide working R code to accompany the presentation/paper. Data sets should also be made public for the purposes of reproducibility (though we realize this may be limited due to contracts with data vendors). Preference may be given to presenters who have released R packages.

The conference will award two (or more) $1000 prizes for best papers. A submission must be a full paper to be eligible for a best paper award. Extended abstracts, even if a full paper is provided by conference time, are not eligible for a best paper award. Financial assistance for travel and accommodation may be available to presenters at the discretion of the conference committee. Requests for assistance should be made at the time of submission.

Please send submissions to: committee at RinFinance.com. The submission deadline is February 15, 2013. Submitters will be notified of acceptance via email by February 28, 2013. Notification of whether a presentation will be a long presentation or a lightning talk will also be made at that time.

Additional details will be announced at this website as they become available. Information on previous year's presenters and their presentations are also at the conference website.

For the program committee:

Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson,
Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

So see you in Chicago in May!

/computers/R | permanent link

Mon, 19 Mar 2012

R / Finance 2012 Open for Registration

The annoucement below just went to the R-SIG-Finance list. More information is as usual the the R / Finance page:

Now open for registrations:

R / Finance 2012: Applied Finance with R
May 11 and 12, 2012
Chicago, IL, USA

The registration for R/Finance 2012 -- which will take place May 11 and 12 in Chicago -- is NOW OPEN!

Building on the success of the three previous conferences in 2009, 2010, and 2011, we expect more than 250 attendees from around the world. R users from industry, academia, and government will join 40+ presenters covering all areas of finance with R.

This year's conference will start earlier in the day on Friday, to accommodate the tremendous line up of speakers for 2012, as well as to provide more time between talks for networking.

We are very excited about the four keynotes by Paul Gilbert, Blair Hull, Rob McCulloch, and Simon Urbanek. The main agenda includes nineteen full presentations and eighteen shorter "lightning talks". We are also excited to offer six optional pre-conference seminars on Friday morning.

Once again, we are hosting the R/Finance conference dinner on Friday evening, where you can continue conversations while dining and drinking atop a West Loop restaurant overlooking the Chicago skyline.

More details of the agenda are available at:

http://www.RinFinance.com/agenda/

Registration information is available at

http://www.RinFinance.com/register/
and can also be directly accessed by going to
http://www.regonline.com/RFinance2012

On behalf of the committee and sponsors, we look forward to seeing you in Chicago!

Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson,
Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich
Our 2012 Sponsors:
International Center for Futures and Derivatives at UIC

Revolution Analytics
Sybase
MS-Computational Finance at University of Washington

Google
lemnica
OpenGamma
OneTick
RStudio
Tick Data

See you in Chicago in May!!

/computers/R | permanent link

Thu, 15 Dec 2011

R / Finance 2012 Call for Papers

Last night, the text below went out to r-sig-finance along with updates to the R/Finance website and its Call for Papers page; followed by some tweeting and Goggle+'ing (and please do feel free to retweet and share at will...)

Call for Papers:

R/Finance 2012: Applied Finance with R
May 11 and 12, 2012
University of Illinois, Chicago, IL, USA

The fourth annual R/Finance conference for applied finance using R will be held on May 11 and 12, 2012 in Chicago, IL, USA on the campus of the University of Illinois at Chicago. The two-day conference will cover topics including portfolio management, time series analysis, advanced risk tools, high-performance computing, market microstructure, and econometrics. All will be discussed within the context of using R as a primary tool for financial risk management, portfolio construction, and trading.

Over the past three years, R/Finance has included attendees from around the world and featured keynote presentations from prominent academics and practitioners. We anticipate another exciting line-up for 2012 --- including keynote presentations from Blair Hull, Paul Gilbert, Rob McCulloch, and Simon Urbanek.

We invite you to submit complete papers or one-page abstracts (in txt or pdf format) for consideration. Academic and practitioner proposals related to R are encouraged. We welcome submissions for full talks, abbreviated "lightning talks", and for a limited number of (longer) pre-conference seminar sessions.

Presenters are strongly encouraged to provide working R code to accompany the presentation/paper. Data sets should also be made public for the purposes of reproducibility (though we realize this may be limited due to contracts with data vendors). Preference may be given to presenters who have released R packages.

Travel and accommodation grants may be available for selected presenters at the discretion of the committee. In addition, the conference will award prizes for best papers. To be eligible for a best paper award, a submission must be a full paper. Extended abstracts, even if a full paper by conference time, are not eligible for a best paper award.

Please send submissions to: committee at RinFinance.com.

The submission deadline is January 31, 2012. Submitters will be notified of acceptance via email by February 28, 2012. Notification of whether a presentation will be a long presentation or a lightning talk will also be made at that time.

Additional details will be announced at this website as they become available. Information on previous year's presenters and their presentations are also at the conference website R/Finance 2009, 2010 and 2011.

For the program committee:

Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson,
Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

So see you in Chicago in May!

Update: Corrected urls to past conference thanks to heads-up by Josh. Thanks!

/computers/R | permanent link

Wed, 25 May 2011

R / Finance 2011 presentations

I just sent the text below to the r-sig-finance list:
The organizing committee for the R/Finance 2011 conference is pleased to announce the availability of presentation slides from the 3rd annual R/Finance conference. This year's two-day conference once again attracted over 200 participants from across the globe. Academics, students and industry professionals enjoyed almost 30 talks covering trading, optimization, risk management and more --- all using R!

The majority of these presentations are now available for download at:

http://www.RinFinance.com/agenda/
This year we began offering prizes for the best paper submissions. The 2011 recipients are Robert Gramacy (University of Chicago) and David Matteson (Cornell University) who each won USD 1000. Also new was a graduate student travel award: Mikko Niemenmaa (Aalto University) and Clément Dunand-Châtellet (École Polytechnique) each received USD 500.

With this, the organizing committee would like to thank our lead conference sponsors, the International Center for Futures and Derivatives at UIC and Revolution Analytics, as well as our conference sponsors OneMarketData, RStudio and Lemnica for their continued support.

The organising committee would also like to thank all of the presenters and participants for making R/Finance 2011 so successful. We look forward to seeing you in 2012, with the prospective dates of May 17 - 19 to be confirmed.

For the organizing committee,

Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson,
Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

Enjoy!

/computers/R | permanent link

Tue, 12 Apr 2011

The new R compiler package in R 2.13.0: Some first experiments

R 2.13.0 will be released tomorrow. It contains a new package by R Core member Luke Tierney: compiler. What a simple and straightforward name, and something that Luke has been working on for several years. The NEWS file says
    o Package compiler is now provided as a standard package.  See
      ?compiler::compile for information on how to use the compiler.
      This package implements a byte code compiler for R: by default
      the compiler is not used in this release.  See the 'R
      Installation and Administration Manual' for how to compile the
      base and recommended packages.
so the compiler is not yet used for R's own base and recommended packages.

While working on my slides for the upcoming Rcpp workshop preceding R/Finance 2011, I thought of a nice test example to illustrate the compiler. Last summer and fall, Radford Neal had sent a very nice, long and detailed list of possible patches for R performance improvements to the development list. Some patches were integrated and some more discussion ensued. One strand was on the difference in parsing between normal parens and curly braces. In isolation, these seem too large, but (as I recall) this is due some things 'deep down' in the parser.

However, some folks really harped on this topic. And it just doesn't die as a post from last week demonstrated once more. Last year, Christian Robert had whittled it down to a set of functions, and I made a somewhat sarcastic post argueing that I'd rather use Rcpp to get 80-fold speed increase than spend my time argueing over ten percent changes in code that could be made faster so easily.

So let us now examine what the compiler package can do for us. The starting point is the same as last year: five variants of computing 1/(1+x) for a scalar x inside an explicit for loop. Real code would never do it this way as vectorisation comes to the rescue. But for (Christian's) argument's sake, it is useful to highlight differences in the parser. We once again use the nice rbenchmark package to run, time and summarise alternatives:

> ## cf http://dirk.eddelbuettel.com/blog/2010/09/07#straight_curly_or_compiled
> f <- function(n, x=1) for (i in 1:n) x=1/(1+x)
> g <- function(n, x=1) for (i in 1:n) x=(1/(1+x))
> h <- function(n, x=1) for (i in 1:n) x=(1+x)^(-1)
> j <- function(n, x=1) for (i in 1:n) x={1/{1+x}}
> k <- function(n, x=1) for (i in 1:n) x=1/{1+x}
> ## now load some tools
> library(rbenchmark)
> ## now run the benchmark
> N <- 1e6
> benchmark(f(N,1), g(N,1), h(N,1), j(N,1), k(N,1),
+           columns=c("test", "replications",
+           "elapsed", "relative"),
+           order="relative", replications=10)
     test replications elapsed relative
5 k(N, 1)           10   9.764  1.00000
1 f(N, 1)           10   9.998  1.02397
4 j(N, 1)           10  11.019  1.12853
2 g(N, 1)           10  11.822  1.21077
3 h(N, 1)           10  14.560  1.49119
This replicates Christian's result. We find that function k() is the fastest using curlies, and that explicit exponentiation in function h() is the slowest with a relative penalty of 49%, or an absolute difference of almost five seconds between the 9.7 for the winner and 14.6 for the worst variant. On the other hand, function f(), the normal way of writing things, does pretty well.

So what happens when we throw the compiler into the mix? Let's first create compiled variants using the new cmpfun() function and then try again:

> ## R 2.13.0 brings this toy
> library(compiler)
> lf <- cmpfun(f)
> lg <- cmpfun(g)
> lh <- cmpfun(h)
> lj <- cmpfun(j)
> lk <- cmpfun(k)
> # now run the benchmark
> N <- 1e6
> benchmark(f(N,1), g(N,1), h(N,1), j(N,1), k(N,1),
+           lf(N,1), lg(N,1), lh(N,1), lj(N,1), lk(N,1),
+           columns=c("test", "replications",
+           "elapsed", "relative"),
+           order="relative", replications=10)
       test replications elapsed relative
9  lj(N, 1)           10   2.971  1.00000
10 lk(N, 1)           10   2.980  1.00303
6  lf(N, 1)           10   2.998  1.00909
7  lg(N, 1)           10   3.007  1.01212
8  lh(N, 1)           10   4.024  1.35443
1   f(N, 1)           10   9.479  3.19051
5   k(N, 1)           10   9.526  3.20633
4   j(N, 1)           10  10.775  3.62673
2   g(N, 1)           10  11.299  3.80310
3   h(N, 1)           10  14.049  4.72871
Now things have gotten interesting and substantially faster, for very little cost. Usage is straightforward: take your function and compile it, and reap more than a threefold speed gain. Not bad at all. Also of note, the difference between the different expressions essentially vanishes. The explicit exponentiation is still the loser, but there may be an additional explicit function call involved.

So we do see the new compiler as a potentially very useful addition. I am sure more folks will jump on this and run more tests to find clearer corner cases. To finish, we have to of course once more go back to Rcpp for some real gains, and some comparison between compiled and byte code compiled code.

> ## now with Rcpp and C++
> library(inline)
> ## and define our version in C++
> src <- 'int n = as<int>(ns);
+         double x = as<double>(xs);
+         for (int i=0; i<n; i++) x=1/(1+x);
+         return wrap(x); '
> l <- cxxfunction(signature(ns="integer",
+                            xs="numeric"),
+                  body=src, plugin="Rcpp")
> ## now run the benchmark again
> benchmark(f(N,1), g(N,1), h(N,1), j(N,1), k(N,1),
+           l(N,1),
+           lf(N,1), lg(N,1), lh(N,1), lj(N,1), lk(N,1),
+           columns=c("test", "replications",
+           "elapsed", "relative"),
+           order="relative", replications=10)
       test replications elapsed relative
6   l(N, 1)           10   0.120   1.0000
11 lk(N, 1)           10   2.961  24.6750
7  lf(N, 1)           10   3.128  26.0667
8  lg(N, 1)           10   3.140  26.1667
10 lj(N, 1)           10   3.161  26.3417
9  lh(N, 1)           10   4.212  35.1000
5   k(N, 1)           10   9.500  79.1667
1   f(N, 1)           10   9.621  80.1750
4   j(N, 1)           10  10.868  90.5667
2   g(N, 1)           10  11.409  95.0750
3   h(N, 1)           10  14.077 117.3083
Rcpp still shoots the lights out by a factor of 80 (or even almost 120 to the worst manual implementation) relative to interpreted code. Relative to the compiled byte code, the speed difference is about 25-fold. Now, these are of course entirely unrealistic code examples that are in no way, shape or form representative of real R work. Effective speed gains will be smaller for both the (pretty exciting new) compiler package and also for our C++ integration package Rcpp.

Before I close, two more public service announcements. First, if you use Ubuntu see this post by Michael on r-sig-debian announcing his implementation of a suggestion of mine: we now have R alpha/beta/rc builds via his Launchpad PPA. Last Friday, I had the current R-rc snapshot of R 2.13.0 on my Ubuntu box only about six hours after I (as Debian maintainer for R) uploaded the underlying new R-rc package build to Debian unstable. This will be nice for testing of upcoming releases. Second, as I mentioned, the Rcpp workshop on April 28 preceding R/Finance 2011 on April 29 and 30 still has a few slots available, as has the conference itself.

/computers/R | permanent link

Thu, 07 Apr 2011

Review of 'R in a Nutshell' by J Adler now in JSS

Volume 40 of of the (electronic) Journal of Statistical Software just came out. It contains my short review of Josef Adler's already pretty popular R in a Nutshell from O'Reilly. You can my review as a pdf via this local preprint, or by visiting the JSS page of the review and its official copy.

/computers/R | permanent link

Fri, 31 Dec 2010

R / Finance 2011 Call for Papers: Updated and expanded

One week ago, I sent the updated announcement below to the r-sig-finance list; this was kindly blogged about by fellow committee member Josh and by our pal Dave @ REvo. By now. I also updated the R / Finance conference website. So to round things off, a quick post here is in order as well. It may even get a few of the esteemed reader to make a New Year's resolution about submitting a paper :)

Dear R / Finance community,

The preparations for R/Finance 2011 are progressing, and due to favourable responses from the different sponsors we contacted, we are now able to offer

  1. a competition for best paper, which given the focus of the conference will award for both an 'academic' paper and an 'industry' paper
  2. availability of travel grants for up to two graduate students provided suitable papers were accepted for presentations

More details are below in the updated Call for Papers. Please feel free to re-circulate this Call for Papers with collegues, students and other associations.

Cheers, and Season's Greeting,

Dirk (on behalf of the organizing / program committee)

Call for Papers:

R/Finance 2011: Applied Finance with R

April 29 and 30, 2011
Chicago, IL, USA

The third annual R/Finance conference for applied finance using R will be held this spring in Chicago, IL, USA on April 29 and 30, 2011. The two-day conference will cover topics including portfolio management, time series analysis, advanced risk tools, high-performance computing, market microstructure and econometrics. All will be discussed within the context of using R as a primary tool for financial risk management, portfolio construction, and trading.

Complete papers or one-page abstracts (in txt or pdf format) are invited to be submitted for consideration. Academic and practitioner proposals related to R are encouraged. We welcome submissions for full talks, abbreviated lightning talks, and for a limited number of pre-conference (longer) seminar sessions.

Presenters are strongly encouraged to provide working R code to accompany the presentation/paper. Data sets should also be made public for the purposes of reproducibility (though we realize this may be limited due to contracts with data vendors). Preference may be given to presenters who have released R packages.

The conference will award two $1000 prizes for best paper: one for best practitioner-oriented paper and one for best academic-oriented paper. Further, to defray costs for graduate students, two travel and expense grants of up to $500 each will be awarded to graduate students whose papers are accepted. To be eligible, a submission must be a full paper; extended abstracts are not eligible.

Please send submissions to: committee at RinFinance.com

The submission deadline is February 15th, 2011. Early submissions may receive early acceptance and scheduling. The graduate student grant winners will be notified by February 23rd, 2011.

Submissions will be evaluated and submitters notified via email on a rolling basis. Determination of whether a presentation will be a long presentation or a lightning talk will be made once the full list of presenters is known.

R/Finance 2009 and 2010 included attendees from around the world and featured keynote presentations from prominent academics and practitioners. 2009-2010 presenters names and presentations are online at the conference website. We anticipate another exciting line-up for 2011---including keynote presentations from John Bollinger, Mebane Faber, Stefano Iacus, and Louis Kates. Additional details will be announced via the conference website as they become available.

For the program committee:

Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson,
Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

/computers/R | permanent link

Fri, 17 Dec 2010

Introduction to ESS: talk and slides

We had another meeting of the Chicago R User Group last evening. This was scheduled somewhat belatedly once we learned that Drew Conway would be in town. Drew gave a very nice talk about his brand new infochimps package (for accessing the eponymous infochimps data service and marketplace). Slides are available on Drew's blog. In fact, he had already blogged about his talk before I had even started to write my slides...

The user group meetings have a meme of showing how to use R with different editors, UIs, IDEs,... It started with a presentation on Eclipse and its StatET plugin. So a while ago I had offered to present on ESS, the wonderful Emacs mode for R (and as well as SAS, Stata, BUGS, JAGS, ...). And now I owe a big thanks to the ESS Core team for keeping all their documentation, talks, papers etc in their SVN archive, and particularly to Stephen Eglen for putting the source code to Tony Rossini's tutorial from useR! 2006 in Vienna there. This allowed me to quickly whip up a few slides though a good part of the presentation did involve a live demo missing from the slides. Again, big thanks to Tony for the old slides and to Stephen for making them accessible when I mentioned the idea of this talk a while back -- it allowed to put this together on short notice.

And for those going to useR! 2011 in Warwick next summer, Stephen will present a full three-hour ESS tutorial which will cover ESS in much more detail.

/computers/R | permanent link

Wed, 27 Oct 2010

Google Tech Talk on Integrating R and C++: video and slides

Last Friday, Romain and I were guests of the R intergrouplet (what an adorable name!) at Google's headquarter in Mountain View. This arose out of discussions following useR! 2010 where we met Google's Murray Stokely. There appears to be ever increasing use of R at Google, and so it was a great opportunity to give a Google Tech Talk about R and C++ integration --- centered around our Rcpp, RInside and RProtoBuf packages which facilitate interoperability between R and C++.

A video recording of our ninety-minute talk is already available via the YouTube channel for Google Tech Talks. The (large) pdf with slides (which Romain had already posted on slideshare) is also available from my presentations page.

The remainder of the weekend was nice too (with the notably exception of the extremly sucky weather). We got to to spend some time at the Google Summer of Code Mentor Summit which is always a fun event and a great way to meet other open source folks in person. And we also took one afternoon off to spend some with John Chambers discussing further work involving Rcpp and the new ReferenceClasses that appeared in the just-released R version 2.12.0. This should be a nice avenue to further integrate R and C++ in the near future.

/computers/R | permanent link

Fri, 24 Sep 2010

R / Finance 2011 Call for Papers

Brian announced it on r-help and r-sig-finance and I have since updated the R/Finance website and Call for Papers page. And as David Smith already outblogged me about it, without further ado our Call for Paper for next spring's R/Finance conference:

Call for Papers:

R/Finance 2011: Applied Finance with R
April 29 and 30, 2011
Chicago, IL, USA

The third annual R/Finance conference for applied finance using R will be held this spring in Chicago, IL, USA on April 29 and 30, 2011. The two-day conference will cover topics including portfolio management, time series analysis, advanced risk tools, high-performance computing, market microstructure and econometrics. All will be discussed within the context of using R as a primary tool for financial risk management, portfolio construction, and trading.

One-page abstracts or complete papers (in txt or pdf format) are invited to be submitted for consideration. Academic and practitioner proposals related to R are encouraged. We welcome submissions for full talks, abbreviated "lightning talks", and for a limited number of pre-conference (longer) seminar sessions.

Presenters are strongly encouraged to provide working R code to accompany the presentation/paper. Data sets should also be made public for the purposes of reproducibility (though we realize this may be limited due to contracts with data vendors). Preference may be given to presenters who have released R packages.

Please send submissions to: committee at RinFinance.com.

The submission deadline is February 15th, 2011. Early submissions may receive early acceptance and scheduling.

Submissions will be evaluated and submitters notified via email on a rolling basis. Determination of whether a presentation will be a long presentation or a lightning talk will be made once the full list of presenters is known.

R/Finance 2009 and 2010 included attendees from around the world and featured keynote presentations from prominent academics and practitioners. 2009-2010 presenters names and presentations are online at the conference website. We anticipate another exciting line-up for 2011 including keynote presentations from John Bollinger, Mebane Faber, Stefano Iacus, and Louis Kates. Additional details will be announced via the conference website as they become available.

For the program committee:

Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson,
Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

So see you in Chicago in April!

/computers/R | permanent link

Tue, 07 Sep 2010

Straight, curly, or compiled?

Christian Robert, whose blog I commented-on here once before, had followed up on a recent set of posts by Radford Neal which had appeared both on Radford's blog and on the r-devel mailing list.

Now, let me prefix this by saying that I really enjoyed Radford's posts. He obviously put a lot of time into finding a number of (all somewhat small in isolation) inefficiencies in R which, when taken together, can make a difference in performance. I already spotted one commit by Duncan in the SVN logs for R so this is being looked at.

Yet Christian, on the other hand, goes a little overboard in bemoaning performance differences somewhere between ten and fifteen percent -- the difference between curly and straight braces (as noticed in Radford's first post). Maybe he spent too much time waiting for his MCMC runs to finish to realize the obvious: compiled code is evidently much faster.

And before everybody goes and moans and groans that that is hard, allow me to just interject and note that it is not. It really doesn't have to be. Here is a quick cleaned up version of Christian's example code, with proper assigment operators and a second variable x. We then get to the meat and potatoes and load our Rcpp package as well as inline to define the same little test function in C++. Throw in rbenchmark which I am becoming increasingly fond of for these little timing tests, et voila, we have ourselves a horserace:

# Xian's code, using <- for assignments and passing x down
f <- function(n, x=1) for (i in 1:n) x=1/(1+x)
g <- function(n, x=1) for (i in 1:n) x=(1/(1+x))
h <- function(n, x=1) for (i in 1:n) x=(1+x)^(-1)
j <- function(n, x=1) for (i in 1:n) x={1/{1+x}}
k <- function(n, x=1) for (i in 1:n) x=1/{1+x}

# now load some tools
library(Rcpp)
library(inline)

# and define our version in C++
l <- cxxfunction(signature(ns="integer", xs="numeric"),
                 'int n = as<int>(ns); double x=as<double>(xs);
                  for (int i=0; i<n; i++) x=1/(1+x);
                  return wrap(x); ',
                 plugin="Rcpp")

# more tools
library(rbenchmark)

# now run the benchmark
N <- 1e6
benchmark(f(N, 1), g(N, 1), h(N, 1), j(N, 1), k(N, 1), l(N, 1),
          columns=c("test", "replications", "elapsed", "relative"),
          order="relative", replications=10)

And how does it do? Well, glad you asked. On my i7, which the other three cores standing around and watching, we get an eighty-fold increase relative to the best interpreted version:

/tmp$ Rscript xian.R
Loading required package: methods
     test replications elapsed relative
6 l(N, 1)           10   0.122    1.000
5 k(N, 1)           10   9.880   80.984
1 f(N, 1)           10   9.978   81.787
4 j(N, 1)           10  11.293   92.566
2 g(N, 1)           10  12.027   98.582
3 h(N, 1)           10  15.372  126.000
/tmp$ 
So do we really want to spend time arguing about the ten and fifteen percent differences? Moore's law gets you those gains in a couple of weeks anyway. I'd much rather have a conversation about how we can get people speed increases that are orders of magnitude, not fractions. Rcpp is one such tool. Let's get more of them.

/computers/R | permanent link

Sat, 24 Jul 2010

useR 2010 at NIST in Gaithersburg

This past week, the annual R user conference useR! 2010 took place at the National Institute of Standards and Technology (NIST) in Gaithersburg, MD (which is a tad northwest of Washington, DC). Kate Mullen and her team of local organizers did a truly tremendous job in putting together a very smooth conference attended by almost 500 people. It is always nice to meet so many other R contributors and users in person. And needless to say it's also just plain fun to hang out with these folks.

As at the preceding useR! 2008 in Dortmund and useR! 2009 in Rennes, I presented a three-hour tutorial on high-performance computing with R. This covers scripting/automation, profiling, vectorisation, interfacing compiled code, parallel computing and large-memory approaches. The slides, as well as a condensed 2-up version, are now on my presentations page.

On Wednesday, Romain and I had a chance to talk about recent work on Rcpp, our R and C++ integration. Thursday, we followed up with a presentation on RProtoBuf -- a project integrating Google's Protocol Buffers with R which much to our delight already seems to be in use at Google itself! It was quite fun to do these two talks jointly with Romain. But my other coauthor Khanh had to be at a conference related to his actual PhD work. So on Friday it was just me to give a presentation about RQuantLib which brings QuantLib to R.

Slides from all these talks have now been added to my presentations page. I will also upload them via the conference form so that they can be part of the conference's collection of presentations which should be forthcoming.

/computers/R | permanent link

Thu, 27 May 2010

WU Wien presentations

Last week I had the opportunity to spend a few days at the Institute for Statistics and Mathematics of the WU Vienna / Wirtschaftsuniversitaet Wien. On Thursday, I gave a seminar on Rcpp and RInside introducing all the recent work with Romain on making R and C++ integration easier. Both (compact) handout and (full) presentation slides are now posted alongside the other presentations.

On Friday, I also gave an informal lecture / tutorial / workshop to some of the Stats and Finance Ph.D. students, drawing largely from the section on parallel computing of the most recent Introduction to High-Performance Computing with R tutorial.

My sincere thanks to Kurt Hornik and Stefan Theussl for the invite -- it was a great trip, notwithstanding the mostly unseasonally cold and wet weather.

/computers/R | permanent link

Tue, 20 Apr 2010

R / Finance 2010 presentations

Last Friday and Saturday the second R / Finance conference took place in Chicago on the UIC campus.

As a co-organizer, it was a great pleasure to see so many users of R in Finance---from both industry and academia---come to Chicago to discuss and share recent work. There is a lot going on, and it is always good to exchange ideas with others sharing the same infrastructure. Participants appeared to enjoy the conference. My thanks to everybody who helped to put it together, from the local committee to the helping hands at UIC and of course the sponsors.

I just put my slides from the Extending and Embedding R with C++ tutorial preceding the conference, as well as the RQuantLib: Interfacing QuantLin from R presentation (with Khanh Nguyen), up onto my presentations page. I do have a usb-drive with all conference presentations and will provide them via the R / Finance site in a few days.

The only truly sour note is the fact that several presenters from Europe had their travels schedules turned upside down by the disruption to international air travel caused by the Icelandic volcano eruption and the resulting ash clouds. While we are glad to have had them for a little longer in Chicago, we understand that they are getting eager to return home. I hope this extended stay in the Windy City does not take away from the overall usefulness of the trip.

/computers/R | permanent link

Wed, 07 Apr 2010

Video of UCLA / LA RUG talk on R and C++ integration

Thanks to the efforts of the tireless R User Group organizers Szilard Pafka (in Los Angeles, recording the talk) and Drew Conway (in New York, converting and organising hosting), there is now a video and slide combo of my recent talk about Rcpp and RInside at UCLA and the Los Angeles R Users Group.

Thanks also to David Smith (at the REvolutions blog) and Drew Conway (at his blog) for spreading the word about the presentation video and slides -- quite a few folks have come to my presentations page to get them.

/computers/R | permanent link

Sun, 04 Apr 2010

UCLA and LA RUG talks on R and C++ integration

We spent last week in the LA area and had a generally good time out west. I was able to sneak in two talks and a group discussion, thanks to the help by Jan de Leeuw (and everybody at UCLA's Stats department) as well as by Szilard Pafka representing the LA R User's Group. Pdf files for the slides for the talks are now on my presentations page in both a compact handout and presentation slide version (where the content is identical; if in doubt use the first file).

The talks centered around R and C++ integration using both Rcpp and RInside and summarise where both projects stand after all the recent work Romain and I put in over the last few months. The presentations went fairly well; I received some favourable comments.

Szilard and the R User Group had also suggested a group discussion about CRAN, its growth and how to maximise its usefulness. Given my CRANberries feed, my work on the CRAN Task Views for Empirical Finance and High-Performance Computing with R as well as our cran2deb binary package generator, I had some views and ideas that helped frame the discussion which turned out to very useful and informed. So maybe we should do this User Group thing in Chicago too!

Special thanks to Jan de Leeuw and Szilard Pafka for organising the meeting, talks and discussion.

/computers/R | permanent link

Thu, 25 Feb 2010

R and Sudoku solvers: Plus ca change...

Christian Robert blogged about a particularly heavy-handed solution to last Sunday's Sudoku puzzle in Le Monde. That had my symapthy as I like evolutionary computing methods, and his chart is rather pretty. From there, this spread on to the REvolutions blogs where David Smith riffed on it, and showed the acual puzzle. That didn't stop things as Christian blogged once more about it, this time welcoming his post-doc Robin Ryder who posts a heavy analysis on all this that is a little much for me at this time of day.

But what everybody seems to be forgetting is that R has had a Sudoku solver for years, thanks to the sudoku package by David Brahm and Greg Snow which was first posted four years ago. What comes around, goes around.

With that, and about one minute of Emacs editing to get the Le Monde puzzle into the required ascii-art form, all we need to do is this:

R> library(sudoku)
R> s <- readSudoku("/tmp/sudoku.txt")
R> s
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
 [1,]    8    0    0    0    0    1    2    0    0
 [2,]    0    7    5    0    0    0    0    0    0
 [3,]    0    0    0    0    5    0    0    6    4
 [4,]    0    0    7    0    0    0    0    0    6
 [5,]    9    0    0    7    0    0    0    0    0
 [6,]    5    2    0    0    0    9    0    4    7
 [7,]    2    3    1    0    0    0    0    0    0
 [8,]    0    0    6    0    2    0    1    0    9
 [9,]    0    0    0    0    0    0    0    0    0
R> system.time(solveSudoku(s))
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
 [1,]    8    4    9    6    7    1    2    5    3
 [2,]    6    7    5    2    4    3    9    1    8
 [3,]    3    1    2    9    5    8    7    6    4
 [4,]    1    8    7    4    3    2    5    9    6
 [5,]    9    6    4    7    8    5    3    2    1
 [6,]    5    2    3    1    6    9    8    4    7
 [7,]    2    3    1    8    9    4    6    7    5
 [8,]    4    5    6    3    2    7    1    8    9
 [9,]    7    9    8    5    1    6    4    3    2
   user  system elapsed
  5.288   0.004   5.951
R>
That took all of five seconds while my computer was also compiling a particularly resource-hungry C++ package....

Just in case we needed another illustration that it is hard to navigate the riches and wonders that is CRAN...

/computers/R | permanent link

Thu, 18 Feb 2010

U of C ACM talk

Fellow GSoC mentor and local ACM masterminder Borja Sotomayor had invited me a few months ago to give a talk at the ACM chapter at the University of Chicago. Today was the day, and the slides from the 50-minutes talk on R and extending R with Rcpp are now on my presentations page.

/computers/R | permanent link

Fri, 05 Feb 2010

R / Finance 2010 Open for Registration

The annoucement below went out to R-SIG-Finance earlier today. For information is as usual the the R / Finance 2010 page:

Now open for registrations:

R / Finance 2010: Applied Finance with R
April 16 and 17, 2010
Chicago, IL, USA

The second annual R / Finance conference for applied finance using R, the premier free software system for statistical computation and graphics, will be held this spring in Chicago, IL, USA on Friday April 16 and Saturday April 17.

Building on the success of the inaugural R / Finance 2009 event, this two-day conference will cover topics as diverse as portfolio theory, time-series analysis, as well as advanced risk tools, high-performance computing, and econometrics. All will be discussed within the context of using R as a primary tool for financial risk management and trading.

Invited keynote presentations by Bernhard Pfaff, Ralph Vince, Mark Wildi and Achim Zeileis are complemented by over twenty talks (both full-length and 'lightning') selected from the submissions. Four optional tutorials are also offered on Friday April 16.

R / Finance 2010 is organized by a local group of R package authors and community contributors, and hosted by the International Center for Futures and Derivatives (ICFD) at the University of Illinois at Chicago.

Conference registration is now open. Special advanced registration pricing is available, as well as discounted pricing for academic and student registrations.

More details and registration information can be found at the website at
http://www.RinFinance.com

For the program committee:

Gib Bassett, Peter Carl, Dirk Eddelbuettel, John Miller,
Brian Peterson, Dale Rosenthal, Jeffrey Ryan

See you in Chicago in April!

/computers/R | permanent link

Thu, 07 Jan 2010

Review of 'Computational Statistics: An Introduction to R' in JSS

Somehow missed during the the end-of-year switchover was the fact that my review of Guenther Sawitzki's Computational Statistics: An Introduction to R (CRC / Chapman \& Hall, 2009) is now up on the Journal of Statistical Software website.

/computers/R | permanent link

Tue, 01 Dec 2009

Updated slides for 'Introduction to HPC with R' (now with correct URLs)

This is an updated version of yesterday's post with corrected URLs -- by copy-and-pasting I had still referenced the previous slides from UseR! 2009 in Rennes instead of last Friday's slides from the ISM presentation in Tokyo. The presentations page had the correct URLs, and this has been corrected below for this re-post. My apologies!

As mentioned yesterday, I spent a few days last week in Japan as I had an opportunity to present the Introduction to High-Performance Computing with R tutorial at the Institute for Statistical Mathematics in Tachikawa near Tokyo thanks to an invitation by Junji Nakano.

An updated version of the presentations slides (with a few typos corrected) is now available as is a 2-up handout version. Compared to previous versions, and reflecting the fact that this was the 'all-day variant' of almost five hours of lectures, the following changes were made:

  • the 'parallel computing' section was expanded further with discussion of the recent R packages multicore, iterators, foreach, doNWS, doSNOW, doMPI;
  • a first discussion of GPU computing using the gputools package was added;
  • the section on 'out of memory computing' using ff, bigmemory and biglm (including an example borrowed from Jay Emerson) reappeared in this longer version;
  • minor fixes and polishing throughout.

Comments and suggestions are, as always, appreciated.

/computers/R | permanent link

Mon, 09 Nov 2009

R / Finance 2010 Call for Papers

Jeff sent the following while I had connectivity issues and I hadn't gotten around to posting it here.

So without further ado, and given the success of our initial R / Finance 2009 conference about R in Finance, here is the call for papers for next spring:

Call for Papers:

R/Finance 2010: Applied Finance with R
April 16 and 17, 2010
Chicago, IL, USA

The second annual R/Finance conference for applied finance using R will be held this spring in Chicago, IL, USA on April 16 and 17, 2010. The two-day conference will cover topics including portfolio management, time series analysis, advanced risk tools, high-performance computing, market microstructure and econometrics. All will be discussed within the context of using R as a primary tool for financial risk management and trading.

One-page abstracts or complete papers (in txt or pdf format) are invited for consideration. Academic and practitioner research proposals related to R are encouraged. We will accept submissions for full talks, abbreviated "lightning talks", and a limited number of pre-conference tutorial sessions. Please indicate with your submission if you would be willing to produce a formal paper (10-15 pages) for a peer-reviewed conference proceedings publication.

Presenters are strongly encouraged to provide working R code to accompany the presentation/paper. Data sets should also be made public for the purposes of reproducibility (though we realize this may be limited due to contracts with data vendors). Preference may be given to presenters who have released R packages.

Please send submissions to: committee at RinFinance.com

The submission deadline is December 31st, 2009.

Submissions will be evaluated and submitters notified via email on a rolling basis. Determination of whether a presentation will be a long presentation or a lightning talk will be made once the full list of presenters is known.

R/Finance 2009 included keynote presentations by Patrick Burns, Robert Grossman, David Kane, Roger Koenker, David Ruppert, Diethelm Wuertz, and Eric Zivot. Attendees included practitioners, academics, and government officials. We anticipate another exciting line-up for 2010 and will announce details at the conference website http://www.RinFinance.com as they become available.

For the program committee:

Gib Bassett, Peter Carl, Dirk Eddelbuettel, John Miller,
Brian Peterson, Dale Rosenthal, Jeffrey Ryan

See you in Chicago in April!

/computers/R | permanent link

Sat, 10 Oct 2009

R for system administration and scripting

On several occassions, R had suggested itself as a language for systems scripting. By this I mean random little adminstrative task such as (re-)moving or maybe renaming files or directories and the like.

One of such cases just happened a few minutes ago. The aforementioned Garmin Forerunner 405 can cooperate quite nicely with Linux using the gant reader for the ant wireless communication protocol between the usb hardware dongle and the Garmin 405. (Sources for gant are both this file and this git archive.) I had meant to blog about this tool and the resulting files one of these days anyway, but today I just want to mention that the default filenames created by the program were somewhat horrid such as 20.09.2009 101112.TCX to denote the 20th of September of this year at 10:11h and 12 seconds. As we all know, filenames with spaces are bad for the environment as well as plain annoying. So I had made the simple change in the C sources to switch to a saner format such as 20090920-101112.TCX (and I see that the git archive now contains a similar fix). But that still left me with some 80+ files with the dreaded names.

There are of course many ways to skin this cat and to rename the files in bulk. However, I found the following four lines to be fairly succinct

#!/usr/bin/r
files <- dir(".", pattern=".*\\.TCX$")
res <- lapply(files, function(f) {
    pt <- strptime(f, "%d.%m.%Y %H%M%S.TCX")  # parsed time
    ft <- strftime(pt, "%Y%m%d-%H%M%S.TCX")   # formatted time
    file.rename(f, ft)
})
as they show, among other things,
  • the access to one of the three (soon four) regexp engines, here as a simple patterns argument to dir()
  • the functional programming nature of the beast: files is a vector of filenames, and lapply() unrolls the vector one-by-one calling the anonymous function and passing the current element off as f
  • computing on times is particularly easy as we get strptime and strftime as any self- and POSIX-respecting language should
  • similarly, we get access to file system-level operations natively avoiding all quoting issues that make files with spaces such fun in the first place.
  • the littler scripting frontend providing /usr/bin/r rules.
So about five lines and two minutes later, some eighty-ish files were renamed and sanity was restored. Hm, and I took me five times as long to blog this.

Lastly, I do not mean to imply that Python or Perl or Ruby or (insert favourite tool here) cannot do it equally well. I simply meant to say that programmatically creating new filenames is definitely easier in R than it would have been in shell. And as an added bonus, we even get fully parsed time objects that I could have tested for. But then tests and documentation never get written on a Saturday.

/computers/R | permanent link

Tue, 04 Aug 2009

Mon, 13 Jul 2009

cran2deb: Would you like 1700+ new Debian / R packages ?

As I mentioned in my quick write-up of UseR 2009, one of my talks was about cran2deb: a system to turn (essentially) all CRAN packages into directly apt-get-able binary packages.

This is essentially a '2.0' version of earlier work with Steffen Moeller and David Vernazobres which we had presented in 2007. Then, the approach was top-down and monolithic which started to show its limits. This time, the idea was to borrow the successful bottom-up approach of my CRANberries feed.

The bulk of the work was done by Charles Blundell as part of his Google Summer of Code 2008 project which I had suggested and mentored. After that project had concluded, we both felt we should continue with it and bring it to 'production'. The CRAN hosts provided us with a (virtual Xen) machine to build on, and we are now ready to more publically announce the availability of the repositories for i386 and amd64:

  deb http://debian.cran.r-project.org/cran2deb/debian-i386 testing/
and
  deb http://debian.cran.r-project.org/cran2deb/debian-amd64 testing/

A few more details are provided in our presentation slides. We look forward to hearing from folks using; the r-sig-debian list may be a good venue for this.

/computers/R | permanent link

Sun, 12 Jul 2009

useR 2009 in Rennes: Recap and slides

I spent most of last week in Rennes, the capital of Brittany in France, as it was time for UseR! 2009, the annual R conference. Francois Husson, Aline Legrand and others at the Agrocampus Ouest had put together a really well-run conference, and it was a pleasure to reconnect with so many people. It was also pretty nice to walk around town and see bits and pieces of the Tombees de la nuit festival which happened at the same time.

As last year (and again at the BoC in December), I presented a three-hour tutorial on high-performance computing with R. This covers profiling, vectorisation, interfacing compiled code, debugging, parallel computing, as well as scripting and automation. Slides, and a 2-up version, are now on my presentations page.

I also gave two regular conference presentations. The first was on my Rcpp and RInside packages which facilitate interfacing R and C++. The second talk, based on joint work with Charles Blundell, describes our cran2deb system for creating Debian packages of essentially all CRAN packages. I will try to follow up on this with another post. Slides from these talks are also on my presentations page.

/computers/R | permanent link

Sat, 27 Jun 2009

R 2.9.1, CRANberries outage, and missing Java support

Just a short note that version 2.9.1 of R was released yesterday. And a corresponding Debian release went out as usual on the same day. One sour note: as the Java toolchain is currently broken, I had to disable compile-time support for Java. Just run R CMD javareconf once installed if you need it.

Speaking of broken, I had neither noticed that this R version now returns an additional field (for the repository) in the per-package metadata via available.packages(), nor that this change had broken my oh-so-useful and increasingly popular CRANberrries html and rss summaries of CRAN changes. So with the usual beta and rc releases or R 2.9.1 in Debian starting a week prior, CRANberries had been silent for six days from Friday the 21st to last Thursday. I rectified it once I noticed, and changed the code to no longer fall on its nose at that spot. Sorry for the few days without service.

/computers/R | permanent link

Wed, 29 Apr 2009

Slides from most recent R and HPC tutorial

A little earlier I put the slides from my Introduction to High-Performance Computing with R tutorial at the R / Finance conference last week onto my talks / presentation page. Other tutorials, talks and keynotes are also being posted on the conference program page.

This tutorial was a shorter format of just an hour which did not allow for any parallel computing with R. However, parallel computing with R via MPI, snow, nws, ... is covered in the slides from December's workshop at the BoC.

/computers/R | permanent link

Mon, 27 Apr 2009

Review of 'Analysis of Integrated and Cointegrated Time Series with R (2nd ed)' in JSS

A few weeks ago I wrote up a short review of Bernhard Pfaff's nice (but somewhat dry) Analysis of Integrated and Cointegrated Time Series with R (2nd ed) on unit root and cointegration modeling with R. This is now online at the Journal of Statistical Software.

/computers/R | permanent link

Sun, 26 Apr 2009

R / Finance 2009

Our inaugural R / Finance conference, mentioned here twice is now over.

We were fortunate to get seven outstanding invited keynote speakers, as well as eleven excellent presentations. This was preceded by four short tutorials (and I'll post slides from my Introduction to High-Performance Computing with R soon). With about 150 registered participants, plus keynoters, presenters, committee members, representatives from the sponsors (a quick shout of Thanks! to them), some folks from UIC (especially Holly without whom few things would have happened), we were probably around 200 people gathered at UIC. And then there was an extended social program at Jaks which is rather appropriate as we had numerous important committee meetings there over the preceding months. All in all it seems like a successful event. We may even do it again.

/computers/R | permanent link

Thu, 05 Mar 2009

Short introduction to R in Finance

Adam Gehr of DePaul University's Finance Department had organized a panel session about R in Finance at the Midwest Finance Association's 58th Annual Meeting which is happening this week here in Chicago.

I just posted my slides on my presentations page. The slides give a brief overview of R, the CRAN network and the by now over 1600 packages, mention the Finance Task View, briefly present four different packages (or package sets) and of course beat the drum for our upcoming R/Finance conference that will take place here in Chicago at the end of next month.

/computers/R | permanent link

Wed, 25 Feb 2009

Review of 'Applied Econometrics in R' in JSS

A short review of Kleiber and Zeileis' excellent Applied Econometrics with R is now out at the (online) Journal of Statistical Software.

/computers/R | permanent link

Mon, 23 Feb 2009

R/Finance conference in Chicago in April: Registration now open

Regarding the aforementioned R/Finance conference that will take place at the end of April here in Chicago, we announced earlier today that the conference website is now available. It provides information about the program, speakers and other details as well as a link to registration details.

See you in Chicago in April!

/computers/R | permanent link

Tue, 03 Feb 2009

Correct Datetime / POSIXct behaviour for R and kdb+

We have started to look into kdb+ as a possible high-performance column-store backend. Kx offers free trials -- and so I have played with this for a day or two, both the general system, data loads and dumps and in particular with the interface to R, Based on the few files (one C source with interface code, one R file to access the C code, one object file to link against, one header file and a simple Makefile), it took just a couple of minutes to turn this into a proper CRAN-style R package.

Anyway, the reason for this post was that the R / kdb+ glue code works well ... but not for datetimes. I really like to be able to pass date/time objects natively between systems as easily as, say, numbers or strings (and see e.g. my Rcpp package for doing this with R and C++) and I was a bit annoyed when the millisecond timestamps didn't move smoothly. Turns out that the basic converter function in the code had a number of problems: it converted to integer, only covered a single scalar rather than vectorised mode, and erroneously reduced a reference count. A better version, in my view, is as follows:

static SEXP from_datetime_kobject(K x) 
{
	SEXP result;
	int i, length = x->n;
	if (scalar(x)) {
		result = PROTECT(allocVector(REALSXP, 1));
		REAL(result)[0] = (kF(x)[0] + 10957) * 86400;
	} else {
		result = PROTECT(allocVector(REALSXP, length));
		for(i = 0; i < length; i++) {
		    	REAL(result)[i] = (kF(x)[i] + 10957) * 86400;
		}
	}
	SEXP datetimeclass = PROTECT(allocVector(STRSXP,2));
	SET_STRING_ELT(datetimeclass, 0, mkChar("POSIXt"));
	SET_STRING_ELT(datetimeclass, 1, mkChar("POSIXct"));
	setAttrib(result, R_ClassSymbol, datetimeclass);
	UNPROTECT(2); 
        return result; 
}
This deals with vectors as well as scalars, converts Kdb's 'fractional days since Jan 1, 2000' to the Unix standard of seconds since the epoch -- including the R extension of fractional seconds -- and as importantly, sets the class attributes to POSIXt POSIXct as needed by R. With that, a simple select max datetime from table does just that, and vectors of timestamped records of trades or quotes or whatever also come with proper POSIXct behaviour into R. Note that it needs TZ to be set to UTC, though, or you get a timezone offset you may not want.

/computers/R | permanent link

Fri, 30 Jan 2009

State-of-the-art in parallel computing with R: New paper

A few weeks ago, we finished a paper that surveys the current state of parallel computing with R. The paper was lead by Markus Schmidberger and written while he was visiting the Fred Hutchinson Cancer Research Center in Seattle. The co-authors are Martin Morgan, myself, Hao Yu, Luke Tierney and Ulrich Mansmann. The paper is now available as a technical report from LMU Munich via open access, and also from my papers page.

/computers/R | permanent link

Sat, 24 Jan 2009

New CRAN Task View on HPC

A while back, I suggested to Achim to add a new CRAN Task View for High Performance Computing with R. And as of a day or two ago, we now have the new CRAN Task View for High Performance Computing with R providing an overview about available packages, grouped thematically, with a focus on the various parallel computing application. I have already received a few great comments that even lead to an entire new section on applications. Keep'em coming!

/computers/R | permanent link

Wed, 07 Jan 2009

R featured in New York Times article

Today's New York Times carries a decent article about R. Predictably, this lead to one (short), two (longest), three (short) threads on the main R mailing list. One aspect merits further highlighting. The reporter asked whether R would pose a threat to SAS:
"I think it addresses a niche market for high-end data analysts that want free, readily available code," said Anne H. Milley, director of technology product marketing at SAS. She adds, "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet."

That's silly on so many levels. A concise and rather appropriate follow-up came in early from Frank Harrell, a long-time S and R advocate:

This is great to see. It's interesting that SAS Institute feels that non-peer-reviewed software with hidden implementations of analytic methods that cannot be reproduced by others should be trusted when building aircraft engines.

Achim already added this (and two more posts from the aforementioned threads) to the fortunes package that collects such choice quotes.

R in Finance (the topic of our upcoming conference) gets mentioned as well. Now, as editor of the Finance task view, I find that second half of

The financial services community has demonstrated a particular affinity for R; dozens of packages exist for derivatives analysis alone.
to be a little off the mark. But that's minor as the article is broadly sympathetic, and mostly "gets it" where it matters. Recommended.

/computers/R | permanent link

Thu, 01 Jan 2009

R/Finance conference in Chicago in April: Call for Papers

The following went out to the R-announce and R-SIG-Finance mailing lists a few days ago. The conference already has a very strong lineup of invited speakers, and we are now asking R / Finance users from both academia and industry to submit suitable one-page abstracts:

Call for Papers

The Finance Department of the University of Illinois at Chicago (UIC),
the International Center for Futures and Derivatives at UIC, and
members of the R finance community are pleased to announce

R/Finance 2009: Applied Finance with R

on April 24 and 25, 2009, in Chicago, IL, USA

Confirmed keynote speakers include:

Patrick Burns (Burns Statistics)
David Kane (Kane Capital)
Roger Koenker (U of Illinois at Urbana/Champaign)
David Ruppert (Cornell)
Diethelm Wuertz (ETH Zuerich)
Eric Zivot (U of Washington)
We invite all users of R in Finance to submit one-page abstracts or
complete papers (in txt/pdf/doc format). We encourage papers both on
academic research topics and related to use of R by Finance practitioners.

Presenters are strongly encouraged to provide working R code to accompany
the presentation/paper. Datasets need not be made public.

Please send submissions to committee@RinFinance.com.
The submission deadline is January 31st, 2009.
Submissions will be evaluated and submitters notified via email on a rolling basis.

Additional details about the conference will be announced as available.

For the program committee:

Gib Bassett, Peter Carl, Dirk Eddelbuettel, John Miller,
Brian Peterson, Dale Rosenthal, Jeffrey Ryan
See you in Chicago in April!

/computers/R | permanent link

Tue, 23 Dec 2008

Updated 'Introduction to High-Performance Computing with R'

Fellow R user Paul Gilbert had invited me to come to Ottawa and the Bank of Canada to give a presentation/workshop on 'high-performance computing with R' similar to the UseR 2008 tutorial and talk.

I just posted the updated slides from this talk, and there is also an updated live cdrom on the Alioth server. Also, it looks like the tutorial will be held again at UseR 2009 in Rennes, see here for a brief synopsis.

It was nice to get back to Canada, even if it was a 24 hour whirlwind trip. Ottaws looked quite pretty in all the snow. And it seems that I got rather lucky with the travel dates as both the days before and after my trip had a large number of flight cancellations and delays due to snow storms.

/computers/R | permanent link

Mon, 01 Dec 2008

CRANberries prettified

Judging from my html logs, a fair number of folks go to the html version of my CRANberries feed (which was originally announced here) of new or updated packages for R,

So I quickly put together some simple css formatting to make it look a little better than the default blosxom theme it sported previously. That said, you probably should read the rss version (more about rss here) anyway!

Update: Oops. And it even works with a correct path to the css file. Now fixed.

/computers/R | permanent link

Tue, 19 Aug 2008

UseR! 2008 talk

Besides the slides from the tutorial at UseR! 2008 that were mentioned here previously, I also gave a short talk on scripting with R in high-performance computing using our littler frontend to R.

The talk introduces and extends an example related to some of the material from the tutorial itself. The slides from the talk are a little rough as the talk was somewhat ad-hoc: As session chair, I was confronted with a fairly last-minute cancellation and a 15 minute hole, and thought this would make a good little talk. It does show a nice trick for using littler with Open MPI (via snow) under the powerful slurm resource manager and batch/queue engine.

/computers/R | permanent link

Tue, 12 Aug 2008

UseR! 2008 tutorial

Earlier today, I presented a 3 1/2 hour tutorial Introduction to high-performance R (here is a brief description of the talk) at the UseR! 2008 conference at the TU Dortmund.

In a nutshell, the tutorial covered how to measure / profile R performance for speed and memory use, how to accelerate R using vectorised expression and tools like Ra / jit, how to add compiled code to R using either the .C or .Call interface and using the inline and RCpp packages, how to use R code in parallel (explicitly using NWS, Rmpi or snow as well as implicitly using pnmath / OpenMP), and how to script / automate R using littler, Rscript or RPy.

The final version of the slides is now available via my presentations page, and the live cdrom with software support for all the software used is at Alioth.

Update: Corrected link to presentations page thanks to heads-up by Charles. Thanks!

/computers/R | permanent link

Sat, 16 Feb 2008

CRANberries updated

The good folks at CRAN, the R package network which by now contains over thirteen hundred packages, have reorganised the website slightly. As far as I can tell, the changes are all for the better: nicer URLs, lots more per-package information, and other changes such as an updated 'task view' format (which is the part I knew via my maintenance of the Finance task view).

But these changes also affected my my CRANberries (see the html or better yet rss view) summaries of new packages as some of the source information moved. So I just updated the (surprisingly short at 189 lines including plenty of whitespace and comments) script, and things should work now come the next update.

While updating the 'more info' link for new and updates posts to point to the new-style entry at CRAN, I also took the opportunity to update the format of the `blog' entry for updates where we now show title and description along with the diffstat output,

I also manually copied in two of the recent entries: the new package emu where CRANberries had fallen over as we could not find the package description (in the new spot), and the existing package GEOmap where diffstat failed as we somehow didn't have a proper tarnall of the previous sources.

/computers/R | permanent link

Sat, 11 Aug 2007

The amazing Prof. Ripley (cont'ed)

A little mini-meme got started on August 1 when Ben Bolker posted the following code to the r-devel list (and here I substituted the more standard '<-' assignment operator for the less standard though-now permitted '='):
x <- readLines("http://developer.r-project.org/R.svnlog.2007")
rx <- x[grep("^r",x)]
who <- gsub(" ","",sapply(strsplit(rx,"\\|"),"[",2))
twho <- table(who)
twho["ripley"]/sum(twho)
In five lines (that could be shortened to three at the expense of some readibility), the SVN log for R is downloaded directly from the website, the revision authors are extraced and then tabulated by submitter. The relative percentage of Brian Ripley is found to be a staggering 74.8% -- or about three times as much as the other fifteen committers combined. Smokes.

[ Oh, and for those who don't know him, he's also got a day job which presumably entails looking after his graduate students at Oxford. Who knows, he may even teach. Kidding aside, he's actually one of the nicest persons you'll ever meet in real life. ]

Now yesterday, Simon Jackman who had at first simply repeated Ben's analysis on his own blog followed up with a nice analysis (albeit typeset in a way that rendered the code inoperational, which has now been fixes) that creates both a histogram and a dotplot of commits per hour of the day. Omitting Ben's code which Simon reuses, we have the following for histogram and dotchart:

tod <- unlist(sapply(rx,function(x)strsplit(x,split=" ")[[1]][6]))
tod <- tod[who=="ripley"]

tz <- sub(pattern=".*(-[0-9]{4}).*",replacement="\\1",x=rx)
tz <- tz[who=="ripley"]
tz <- as.numeric(tz)/100
offset <- 3600*tz

z <- strptime(tod,format="%H:%M:%S")
hist(z,"hours",main="Ripley Commit Times in SVN TZ")

h <- z - offset
h <- format(h,format="%H")
h <- factor(as.numeric(h), levels=0:23)
dotchart(table(h), main="Ripley Commit Times, By Hour in GMT",
         labels=paste(0:23,1:24,sep=":"))
This extracts the commit times, subsets to the ones by Prof. Ripley, extracts the timezones component (as strptime seemingly doesn't do that which is a pain), extracts the tz-less time via strptime into a variable 'z' for which the histogram is drawn. He then corrects the times by the tz offset expressed in seconds, formats is as hour of the day and turns it into a 'factor' (an R data type for qualitative variables which may be ordered as is the case here) and draws a dotplot. This results in the following chart:

Simon Jackman's per-hour charts of Brian Ripley's commit patterns

Now, nobody has looked at the time series. So we correct this and add the following:

## rather extract both  date and time
dat <- unlist(sapply(rx, function(x) {
  txt <- strsplit(x,split=" ")[[1]]
  paste(txt[5], txt[6])
}))
## subset on Prof Ripley
dat <- dat[who == "ripley"]
## and convert to POSIXct, correcting by tz as well
datpt <- as.POSIXct(strptime(dat,format="%Y-%m-%d %H:%M:%S")) - offset

## turn into zoo -- we use a constant series of ones as each
## committ is taken as a timestamped event
datzoo <- zoo(1, order.by=datpt)
## and use zoo to aggregate into commits per date
daily <- aggregate(datzoo, as.Date(index(datzoo)), sum)

## now plot as grey bars
plot(daily, col='darkgrey', type='h', lwd=2,
     ylab="Nb of SVN commits, three-week median",
     xlab="R release dates 2.5.0 and 2.5.1 shown in orange",
     main="The amazing Prof. Ripley")
## mark the two R releases of 2007
abline(v=c(as.Date("2007-04-24"),as.Date("2007-06-28")),col='orange',lwd=1.5)
## and do a quick centered rolling median
lines(rollmedian(daily, 21, align="center"), lwd=3)
This extracts both date and time, creates a proper R time object (a so-called POSIXct type) from it, fills a zoo ('the' magic class for time series) object with it, uses zoo to aggregate commits per day and plots those in a barchart-alike (I know, I know, ...) plot to which we add the two releases as well as a rolling and centered three-week median (as a real quick hack rather than a proper smooth).

Timeseries of Brian Ripley's commit patterns

This shows that Prof Ripley averaged about ten commits a day before and after the release of R 2.5.0, and that he has slowed down ever so slightly since then to end up at around a mere seven commits a day. Every day. For the seven-plus months we looked at.

So, anyone for analysing his r-help posting frequencies ?

/computers/R | permanent link

Mon, 09 Jul 2007

Announcing CRANberries

Earlier today I sent an announcement to the r-packages list. It describes CRANberries, two simple RSS feeds that summarize both 'new' and 'updated' packages at CRAN, the archive network for R. I cooked this up rather quickly using a few lines of R, a small SQLite db backend and the old Blosxom blog engine. A tip of the hat to Barry Rowlingson who almost immediately suggested to use the lol format instead.

The hope is that this proves helpful for keeping tabs on the amazing growth of CRAN (which is now at over one thousand packages) as well as the number of updates to existing packages. The feed(s) can be consumed standalone, or via the brand new Planet R aggregator that Elijah announced today too.

/computers/R | permanent link

Mon, 02 Jul 2007

More on 'nicer charts'

Via the Planet Debian aggregator and his blog, Sven followed up on my post regarding Lucas' plot of the package age distribution.

As some of my points didn't seem to make it across, I will reiterate them more plainly:

  • GNUplot, while easy to use, creates charts that aren't terribly pretty;
  • Lucas' original chart had, to paraphrase an expression by Tufte, a poor 'ink to paper ratio': the data is too concentrated in the last quartile;
  • for that very reason, taking logs is a good thing here

Sven also addresses the fact that what we really want is to see the quantiles of the data set. Quite right, and taking logs makes that easier. Consider the two charts below which plot the 'package age in days' as an empirical cumulative distribution function using built-in R functions ecdf and plot.stepfun (rather than redoing it ad-hoc as I had done), and also add explicitly quantiles. The two charts use the exact same instructions; however the second chart transforms the x-axis to a logarithmic scale.

Debian Package Age-since-recompile Distributions charted two ways

While it is close to impossible to find the 25 or 50 percentile on the first chart, it becomes a lot easier on the second chart because the x-axis is 'stretched' using the log transform. About one quarters of the distribution appears to be rebuild within 1.5 months old, and about half is younger than four months (as a quick call to summary(pkgAge) confirms). Reading these proprtions off the original chart, or the non-log chart, is much more difficult.

/computers/R | permanent link

Fri, 29 Jun 2007

Improving simple charts

Earlier today and via Planet Debian, Lucas blogged about the 'age distribution' of Debian packages, defined as the time since the last (re-)compilation. He illustrated his findings with an, umm, rather ugly chart. Having climbed onto the soap box once before, I would like to point out how easy it can be to create simple, informative, and, at to least to me, prettier charts using R.

Lucas included a URL to the data. The first nice thing to note that we can read the data directly from the URL -- no need to copy the file:

pkgAge <- read.table(file="http://people.debian.org/~lucas/arch-age/arch-age.log", col.names=c("pkg","yyyymmdd"))
read the data into a data.frame which we have given two column names.
pkgAge[,"date"] <- as.Date(as.character(pkgAge[,"yyyymmdd"]), "%Y%m%d")
pkgAge[,"age"] <- as.numeric(difftime(Sys.Date(), pkgAge[,"date"], units="day"))
pkgAge[,"prop"] <- (1:nrow(pkgAge)) / nrow(pkgAge) * 100
We then create three new columns. First is a date, by parsing the (integer) dates (after first casting them into characters) by supplying the format in standard C notation: "%Y%m%d" for year, date and month without any separators or formatters. Now, having the date as an actual date object inside a real data analysis language we can do things as e.g. computing date differences. The difftime function does just that, using the current date as other point. We ask for the return to be in days, and cast this down to a purely numeric vector (instead of datediff object). Lastly, we quickly compute the date proportion in percentages.

We can then view the date. Before we plot,

png("packageAges.png", quality=100, width=640, height=480, pointsize=10)
oldpar <- par(mfrow=c(2,2), mar=c(2.5,2.5,3,1))
we direct the charts to a png file of given dimensions, and ask for all plots in one figure (using mfrow with two rows by two) with somewhat smaller figure margins using the mar argument to par.

The first chart shows again proportion over date:

with(pkgAge, plot(date, prop, type='l', main="Standard Plot"))
(The with() function simply allows us to refer to the columns by their names without explicit subsetting. plot(pkgAge[,"date",], pkgAge[,"prop"]) is equivalent, but more cumbersome.)

As it clear that the data has a fairly long tail in the older dates, we can also try to plot the plot over logarithmic time differences. This doesn't work for dates, but it works for our (positive-valued) age variable:

with(pkgAge, plot(age, prop, type='l', log="x", main="More linear as log(age in days)"))

The very far left tail below 0.5 percent is interesting as the one very old package is clearly an outlier within an outlier region. We use the subset function to take just one portion of the data, use logs, and explicit plotting symbols '+' in a points-and-lines plot:

with(subset(pkgAge, prop<0.5), plot(date, prop, type='b', log="y", pch="+", main="Detail in left tail, up to 0.5%"))

Lastly, the upper quartile is fairly linear.

with(subset(pkgAge, prop>75), plot(date, prop, type='l', pch=".", main="Yet fairly linear in top 25%"))

At the end

oldpar <- par(mfrow=c(2,3))
dev.off()
we restore the graphics paramters and close the device (here the file). All this then yields the following chart:

Debian Package Age-since-recompile Distributions

Updated to correctly display the assignment operator <-

/computers/R | permanent link

Mon, 24 Apr 2006

RGtk2 packages in Debian

As previously mentioned, I had built local packages of Michael Lawrence's RGtk2 port of the Gtk v2 widgets / toolkit to the R language and environment. This package, along with the related cairoDevice package, is now in Debian's main archive. RGtk2 extends and replaces the older r-omegahat-rgtk package.

For an two interesting applications using RGtk2, look at Graham William's rattle, a glade-based data-mining user-interface to several R functions, and at John Verzani's PMG (aka Poor Man's GUI) generic GUI for R.

/computers/R | permanent link

Thu, 02 Feb 2006

RGtk2 packages available

Michael Lawrence announced the release of RGtk2 yesterday on both r-packages and r-sig-gui. This follows his impressive presentation at DSC 2005 and finally makes the code available.

RGtk2 is, as the name suggests, an update of the older RGtk package (that I'd been maintaining in Debian as r-omegahat-rgtk) to the 'newer' version 2 of Gtk (aka The GIMP Toolkit). It provides Gtk goodies for the amazing R language and environment many of us dig for its use in statistical computing, visualization, data analysis, estimation and more.

RGtk2 is quite an achievement. The source package is huge at 1.9mb, the resulting Debian package even huger at 5.3mb, and it all seems to work fine based on some initial tests of running the demos. Based on the two source packages, I created two packages of RGtk2 (as r-omegahat-rgtk2) and the Cairo device (as r-omegahat-cairodevice) which you can fetch from here.

Not sure if I feel the urge to maintain them, but if someone else wants to step forward, let me know. Sources and diffs are in the same directory. Feedback welcome.

Update: By the way, what do we need to build with gtkmozembed? I didn't find a proper Build-Depends: for this.

/computers/R | permanent link

Sun, 27 Feb 2005

First `r-devel' builds leading up to R 2.1.0

As usual, the next GNU R release (expected in April) will bring yet another set of changes. Among the very user-visible changes are initial localisation support, and a disappearance of the (never completed) Gnome front-end.

I have built an initial set of packages, mostly to convince myself that nothing too drastic was needed inside the debian/ directory. These packages are for now here on my box but I guess I could upload them to Debian's experimental distribution for which the changelog entry is tagged.

I tend not to install any locales packages on my machines, so I can't really test if the localisation support works -- configure and friends certainly suggest it based on the compile-time messages. So my dear reader, if you are into non-default locales and R, and have a minute, grab the package and try something like

$ LANGUAGE=de LC_all=de LC_MESSAGES=de LANG=de LOCALE=de R

R : Copyright 2005, The R Foundation for Statistical Computing
Version 2.1.0 Under development (unstable) (2005-02-27), ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for a HTML browser interface to help.
Type 'q()' to quit R.

> Sys.getlocale()
[1] "C"
> q("no")
As is plain from the above, I somehow failed to tell R about 'de' as an alternative. However, capabilities() does report TRUE for attribute iconv, so this should work ... Feedback welcome -- right now, R has po files for de and it so no need to test other languages.

/computers/R | permanent link

Mon, 21 Feb 2005

Nicer charts

Wouter had blogged about his charts based on Takuo's per-maintainer statistics from the Debian BTS.

In an effort of true R evangelism, I bugged Wouter about the gnuplot ugliness in those charts and offered an R script as an alternate. Per his reply, he seemed please with the output [1] -- click on the png for a nicer pdf:

Debian BTS timeseries for edd@debian.org

For anybody interested, the R code is available. It provides two simple functions. The first actually creates the 4x1 chart. The second loops over all datafiles ending in .csv in a given directory, assuming the filename before the .csv ending provides the unique identifier (here the maintainer name and email as per Takuo's and Wouter's setup). On a per file basis, data is loaded, and a pdf and png are produced. This can be called as in

R CMD BATCH debian_bts_chart.R
which will create R CMD BATCH debian_bts_chart.Rout in the current directory.

Lastly, I should note that I do think that the underlying data is wrongly classified. Counting a bug as 'active' even when it has been closed, but is not yet archived merely because the 28 days period hasn't passed is plain wrong. In my book, a closed bug cannot count as an active one.

[1] This is my own data, and it shows the drop-off a few weeks ago when I passed maintainership of a good dozen packages on to others.

/computers/R | permanent link