|
|
Thinking inside the box | |||||
|
Bio
Code Papers Talks Linux Quantian About Blog
|
Faster (recursive) function calls: Another quick Rcpp case study
This leads to very straightforward implementations using recursion: ## R implementation of recursive Fibonacci sequence fibR <- function(n) { if (n == 0) return(0) if (n == 1) return(1) return (fibR(n - 1) + fibR(n - 2)) } Unfortunately, this elegant implementation which remain close to the abtract formulation of the recurrence algorithm performs very poorly in R as there is noticeable overhead in function calls which becomes dominant in a recursion. This lead to the original question on StackOverflow, and the accepted answer uses a trick presented by Pat Burns in his lovely R Inferno: rewrite the solution using a computer science trick called memoization: fibonacci <- local({ memo <- c(1, 1, rep(NA, 100)) f <- function(x) { if(x == 0) return(0) if(x < 0) return(NA) if(x > length(memo)) stop("'x' too big for implementation") if(!is.na(memo[x])) return(memo[x]) ans <- f(x-2) + f(x-1) memo[x] <<- ans ans } }) That is a fair answer, and even more was suggested with a link to a terrific analysis calling the Fibonacci recurrence the worst algorithm in the world. That is also fair, but all the basic research into better algorithms exploiting some structure of the problem to advance performance (and of course understanding) is overlooking one crucial part: algorithm analysis is essentially independent of the language. So whatever improvements we obtain by thinking really hard about a problem are then available for other implementations too. So with a tip of the hat to the old Larry Wall quote about Lazyness, Impatience and Hubris, I would like to suggest what I consider a much simpler route to much better performance: recode it in C++ using both Rcpp (for the R/C++ integration) and inline for the on-the-fly compilation, linking and loading of C++ code into R. ## inline to compile, load and link the C++ code require(inline) ## we need a pure C/C++ function as the generated function ## will have a random identifier at the C++ level preventing ## us from direct recursive calls incltxt <- ' int fibonacci(const int x) { if (x == 0) return(0); if (x == 1) return(1); return (fibonacci(x - 1)) + fibonacci(x - 2); }' ## now use the snippet above as well as one argument conversion ## in as well as out to provide Fibonacci numbers via C++ fibRcpp <- cxxfunction(signature(xs="int"), plugin="Rcpp", incl=incltxt, body =' int x = Rcpp::as<int>(xs); return Rcpp::wrap( fibonacci(x) ); ') This single R function call
A performance comparison of the basic R version So the recursion for the original argument of N=35 takes just over a minute at about 61.5 and 61.9 seconds, respectively, for the R version and its byte-compiled variant (as per the column titled elapsed). So byte-compilation essentially offers no help for the bottleneck of slow function calls.edd@max:~/svn/rcpp/pkg/Rcpp/inst/examples/Misc$ r fibonacci.r Loading required package: inline Loading required package: methods Loading required package: compiler test replications elapsed relative user.self sys.self 3 fibRcpp(N) 1 0.092 1.0000 0.09 0.00 2 fibRC(N) 1 61.480 668.2609 61.47 0.00 1 fibR(N) 1 61.877 672.5761 61.83 0.02 edd@max:~/svn/rcpp/pkg/Rcpp/inst/examples/Misc$
The C++ versions relying on Rcpp which created in a few lines of code and a
single call to That provides another nice demonstration of what Rcpp can do. Improved algorithms for well-understood problems are surely one way to accelerate solutions. But there are (many ?) times when we do not have the luxury of being able to think through to a new and improved approach. Or worse, such an approach may even introduce new errors or inaccurracies if we get it wrong on a first try. With Rcpp, we are able to the express the problem as written in its original statement: a simple recursion. The gain relative to a slow R implementation is noteworthy---and could of course be improved further if we really needed to by relying on better algorithms like memoization. But for day to day tasks, I gladly take speedups of (up to) a few hundred times thanks to Rcpp without having to do hard algorithmic work. Before closing, a quick reminder that I will be giving two classes on Rcpp in a few weeks. These will be in New York on September 24, and San Franciso on October 8, see this blog post as well as this page at Revolution Analytics (who are a co-organiser of the classes) for details and registration information. |
|||||