Sun, 21 Dec 2014

If only there was a Romeo somewhere ...

Attention: rant coming. You have been warned, and may want to tune out now.

So the top of my Twitter timeline just had a re-tweet posting to this marvel on the state of Julia. I should have known better than to glance at it as it comes from someone providing (as per the side-bar) Thought leadership in Big Data, systems architecture and more. Reading something like this violates the first rule of airport book stores: never touch anything from the business school section, especially on (wait for it) leadership or, worse yet, thought leadership.

But it is Sunday, my first cup of coffee still warm (after finalising two R package updates on GitHub, and one upload to CRAN) and so I read on. Only to be mildly appalled by the usual comparison to R based on the same old Fibonacci sequence.

Look, I am as guilty as anyone of using it (for example all over chapter one of my Rcpp book), but at least I try to stress each and every time that this is kicking R where it is down as its (fairly) poor performance on functions calls (that is well-known and documented) obviously gets aggravated by recursive calls. But hey, for the record, let me restate those results. So Julia beats R by a factor of 385. But let's take a closer look.

For n=25, I get R to take 241 milliseconds---as opposed to his 6905 milliseconds---simply by using the same function I use in every workshop, eg last used at Penn in November, which does not use the dreaded ifelse operator:

fibR <- function(n) {
  if (n < 2) return(n)
  return(fibR(n-1) + fibR(n-2))

Switching that to the standard C++ three-liner using Rcpp

cppFunction('int fibCpp(int n) { 
  if (n < 2) return(n);  
  return(fibCpp(n-1) + fibCpp(n-2));  

and running a standard benchmark suite gets us the usual result of

R> library(rbenchmark)
R> benchmark(fibR(25),fibCpp(25),order="relative")[,1:4]
        test replications elapsed relative
2 fibCpp(25)          100   0.048    1.000
1   fibR(25)          100  24.674  514.042

So for the record as we need this later: that is 48 milliseconds for 100 replications, or about 0.48 milliseconds per run.

Now Julia. And of my standard Ubuntu server running the current release 14.10:

edd@max:~$ julia
ERROR: could not open file /home/edd//home/edd//etc/julia/juliarc.jl
 in include at boot.jl:238


So wait, what? You guys can't even ensure a working release on what is probably the most popular and common Linux installation? And I get to that after reading a post on the importance of "Community, Community, Community" and you can't even make sure this works on Ubuntu? Really?

So a little bit of googling later, I see that julia -f is my friend for this flawed release, and I can try to replicate the original timing

edd@max:~$ julia -f
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation:
   _ _   _| |_  __ _   |  Type "help()" to list help topics
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.2.1 (2014-02-11 06:30 UTC)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |  x86_64-linux-gnu

julia> fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)
fib (generic function with 1 method)

julia> @elapsed fib(25)


Interestingly the posts author claims 18 milliseconds. I see 2.3 milliseconds here. Maybe someone is having a hard time comparing things to the right of the decimal point. Or maybe his computer is an order of magnitude slower than mine. The more important thing is that Julia is of course faster than R (no surprise: LLVM at work) but also still a lot slower than a (trivial to write and deploy) C++ function. Nothing new here.

So let's recap. Comparison to R was based on a flawed version of a function we only use when we deliberately want to put R down, can be improved significantly when using a better implementation, results are still off by order of magnitude to what was reported ("math is hard"), and the standard C / C++ way of doing things is still several times faster than our new saviour language---which I can't even launch on the current version of one of the more common free operating systems. Ok then. Someone please wake me up in a few years and I will try again.

Now, coming to the end of the rant I should really stress that of course I too hope that Julia succeeds. Every user pulled away from Matlab is a win for all us. We're in this together and the endless navel gazing between ourselves is so tiresome and irrelevant. And as I argue here, even more so when we among ourselves stick to unfair comparisons as well as badly chosen implementation details.

What matters are wins against the likes of Matlab, Excel, SAS and so on. Let's build on our joint strength. I am sure I will use Julia one day, and I am grateful for everyone helping with it---as a lot of help seems to be needed. In the meantime, and with CRAN at 6130 packages that just work I'll continue to make use of this amazing community and trying my bit to help it grow and prosper. As part of our joint community.

/computers/misc | permanent link