Another pure maintenance release 0.2.7 of the gcbd package is now on CRAN. The gcbd proposes a benchmarking framework for LAPACK and BLAS operations (as the library can exchanged in a plug-and-play sense on suitable OSs) and records result in local database. Its original motivation was to also compare to GPU-based operations. However, as it is both challenging to keep CUDA working packages on CRAN providing the basic functionality appear to come and go so testing the GPU feature can be challenging. The main point of gcbd is now to actually demonstrate that ‘yes indeed’ we can just swap BLAS/LAPACK libraries without any change to R, or R packages. The ‘configure / rebuild R for xyz’ often seen with ‘xyz’ being Goto or MKL is simply plain wrong: you really can just swap them (on proper operating systems, and R configs – see the package vignette for more). But nomatter how often we aim to correct this record, it invariably raises its head another time.
This release accommodates a CRAN change request as we were
referencing the (now only suggested) package gputools
. As
hinted in the previous paragraph, it was once on CRAN but is not right now so we
adjusted our reference.
CRANberries also provides a diffstat report for the latest release.
If you like this or other open-source work I do, you can sponsor me at GitHub.
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.
A pure maintenance release 0.2.6 of the gcbd package is now on CRAN. The gcbd proposes a benchmarking framework for LAPACK and BLAS operations (as the library can exchanged in a plug-and-play sense on suitable OSs) and records result in local database. Recent / upcoming changes to DBMI and RSQLite let me to update the package; there are no actual functionality changes in this release.
CRANberries also provides a diffstat report for the latest release.
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.
vignettes/
Courtesy of CRANberries, there
is also a diffstat report for the
most recent release.
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.
There is now a new version 0.2.4 of gcbd on CRAN. I revised the paper ever so slightly based on some more feedback, and focussed the results sections by concentrating on just the log-axes lattice blot and the corresponding lattice plot of raw results---where the y-axis is capped at 30 seconds:
This chart--in levels rather than using logarithmic axes is done here--nicely illustrates just how large the performance difference can be for for matrix multiplication and LU decomposition. QR and SVD are closer but accelerated BLAS libraries still win. GPUs can be compelling for some tasks and large sizes.
More discussion is still available in the paper which is also included in the gcbd package for R.
Another issue that I felt needed addressing was a comparison between the different alternatives available, quite possibly including GPU computing. So a few weeks ago I sat down and wrote a small package to run, collect, analyse and visualize some benchmarks. That package, called gcbd (more about the name below) is now on CRAN as of this morning. The package both facilitates the data collection for the paper it also contains (in the vignette form common among R packages) and provides code to analyse the data---which is also included as a SQLite database. All this is done in the Debian and Ubuntu context by transparently installing and removing suitable packages providing BLAS implementations: that we can fully automate data collection over several competing implementations via a single script (which is also included). Contributions of benchmark results is encouraged---that is the idea of the package.
The paper itself describes the background and technical details before presenting the results. The benchmark compares the basic reference BLAS, Atlas (both single- and multithreaded), Goto, Intel MKL and a GPU-based approach. This blog post is not the place to recap all results, so please do see the paper for more details. But one summary chart regrouping the main results fits well here:
This chart, in a log/log form, shows how reference BLAS lags everything, how multithreaded newer Atlas improves over the standard Atlas package currently still the default in both distros, how the Intel MKL (available via Ubuntu) is fairly good but how Goto wins almost everything. GPU computing is compelling for really large sizes (at double precision) and too costly at small ones. It also illustrates variability and different computational cost across the methods tested: svd is more expensive than level-3 matrix multiplication, and the different implementations are less spread apart. More details are in the paper; code, data etc are in the package gcbd.
The larger context is to do something like this benchmarking exercise, but across distributions, operating systems and possibly also GPU cards. Mark and I started to talk about this during and after R/Finance earlier this year and have some ideas. Time permitting, that work should be happening in the GPU/CPU Benchmarks (gdb) project, and that's why this got called gcbd as a simpler GPU/CPU Benchmarks on Debian Systems study.