Thinking inside the box

gcbd 0.2.6

A pure maintenance release 0.2.6 of the gcbd package is now on CRAN. The gcbd package proposes a benchmarking framework for LAPACK and BLAS operations (as the library can be exchanged in a plug-and-play sense on suitable OSs) and records results in a local database. Recent / upcoming changes to DBI and RSQLite led me to update the package; there are no actual functionality changes in this release.

CRANberries also provides a diffstat report for the latest release.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

gcbd 0.2.5

A maintenance release (now at version 0.2.5) of my gcbd package (described only in these two posts) is now on CRAN. More details about the package are available in the paper which is also included in the gcbd R package.

Changes were minimal and driven mostly by some CRAN Policy changes, which now prefer vignette source files in the (top-level) directory vignettes/.

Courtesy of CRANberries, there is also a diffstat report for the most recent release.

More BLAS, BLASter, BLAStest: Updates on gcbd

Following up on my initial post announcing gcbd, here is a brief note on a new version. The initial post announced version 0.2.2 which was the first CRAN version of gcbd. I updated to 0.2.3 when I made the aforementioned first blog post about gcbd with the lattice plot of the BLAS and GPU benchmark results across six different implementations (from reference BLAS to two Atlas versions, Goto, MKL and a GPU-based one).

There is now a new version 0.2.4 of gcbd on CRAN. I revised the paper ever so slightly based on some more feedback, and focussed the results sections by concentrating on just the log-axes lattice plot and the corresponding lattice plot of raw results, where the y-axis is capped at 30 seconds:

This chart, drawn in levels rather than on logarithmic axes, nicely illustrates just how large the performance difference can be for matrix multiplication and LU decomposition. QR and SVD are closer, but accelerated BLAS libraries still win. GPUs can be compelling for some tasks and large sizes.

More discussion is still available in the paper which is also included in the gcbd package for R.

BLAS, BLASter, BLAStest: Some benchmark results, and a benchmarking framework

Usage of accelerated BLAS libraries seems to be shrouded in some mystery, judging from somewhat regularly recurring requests for help on lists such as r-sig-hpc (gmane version), the R list dedicated to High-Performance Computing. Yet it doesn't have to be; installation can be really simple (on appropriate systems).

Another issue that I felt needed addressing was a comparison between the different alternatives available, quite possibly including GPU computing. So a few weeks ago I sat down and wrote a small package to run, collect, analyse and visualize some benchmarks. That package, called gcbd (more about the name below), is now on CRAN as of this morning. The package both facilitates the data collection for the paper it also contains (in the vignette form common among R packages) and provides code to analyse the data, which is also included as a SQLite database. All this is done in the Debian and Ubuntu context by transparently installing and removing suitable packages providing BLAS implementations, so that we can fully automate data collection over several competing implementations via a single script (which is also included). Contributions of benchmark results are encouraged; that is the idea of the package.
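To make the run-and-collect idea concrete, here is a minimal sketch in Python using only the standard library. It is not gcbd's code: the table name `benchmark`, its columns, the `pure-python` backend label and the helper functions are all hypothetical, and a naive list-based matrix product stands in for a real BLAS call. It only illustrates the pattern of timing an operation at several sizes and appending one row per measurement to a SQLite database.

```python
import random
import sqlite3
import time


def matmul(a, b):
    """Naive O(n^3) matrix product on lists of lists (stand-in for a BLAS call)."""
    cols_b = list(zip(*b))  # columns of b, so each entry is a row-by-column dot product
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def time_matmul(n, runs=3):
    """Return the mean wall-clock seconds for an n x n product over `runs` repetitions."""
    rng = random.Random(42)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    start = time.perf_counter()
    for _ in range(runs):
        matmul(a, b)
    return (time.perf_counter() - start) / runs


def record_benchmarks(conn, backend, sizes, runs=3):
    """Time each size and append one row per measurement (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS benchmark "
        "(backend TEXT, op TEXT, n INTEGER, runs INTEGER, seconds REAL)"
    )
    for n in sizes:
        conn.execute(
            "INSERT INTO benchmark VALUES (?, ?, ?, ?, ?)",
            (backend, "matmult", n, runs, time_matmul(n, runs)),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")  # a file path would make the results persistent
record_benchmarks(conn, "pure-python", [10, 20, 40])
rows = conn.execute(
    "SELECT backend, op, n, seconds FROM benchmark ORDER BY n"
).fetchall()
for row in rows:
    print(row)
```

Keeping the backend label as a column is what would allow results from several implementations, or from several contributors, to accumulate in the same table and be compared later.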

The paper itself describes the background and technical details before presenting the results. The benchmark compares the basic reference BLAS, Atlas (both single- and multithreaded), Goto, Intel MKL and a GPU-based approach. This blog post is not the place to recap all results, so please do see the paper for more details. But one summary chart regrouping the main results fits well here:

This chart, in a log/log form, shows how reference BLAS lags everything, how multithreaded newer Atlas improves over the standard Atlas package currently still the default in both distros, and how the Intel MKL (available via Ubuntu) is fairly good but Goto wins almost everything. GPU computing is compelling for really large sizes (at double precision) and too costly at small ones. It also illustrates variability and different computational cost across the methods tested: SVD is more expensive than level-3 matrix multiplication, and the different implementations are less spread apart. More details are in the paper; code, data etc. are in the package gcbd.
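One reason a log/log form suits such a chart: if run time grows roughly like c * n^k, then log(time) is linear in log(n) with slope k, so each implementation appears as a near-straight line and the constant c becomes a vertical shift. The sketch below, in Python with only the standard library, estimates that slope by least squares. The timings are synthetic, generated from an assumed cubic cost as for dense matrix multiplication; they are not measurements from the paper, and `loglog_slope` is a hypothetical helper.

```python
import math


def loglog_slope(sizes, seconds):
    """Least-squares slope of log(seconds) against log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in seconds]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


# Synthetic timings, NOT measurements: generated from an assumed cost of c * n^3.
sizes = [100, 200, 400, 800, 1600]
synthetic_seconds = [2e-9 * n ** 3 for n in sizes]
slope = loglog_slope(sizes, synthetic_seconds)
print(round(slope, 3))
```

With real timings the fitted slope would deviate from the textbook exponent, since overheads dominate at small sizes and cache or threading effects show up at large ones.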

The larger context is to do something like this benchmarking exercise, but across distributions, operating systems and possibly also GPU cards. Mark and I started to talk about this during and after R/Finance earlier this year and have some ideas. Time permitting, that work should be happening in the GPU/CPU Benchmarks (gcb) project, and that's why this got called gcbd, as a simpler GPU/CPU Benchmarks on Debian Systems study.

Wed, 28 Sep 2016

gcbd 0.2.6

Sat, 14 Dec 2013

gcbd 0.2.5

Sun, 03 Oct 2010

More BLAS, BLASter, BLAStest: Updates on gcbd

Wed, 15 Sep 2010

BLAS, BLASter, BLAStest: Some benchmark results, and a benchmarking framework