There is now a new version 0.2.4 of gcbd on CRAN. I revised the paper ever so slightly based on some more feedback, and focussed the results sections by concentrating on just the log-axes lattice plot and the corresponding lattice plot of raw results, where the y-axis is capped at 30 seconds:
This chart, shown in levels rather than on logarithmic axes, nicely illustrates just how large the performance difference can be for matrix multiplication and LU decomposition. QR and SVD are closer, but accelerated BLAS libraries still win. GPUs can be compelling for some tasks and large sizes.