A fun weekend-morning project, namely wrapping the outstanding simdjson library by Daniel Lemire (with contributions by Geoff Langdale, John Keiser and many others) into something callable from R via a new package RcppSimdJson, led to a first tweet on January 20, a reference to the brand new GitHub repo, and a CRAN upload a few days later, followed by two weeks of nothingness.
Well, a little more than nothing, as Daniel is an excellent “upstream” to work with who promptly incorporated two changes that arose from preparing the CRAN upload. So we did that. But with CRAN being as busy and swamped as they are, we needed to wait: the ten days one is warned about, and then some more. So yesterday I did a cheeky bit of “bartering”: Kurt wanted a favour with an updated digest version, so I hinted that some reciprocity would be appreciated. And lo and behold he admitted RcppSimdJson to CRAN. So there it is now!
We have some upstream changes already in git, but I will wait a few days to let a week pass before uploading the now synced upstream code. Anybody who wants it sooner knows where to get it on GitHub.
simdjson is a gem. Via some very clever algorithmic engineering to obtain largely branch-free code, coupled with modern C++ and newer compiler instructions, it parses gigabytes of JSON per second, which is quite mindboggling. I highly recommend the video of the recent talk by Daniel Lemire at QCon (which was also voted best talk).
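To give a flavour of what "branch-free" means, here is a minimal sketch. It is not simdjson's code, which works on wide SIMD registers; the function names are made up for illustration. Both functions count the JSON structural characters in a buffer. The first uses a data-dependent `if`, the second replaces it with a table lookup and an unconditional add.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: 256-entry table holding 1 for the six JSON
// structural characters and 0 for every other byte value.
std::array<std::uint8_t, 256> make_structural_table() {
    std::array<std::uint8_t, 256> table{};  // zero-initialised
    const char structural[] = {'{', '}', '[', ']', ':', ','};
    for (char c : structural) {
        table[static_cast<unsigned char>(c)] = 1;
    }
    return table;
}

// Branching version: the comparison outcome depends on the input data,
// which is what makes the branch hard for the CPU to predict.
std::size_t count_structural_branchy(const std::string& json) {
    std::size_t n = 0;
    for (char c : json) {
        if (c == '{' || c == '}' || c == '[' || c == ']' || c == ':' || c == ',') {
            ++n;
        }
    }
    return n;
}

// Branch-free version: no data-dependent branch in the loop body, only a
// table lookup and an add. Note that this naive count also includes
// structural characters that happen to sit inside string values.
std::size_t count_structural_branchfree(const std::string& json) {
    static const std::array<std::uint8_t, 256> table = make_structural_table();
    std::size_t n = 0;
    for (char c : json) {
        n += table[static_cast<unsigned char>(c)];
    }
    return n;
}
```

Both functions return the same count for any input; the second merely trades the conditional for arithmetic, which is the general idea that simdjson pushes much further.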
The NEWS entry (from a since-added NEWS file) for the initial RcppSimdJson upload follows.
Changes in version 0.0.1 (2020-01-24)
- Initial CRAN upload of first version
- Comment-out use of stdout (now updated upstream)
- Deactivate use of computed GOTOs for compiler compliance and CRAN Policy via #define
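For readers wondering what switching off computed GOTOs via a #define can look like, here is a rough sketch under stated assumptions. It is not the simdjson or RcppSimdJson source: the macro name `SKETCH_USE_COMPUTED_GOTO` and the toy interpreter are made up for illustration. Computed gotos (taking a label's address with `&&label`) are a GNU extension rather than standard C++, so a preprocessor guard can select a plain `switch` instead.

```cpp
#include <cstddef>
#include <string>

// Hypothetical opt-in macro for this sketch. Leaving it undefined selects
// the standard-C++ path below.
// #define SKETCH_USE_COMPUTED_GOTO 1

// Decode the next op: 0 for '+', 1 for '-', 2 (halt) for anything else
// or for end of input. Advances pc when a character is consumed.
int decode(const std::string& ops, std::size_t& pc) {
    if (pc >= ops.size()) return 2;
    const char c = ops[pc++];
    return c == '+' ? 0 : (c == '-' ? 1 : 2);
}

// Toy interpreter: '+' increments, '-' decrements, anything else halts.
int run_ops(const std::string& ops) {
    int acc = 0;
    std::size_t pc = 0;
#ifdef SKETCH_USE_COMPUTED_GOTO
    // GNU extension: &&label yields a label address, goto * jumps to it.
    void* const targets[] = {&&do_inc, &&do_dec, &&do_halt};
    goto *targets[decode(ops, pc)];
do_inc:
    ++acc;
    goto *targets[decode(ops, pc)];
do_dec:
    --acc;
    goto *targets[decode(ops, pc)];
do_halt:
    return acc;
#else
    // Portable path: an ordinary loop around a switch.
    for (;;) {
        switch (decode(ops, pc)) {
            case 0: ++acc; break;
            case 1: --acc; break;
            default: return acc;
        }
    }
#endif
}
```

With the macro left undefined, only standard C++ is compiled, which is the kind of configuration a strict compiler check would accept.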
If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.