simdjson by Daniel Lemire (with contributions by Geoff Langdale, John Keiser and many others) is an engineering marvel. Through very clever use of SIMD instructions, it manages to parse JSON files faster than disk access. Wut? Yes, you read that right: parallel processing with so little overhead that the net throughput is limited only by disk speed.
Moreover, it is implemented in neat modern C++ and can be accessed as a header-only library. (Well, one library in two files, really.) Which makes R packaging easy and convenient and compelling. So here we are.
For further introduction, see the arXiv paper by Langdale and Lemire (out/to appear in VLDB Journal 28(6) as well) and/or the video of the recent talk by Daniel Lemire at QCon (voted best talk).
jsonfile <- system.file("jsonexamples", "twitter.json", package="RcppSimdJson")
library(RcppSimdJson)
validateJSON(jsonfile)    # validate a JSON file
res <- fload(jsonfile)    # parse a JSON file
A simple file-oriented parsing benchmark against the other R-accessible JSON parsers:
> print(res)
Unit: microseconds
     expr       min        lq      mean   median        uq        max neval cld
  yyjsonr   312.267   347.683   405.177   390.11   425.827    926.776   100   a
 simdjson   274.367   323.998   447.691   467.79   526.237    773.070   100   a
  jsonify  2727.874  2813.681  2952.804  2896.84  2972.852   7442.755   100   b
 jsonlite  4237.538  4435.683  4587.428  4552.38  4668.345   7082.673   100   c
  RJSONIO  9131.864  9425.515  9707.274  9599.48  9845.006  13516.616   100   d
   ndjson 91668.822 92628.357 95386.212 93192.37 94507.484 152179.095   100   e
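Timings of this kind could be reproduced along the following lines. This is only a minimal sketch, not the package's own benchmark script: it assumes the microbenchmark and jsonlite packages are installed, and it compares just two of the parsers listed above to keep the example short.

```r
library(microbenchmark)

# Same example file as used above; it ships with RcppSimdJson
jsonfile <- system.file("jsonexamples", "twitter.json", package = "RcppSimdJson")

# Time each parser reading the file from disk, 100 repetitions per expression
res <- microbenchmark(
    simdjson = RcppSimdJson::fload(jsonfile),
    jsonlite = jsonlite::fromJSON(jsonfile),
    times = 100L
)
print(res)
```

Absolute numbers will vary by machine and package version, so the relative ordering is the more meaningful part of the result.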
Or in chart form, also including the second benchmark parsing strings
All three major OSs are supported, and JSON can be parsed from file and string under a variety of settings. A C++17 compiler is required for ease of setup (though the upstream library can fall back to an older compiler; one can edit src/Makevars accordingly if need be).
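As an illustration of parsing from a string rather than a file, here is a small sketch using fparse(), the string-oriented counterpart to fload(). The JSON document is made up for this example, and the query argument is assumed to take a JSON Pointer with a leading slash, as in recent package versions.

```r
library(RcppSimdJson)

# A small JSON document held in a character string (made-up example data)
json <- '{"user": {"name": "alice", "ids": [1, 2, 3]}, "ok": true}'

# Parse the whole string into nested R objects
parsed <- fparse(json)

# Extract a single element directly via a JSON Pointer
# (assumption: leading-slash pointer syntax, as in recent versions)
ids <- fparse(json, query = "/user/ids")

print(parsed$user$name)
print(ids)
```

Using a query avoids materialising the whole document in R when only one element is needed.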
Any problems, bug reports, or feature requests for the package can be submitted and handled most conveniently as GitHub issues in the repository.
Before submitting pull requests, it is frequently preferable to first discuss need and scope in such an issue ticket. See the file Contributing.md (in the Rcpp repo) for a brief discussion.
For standard JSON work on R, as well as for other nicely done C++ libraries, consider these:
For the R package wrapper, Dirk Eddelbuettel.
For everything pertaining to simdjson, Daniel Lemire (and many contributors).
GPL (>= 2)