A new RcppSimdJson release arrived on CRAN late yesterday, bringing along the recently updated simdjson release 0.6.0.
RcppSimdJson wraps the fantastic and genuinely impressive simdjson library by Daniel Lemire and collaborators. Through very clever algorithmic engineering that yields largely branch-free code, coupled with modern C++ and newer compiler instructions, it parses gigabytes of JSON per second, which is quite mind-boggling. The best-case performance is ‘faster than CPU speed’, as the use of parallel SIMD instructions and careful branch avoidance can lead to less than one CPU cycle per byte parsed; see the video of the talk by Daniel Lemire at QCon (also voted best talk).
Other than the upstream update, Brendan added some new utilities to check for valid UTF-8 or JSON input and to minify JSON, plus a small workaround for a clang-9 bug we encountered. We can confirm Daniel’s statement on ridiculously fast UTF-8 validation. It is so cool to work with amazing tools.
The NEWS entry follows.
Changes in version 0.1.3 (2020-11-01)
Added URLs to DESCRIPTION (Dirk closing #50).
Upgraded to simdjson 0.6.0 (Dirk in #52).
New policy option to always convert integers to int64_t (Brendan in #55 closing #54).
Added workaround for odd clang-9 bug (Brendan in #57).
New utility functions is_valid_utf8(), is_valid_json() and fminify() (Brendan in #58).
Courtesy of my CRANberries, there is also a diffstat report for this release.
For questions, suggestions, or issues please use the issue tracker at the GitHub repo.
If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.