Wed, 03 Feb 2010

RProtoBuf 0.1-0

Romain uploaded our first release of RProtoBuf to CRAN yesterday. RProtoBuf provides bindings for GNU R to the Google Protobuf implementation. Google Protobuf is (and I quote) a way of encoding structured data in an efficient yet extensible format that is used for almost all internal RPC protocols and file formats at Google.

RProtoBuf had a funny start. I had blogged about the 12 hour passage from proof of concept to R-Forge project following the ORD session hackfest in October. What happened next was as good. Romain emailed within hours of the blog post and reminded me of a similar project that is part of Saptarshi Guha's RHIPE R/Hadoop implementation. So the three of us--Romain, Saptarshi and I---started emailing and before long it becomes clear that Romain is both rather intrigued by this (whereas Saptarshi has slightly different needs for the inner workings of his Hadoop bindings) and was able to devote some time to it. So the code kept growing and growing at a fairly rapid clip. Til that stopped as we switched to working feverishly on Rcpp to both support the needs of this project, and to implement ideas we had while working on this. That now lead to the point where Rcpp is maturing in terms of features, so we will probably have time come back to more work on RProtoBuf to take advantage of the nice templated autoconversions we now have in Rcpp. Oddly enough, the initial blog post seemed to anticipate changes in Rcpp.

Anyway -- RProtoBuf is finally here and it already does a fair amount of magic based of code reflection using the proto files. The Google documentation has a simple example of a 'person' entry in an 'addressbook' which, when translated to R, goes like this:

R> library( RProtoBuf )                      ## load the package
R> readProtoFiles( "addressbook.proto" )     ## acquire protobuf information
R> bob <- new( tutorial.Person,              ## create new object
+   email = "bob@example.com",
+   name = "Bob",
+   id = 123 )
R> writeLines( bob$toString() )              ## serialize to stdout
name: "Bob"
id: 123
email: "bob@example.com"

R> bob$email                                 ## access and/or override
[1] "bob@example.com"
R> bob$id <- 5
R> bob$id
[1] 5

R> serialize( bob, "person.pb" )             ## serialize to compact binary format

There is more information at the RProtoBuf page, and we already have a draft package vignette, a 'quick' overview vignette and a unit test summary vignette.

More changes should be forthcoming as Romain and I find time to code them up. Feedback is as always welcome.

/code/rprotobuf | permanent link