Tue, 03 Feb 2009

Correct Datetime / POSIXct behaviour for R and kdb+

We have started to look into kdb+ as a possible high-performance column-store backend. Kx offers free trials, so I have played with it for a day or two: the general system, data loads and dumps, and in particular the interface to R. Based on the few files provided (one C source file with the interface code, one R file to access the C code, one object file to link against, one header file and a simple Makefile), it took just a couple of minutes to turn this into a proper CRAN-style R package.

Anyway, the reason for this post is that the R / kdb+ glue code works well ... but not for datetimes. I really like being able to pass date/time objects natively between systems as easily as, say, numbers or strings (see e.g. my Rcpp package for doing this between R and C++), and I was a bit annoyed when the millisecond timestamps didn't move smoothly. It turns out that the basic converter function in the code had a number of problems: it converted to integer (thereby dropping the fractional seconds), it only handled a single scalar rather than vectors, and it erroneously decremented a reference count. A better version, in my view, is as follows:

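static SEXP from_datetime_kobject(K x)
{
	/* kdb+ datetimes are doubles counting fractional days since
	   2000-01-01; R's POSIXct counts seconds since 1970-01-01.
	   10957 is the number of days between those two epochs. */
	SEXP result;
	int i, length;
	if (scalar(x)) {
		/* an atom keeps its value in x->f, not in the kF() vector payload */
		result = PROTECT(allocVector(REALSXP, 1));
		REAL(result)[0] = (x->f + 10957) * 86400;
	} else {
		length = x->n;
		result = PROTECT(allocVector(REALSXP, length));
		for (i = 0; i < length; i++) {
			/* stay in double precision so milliseconds survive */
			REAL(result)[i] = (kF(x)[i] + 10957) * 86400;
		}
	}
	SEXP datetimeclass = PROTECT(allocVector(STRSXP, 2));
	SET_STRING_ELT(datetimeclass, 0, mkChar("POSIXt"));
	SET_STRING_ELT(datetimeclass, 1, mkChar("POSIXct"));
	setAttrib(result, R_ClassSymbol, datetimeclass);
	UNPROTECT(2);	/* result and datetimeclass; the K object's refcount is left alone */
	return result;
}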
This deals with vectors as well as scalars, converts kdb+'s 'fractional days since Jan 1, 2000' to the Unix standard of seconds since the epoch (including R's extension of fractional seconds) and, just as importantly, sets the class attribute to POSIXt POSIXct as needed by R. With that, a simple 'select max datetime from table' returns a properly classed datetime, and vectors of timestamped records of trades or quotes or whatever also arrive in R with proper POSIXct behaviour. Note that it needs TZ to be set to UTC, though (e.g. via Sys.setenv(TZ="UTC") in R), or you get a timezone offset you may not want.
