Thinking inside the box
useR 2010 at NIST in Gaithersburg
As at the preceding useR! 2008 in Dortmund and useR! 2009 in Rennes, I presented a three-hour tutorial on high-performance computing with R. This covers scripting/automation, profiling, vectorisation, interfacing compiled code, parallel computing and large-memory approaches. The slides, as well as a condensed 2-up version, are now on my presentations page. On Wednesday, Romain and I had a chance to talk about recent work on Rcpp, our R and C++ integration. Thursday, we followed up with a presentation on RProtoBuf -- a project integrating Google's Protocol Buffers with R which much to our delight already seems to be in use at Google itself! It was quite fun to do these two talks jointly with Romain. But my other coauthor Khanh had to be at a conference related to his actual PhD work. So on Friday it was just me to give a presentation about RQuantLib which brings QuantLib to R. Slides from all these talks have now been added to my presentations page. I will also upload them via the conference form so that they can be part of the conference's collection of presentations which should be forthcoming. Thu, 27 May 2010
WU Wien presentations
On Friday, I also gave an informal lecture / tutorial / workshop to some of the Stats and Finance Ph.D. students, drawing largely from the section on parallel computing of the most recent Introduction to High-Performance Computing with R tutorial. My sincere thanks to Kurt Hornik and Stefan Theussl for the invite -- it was a great trip, notwithstanding the mostly unseasonally cold and wet weather. Tue, 20 Apr 2010
R / Finance 2010 presentations
As a co-organizer, it was a great pleasure to see so many users of R in Finance, from both industry and academia, come to Chicago to discuss and share recent work. There is a lot going on, and it is always good to exchange ideas with others sharing the same infrastructure. Participants appeared to enjoy the conference. My thanks to everybody who helped to put it together, from the local committee to the helping hands at UIC and of course the sponsors. I just put my slides from the Extending and Embedding R with C++ tutorial preceding the conference, as well as the RQuantLib: Interfacing QuantLib from R presentation (with Khanh Nguyen), up onto my presentations page. I do have a usb-drive with all conference presentations and will provide them via the R / Finance site in a few days. The only truly sour note is the fact that several presenters from Europe had their travel schedules turned upside down by the disruption to international air travel caused by the Icelandic volcano eruption and the resulting ash clouds. While we are glad to have had them for a little longer in Chicago, we understand that they are getting eager to return home. I hope this extended stay in the Windy City does not take away from the overall usefulness of the trip. Wed, 07 Apr 2010
Video of UCLA / LA RUG talk on R and C++ integration
Thanks also to David Smith (at the REvolutions blog) and Drew Conway (at his blog) for spreading the word about the presentation video and slides -- quite a few folks have come to my presentations page to get them. Sun, 04 Apr 2010
UCLA and LA RUG talks on R and C++ integration
The talks centered around R and C++ integration using both Rcpp and RInside and summarise where both projects stand after all the recent work Romain and I put in over the last few months. The presentations went fairly well; I received some favourable comments. Szilard and the R User Group had also suggested a group discussion about CRAN, its growth and how to maximise its usefulness. Given my CRANberries feed, my work on the CRAN Task Views for Empirical Finance and High-Performance Computing with R as well as our cran2deb binary package generator, I had some views and ideas that helped frame the discussion, which turned out to be very useful and informed. So maybe we should do this User Group thing in Chicago too! Special thanks to Jan de Leeuw and Szilard Pafka for organising the meeting, talks and discussion. Thu, 25 Feb 2010
R and Sudoku solvers: Plus ca change...
But what everybody seems to be forgetting is that R has had a Sudoku solver for years, thanks to the sudoku package by David Brahm and Greg Snow which was first posted four years ago. What comes around, goes around. With that, and about one minute of Emacs editing to get the Le Monde puzzle into the required ascii-art form, all we need to do is this:
R> library(sudoku)
R> s <- readSudoku("/tmp/sudoku.txt")
R> s
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]    8    0    0    0    0    1    2    0    0
[2,]    0    7    5    0    0    0    0    0    0
[3,]    0    0    0    0    5    0    0    6    4
[4,]    0    0    7    0    0    0    0    0    6
[5,]    9    0    0    7    0    0    0    0    0
[6,]    5    2    0    0    0    9    0    4    7
[7,]    2    3    1    0    0    0    0    0    0
[8,]    0    0    6    0    2    0    1    0    9
[9,]    0    0    0    0    0    0    0    0    0
R> system.time(solveSudoku(s))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]    8    4    9    6    7    1    2    5    3
[2,]    6    7    5    2    4    3    9    1    8
[3,]    3    1    2    9    5    8    7    6    4
[4,]    1    8    7    4    3    2    5    9    6
[5,]    9    6    4    7    8    5    3    2    1
[6,]    5    2    3    1    6    9    8    4    7
[7,]    2    3    1    8    9    4    6    7    5
[8,]    4    5    6    3    2    7    1    8    9
[9,]    7    9    8    5    1    6    4    3    2
   user  system elapsed 
  5.288   0.004   5.951 
R>
That took all of five seconds while my computer was also compiling a particularly resource-hungry C++ package... Just in case we needed another illustration that it is hard to navigate the riches and wonders that is CRAN... Thu, 18 Feb 2010
U of C ACM talk
R / Finance 2010 Open for Registration
See you in Chicago in April! Thu, 07 Jan 2010
Review of 'Computational Statistics: An Introduction to R' in JSS
Updated slides for 'Introduction to HPC with R' (now with correct URLs)
As mentioned yesterday, I spent a few days last week in Japan as I had an opportunity to present the Introduction to High-Performance Computing with R tutorial at the Institute for Statistical Mathematics in Tachikawa near Tokyo thanks to an invitation by Junji Nakano. An updated version of the presentation slides (with a few typos corrected) is now available, as is a 2-up handout version. Compared to previous versions, and reflecting the fact that this was the 'all-day variant' of almost five hours of lectures, the following changes were made:
Comments and suggestions are, as always, appreciated. Mon, 09 Nov 2009
R / Finance 2010 Call for Papers
So without further ado, and given the success of our initial R / Finance 2009 conference about R in Finance, here is the call for papers for next spring:
See you in Chicago in April! Sat, 10 Oct 2009
R for system administration and scripting
One such case just happened a few minutes ago. The aforementioned Garmin Forerunner 405 can cooperate quite nicely with Linux using the gant reader for the ant wireless communication protocol between the usb hardware dongle and the Garmin 405. (Sources for gant are both this file and this git archive.) I had meant to blog about this tool and the resulting files one of these days anyway, but today I just want to mention that the default filenames created by the program were somewhat horrid, such as 20.09.2009 101112.TCX to denote the 20th of September of this year at 10:11h and 12 seconds. As we all know, filenames with spaces are bad for the environment as well as plain annoying. So I had made the simple change in the C sources to switch to a saner format such as 20090920-101112.TCX (and I see that the git archive now contains a similar fix). But that still left me with some 80+ files with the dreaded names. There are of course many ways to skin this cat and to rename the files in bulk. However, I found the following four lines to be fairly succinct:
#!/usr/bin/r
files <- dir(".", pattern=".*\\.TCX$")
res <- lapply(files, function(f) {
    pt <- strptime(f, "%d.%m.%Y %H%M%S.TCX")   # parsed time
    ft <- strftime(pt, "%Y%m%d-%H%M%S.TCX")    # formatted time
    file.rename(f, ft)
})
as they show, among other things,
Lastly, I do not mean to imply that Python or Perl or Ruby or (insert favourite tool here) cannot do it equally well. I simply meant to say that programmatically creating new filenames is definitely easier in R than it would have been in shell. And as an added bonus, we even get fully parsed time objects that I could have tested for. But then tests and documentation never get written on a Saturday. Tue, 04 Aug 2009
State of the Art in Parallel Computing with R: Now published
cran2deb: Would you like 1700+ new Debian / R packages ?
This is essentially a '2.0' version of earlier work with Steffen Moeller and David Vernazobres which we had presented in 2007. Then, the approach was top-down and monolithic, which started to show its limits. This time, the idea was to borrow the successful bottom-up approach of my CRANberries feed. The bulk of the work was done by Charles Blundell as part of his Google Summer of Code 2008 project which I had suggested and mentored. After that project had concluded, we both felt we should continue with it and bring it to 'production'. The CRAN hosts provided us with a (virtual Xen) machine to build on, and we are now ready to more publicly announce the availability of the repositories for i386 and amd64:
deb http://debian.cran.r-project.org/cran2deb/debian-i386 testing/
and
deb http://debian.cran.r-project.org/cran2deb/debian-amd64 testing/
A few more details are provided in our presentation slides. We look forward to hearing from folks using it; the r-sig-debian list may be a good venue for this. Sun, 12 Jul 2009
useR 2009 in Rennes: Recap and slides
As last year (and again at the BoC in December), I presented a three-hour tutorial on high-performance computing with R. This covers profiling, vectorisation, interfacing compiled code, debugging, parallel computing, as well as scripting and automation. Slides, and a 2-up version, are now on my presentations page. I also gave two regular conference presentations. The first was on my Rcpp and RInside packages which facilitate interfacing R and C++. The second talk, based on joint work with Charles Blundell, describes our cran2deb system for creating Debian packages of essentially all CRAN packages. I will try to follow up on this with another post. Slides from these talks are also on my presentations page. Sat, 27 Jun 2009
R 2.9.1, CRANberries outage, and missing Java support
Speaking of broken, I had neither noticed that this R version now returns an additional field (for the repository) in the per-package metadata via available.packages(), nor that this change had broken my oh-so-useful and increasingly popular CRANberries html and rss summaries of CRAN changes. So with the usual beta and rc releases of R 2.9.1 in Debian starting a week prior, CRANberries had been silent for six days from Friday the 21st to last Thursday. I rectified it once I noticed, and changed the code to no longer fall on its nose at that spot. Sorry for the few days without service. Wed, 29 Apr 2009
Slides from most recent R and HPC tutorial
This tutorial was a shorter format of just an hour which did not allow for any parallel computing with R. However, parallel computing with R via MPI, snow, nws, ... is covered in the slides from December's workshop at the BoC. Mon, 27 Apr 2009
Review of 'Analysis of Integrated and Cointegrated Time Series with R (2nd ed)' in JSS
R / Finance 2009
We were fortunate to get seven outstanding invited keynote speakers, as well as eleven excellent presentations. This was preceded by four short tutorials (and I'll post slides from my Introduction to High-Performance Computing with R soon). With about 150 registered participants, plus keynoters, presenters, committee members, representatives from the sponsors (a quick shout of Thanks! to them), some folks from UIC (especially Holly without whom few things would have happened), we were probably around 200 people gathered at UIC. And then there was an extended social program at Jaks which is rather appropriate as we had numerous important committee meetings there over the preceding months. All in all it seems like a successful event. We may even do it again. Thu, 05 Mar 2009
Short introduction to R in Finance
I just posted my slides on my presentations page. The slides give a brief overview of R, the CRAN network and the by now over 1600 packages, mention the Finance Task View, briefly present four different packages (or package sets) and of course beat the drum for our upcoming R/Finance conference that will take place here in Chicago at the end of next month. Wed, 25 Feb 2009
Review of 'Applied Econometrics in R' in JSS
R/Finance conference in Chicago in April: Registration now open
See you in Chicago in April! Tue, 03 Feb 2009
Correct Datetime / POSIXct behaviour for R and kdb+
Anyway, the reason for this post was that the R / kdb+ glue code works well ... but not for datetimes. I really like to be able to pass date/time objects natively between systems as easily as, say, numbers or strings (and see e.g. my Rcpp package for doing this with R and C++) and I was a bit annoyed when the millisecond timestamps didn't move smoothly. Turns out that the basic converter function in the code had a number of problems: it converted to integer, only covered a single scalar rather than vectorised mode, and erroneously reduced a reference count. A better version, in my view, is as follows:
This deals with vectors as well as scalars, converts kdb+'s 'fractional days
since Jan 1, 2000' to the Unix standard of seconds since the epoch
(including the R extension of fractional seconds) and, just as importantly, sets
the class attributes to POSIXt POSIXct as needed by R. With
that, a simple select max datetime from table does just that,
and vectors of timestamped records of trades or quotes or whatever also
come with proper POSIXct behaviour into R. Note that it needs TZ to be set to UTC, though,
or you get a timezone offset you may not want.
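The C glue code itself is not reproduced here. As a rough illustration of the arithmetic only, here is a minimal R sketch, assuming kdb+ datetimes arrive as plain doubles counting fractional days since 2000-01-01; the helper name kdbDays2POSIXct is hypothetical. The 10957 days between 1970-01-01 and 2000-01-01 shift the origin, and attaching a tzone attribute of "UTC" is one way to sidestep the unwanted local offset mentioned above without touching TZ. Note that current R orders the classes as c("POSIXct", "POSIXt").

```r
## Hypothetical helper: kdb+ datetime values (fractional days since
## 2000-01-01) to R's POSIXct (fractional seconds since 1970-01-01)
kdbDays2POSIXct <- function(days) {
    ## 10957 days separate 1970-01-01 and 2000-01-01
    secs <- (days + 10957) * 86400
    ## class order as used by current R; tzone = "UTC" fixes the display zone
    structure(secs, class = c("POSIXct", "POSIXt"), tzone = "UTC")
}

x <- kdbDays2POSIXct(c(0, 0.5, 3653.25))   # vectorised, fractional days kept
format(x, "%Y-%m-%d %H:%M:%S")
## "2000-01-01 00:00:00" "2000-01-01 12:00:00" "2010-01-01 06:00:00"
```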
Fri, 30 Jan 2009
State-of-the-art in parallel computing with R: New paper
New CRAN Task View on HPC
R featured in New York Times article
"I think it addresses a niche market for high-end data analysts that want free, readily available code," said Anne H. Milley, director of technology product marketing at SAS. She adds, "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet." That's silly on so many levels. A concise and rather appropriate follow-up came in early from Frank Harrell, a long-time S and R advocate: "This is great to see. It's interesting that SAS Institute feels that non-peer-reviewed software with hidden implementations of analytic methods that cannot be reproduced by others should be trusted when building aircraft engines." Achim already added this (and two more posts from the aforementioned threads) to the fortunes package that collects such choice quotes. R in Finance (the topic of our upcoming conference) gets mentioned as well. Now, as editor of the Finance task view, I find the second half of "The financial services community has demonstrated a particular affinity for R; dozens of packages exist for derivatives analysis alone." to be a little off the mark. But that's minor as the article is broadly sympathetic, and mostly "gets it" where it matters. Recommended. Thu, 01 Jan 2009
R/Finance conference in Chicago in April: Call for Papers
Call for Papers
See you in Chicago in April! Tue, 23 Dec 2008
Updated 'Introduction to High-Performance Computing with R'
I just posted the updated slides from this talk, and there is also an updated live cdrom on the Alioth server. Also, it looks like the tutorial will be held again at useR! 2009 in Rennes, see here for a brief synopsis. It was nice to get back to Canada, even if it was a 24-hour whirlwind trip. Ottawa looked quite pretty in all the snow. And it seems that I got rather lucky with the travel dates as both the days before and after my trip had a large number of flight cancellations and delays due to snow storms. Mon, 01 Dec 2008
CRANberries prettified
So I quickly put together some simple css formatting to make it look a little better than the default blosxom theme it sported previously. That said, you probably should read the rss version (more about rss here) anyway! Update: Oops. And it even works with a correct path to the css file. Now fixed. Tue, 19 Aug 2008
UseR! 2008 talk
The talk introduces and extends an example related to some of the material from the tutorial itself. The slides from the talk are a little rough as the talk was somewhat ad-hoc: As session chair, I was confronted with a fairly last-minute cancellation and a 15 minute hole, and thought this would make a good little talk. It does show a nice trick for using littler with Open MPI (via snow) under the powerful slurm resource manager and batch/queue engine. Tue, 12 Aug 2008
UseR! 2008 tutorial
In a nutshell, the tutorial covered
how to measure / profile R performance for
speed and memory use, how to accelerate R using vectorised expressions and
tools like Ra / jit, how to add compiled code to R using either
the
The final version of the slides is now available via my presentations page, and the live cdrom with software support for all the software used is at Alioth. Update: Corrected link to presentations page thanks to a heads-up by Charles. Thanks! Sat, 16 Feb 2008
CRANberries updated
But these changes also affected my CRANberries (see the html or better yet rss view) summaries of new packages as some of the source information moved. So I just updated the (surprisingly short at 189 lines including plenty of whitespace and comments) script, and things should work now come the next update. While updating the 'more info' link for new and updated posts to point to the new-style entry at CRAN, I also took the opportunity to update the format of the 'blog' entry for updates where we now show title and description along with the diffstat output,
I also manually copied in two of the recent entries: the new package
emu where CRANberries had fallen over as we
could not find the package description (in the new spot), and the existing package
GEOmap where
The amazing Prof. Ripley (cont'd)
x <- readLines("http://developer.r-project.org/R.svnlog.2007")
rx <- x[grep("^r",x)]
who <- gsub(" ","",sapply(strsplit(rx,"\\|"),"[",2))
twho <- table(who)
twho["ripley"]/sum(twho)
In five lines (that could be shortened to three at the expense of some
readability), the SVN log for R is
downloaded directly from the website, the revision authors are extracted and then
tabulated by submitter. The relative percentage of Brian Ripley is found
to be a staggering 74.8% -- or about three times as much as the other fifteen
committers combined. Smokes.
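For readers who want to check the parsing without downloading the 2007 log, here is a small sketch on a handful of made-up lines in the svn log header format ("rNNNNN | author | date time offset (...) | N lines"). The helper commitShare is hypothetical, and its header pattern ^r[0-9]+ \\| is a slightly more defensive variant of the ^r used above that only keeps revision header lines.

```r
## Made-up lines in the 'svn log' format: headers plus a message line
svnlog <- c("r41001 | ripley | 2007-04-23 09:00:00 +0100 (Mon, 23 Apr 2007) | 2 lines",
            "reorder the NEWS entries",
            "r41002 | murdoch | 2007-04-23 11:30:00 -0400 (Mon, 23 Apr 2007) | 1 line",
            "r41003 | ripley | 2007-04-24 08:15:00 +0100 (Tue, 24 Apr 2007) | 1 line",
            "r41004 | ripley | 2007-04-24 10:45:00 +0100 (Tue, 24 Apr 2007) | 3 lines")

## Hypothetical helper: share of revisions committed by 'author'
commitShare <- function(x, author) {
    ## keep revision header lines only, e.g. "r41001 | ripley | ..."
    rx <- x[grep("^r[0-9]+ \\|", x)]
    ## the second '|'-separated field is the committer, padded with spaces
    who <- gsub(" ", "", sapply(strsplit(rx, "\\|"), "[", 2))
    twho <- table(who)
    as.numeric(twho[author]) / sum(twho)
}

commitShare(svnlog, "ripley")    # 0.75
```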
[ Oh, and for those who don't know him, he's also got a day job which presumably entails looking after his graduate students at Oxford. Who knows, he may even teach. Kidding aside, he's actually one of the nicest persons you'll ever meet in real life. ] Now yesterday, Simon Jackman, who had at first simply repeated Ben's analysis on his own blog, followed up with a nice analysis (albeit typeset in a way that rendered the code inoperative, which has now been fixed) that creates both a histogram and a dotplot of commits per hour of the day. Omitting Ben's code which Simon reuses, we have the following for histogram and dotchart:
tod <- unlist(sapply(rx,function(x)strsplit(x,split=" ")[[1]][6]))
tod <- tod[who=="ripley"]
tz <- sub(pattern=".*(-[0-9]{4}).*",replacement="\\1",x=rx)
tz <- tz[who=="ripley"]
tz <- as.numeric(tz)/100
offset <- 3600*tz
z <- strptime(tod,format="%H:%M:%S")
hist(z,"hours",main="Ripley Commit Times in SVN TZ")
h <- z - offset
h <- format(h,format="%H")
h <- factor(as.numeric(h), levels=0:23)
dotchart(table(h), main="Ripley Commit Times, By Hour in GMT",
labels=paste(0:23,1:24,sep=":"))
This extracts the commit times, subsets to the ones by Prof. Ripley, extracts
the timezone component (as strptime seemingly doesn't do that
which is a pain), extracts the tz-less time via strptime into a
variable 'z' for which the histogram is drawn. He then corrects the times by
the tz offset expressed in seconds, formats it as the hour of the day and turns
it into a 'factor' (an R data type for qualitative variables which may be
ordered as is the case here) and draws a dotplot. This results in the
following chart:
So far, nobody has looked at the time series. So we correct this and add the following:
library(zoo)   # provides zoo(), index(), aggregate() on zoo objects and rollmedian()
## rather extract both date and time
dat <- unlist(sapply(rx, function(x) {
txt <- strsplit(x,split=" ")[[1]]
paste(txt[5], txt[6])
}))
## subset on Prof Ripley
dat <- dat[who == "ripley"]
## and convert to POSIXct, correcting by tz as well
datpt <- as.POSIXct(strptime(dat,format="%Y-%m-%d %H:%M:%S")) - offset
## turn into zoo -- we use a constant series of ones as each
## commit is taken as a timestamped event
datzoo <- zoo(1, order.by=datpt)
## and use zoo to aggregate into commits per date
daily <- aggregate(datzoo, as.Date(index(datzoo)), sum)
## now plot as grey bars
plot(daily, col='darkgrey', type='h', lwd=2,
ylab="Nb of SVN commits, three-week median",
xlab="R release dates 2.5.0 and 2.5.1 shown in orange",
main="The amazing Prof. Ripley")
## mark the two R releases of 2007
abline(v=c(as.Date("2007-04-24"),as.Date("2007-06-28")),col='orange',lwd=1.5)
## and do a quick centered rolling median
lines(rollmedian(daily, 21, align="center"), lwd=3)
This extracts both date and time, creates a proper R time object (a so-called
POSIXct type) from it, fills a zoo ('the' magic class for time series) object
with it, uses zoo to aggregate commits per day and plots those in a
barchart-alike (I know, I know, ...) plot to which we add the two releases as
well as a rolling and centered three-week median (as a real quick hack rather
than a proper smooth).
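Two details of the code above may be worth a second look. The timezone pattern -[0-9]{4} only matches negative UTC offsets, and aggregate() on a zoo series only produces entries for days that actually had commits, so the 21-point rolling median spans 21 commit days rather than strictly 21 calendar days. What follows is a minimal base-R sketch of the same steps. It is not the code behind the chart: the log lines are made up, the offset parsing is sign-aware, and table() over a zero-filled factor is swapped in for zoo's aggregate().

```r
## Made-up 'svn log' header lines, with both positive and negative offsets
rx <- c("r41001 | ripley | 2007-04-23 09:00:00 +0100 (Mon, 23 Apr 2007) | 2 lines",
        "r41002 | ripley | 2007-04-23 17:30:00 +0100 (Mon, 23 Apr 2007) | 1 line",
        "r41003 | ripley | 2007-04-25 00:30:00 +0100 (Wed, 25 Apr 2007) | 4 lines",
        "r41004 | ripley | 2007-04-26 22:10:00 -0400 (Thu, 26 Apr 2007) | 1 line")

## fields 5, 6 and 7 of the space-split line: date, time and offset
parts <- strsplit(rx, split = " ")
stamp <- sapply(parts, function(p) paste(p[5], p[6]))
off   <- as.numeric(sapply(parts, "[", 7))      # e.g. 100 or -400

## "+hhmm" / "-hhmm" to seconds; also right for zones such as +0530
offsec <- sign(off) * ((abs(off) %/% 100) * 3600 + (abs(off) %% 100) * 60)

## local wall-clock time minus its offset gives UTC
utc <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC") - offsec

## commits per calendar day, with days without commits counted as zero
days    <- as.Date(utc, tz = "UTC")
allDays <- seq(min(days), max(days), by = "day")
daily   <- table(factor(format(days), levels = format(allDays)))
daily
```

Here the third commit moves back to 2007-04-24 and the fourth forward to 2007-04-27 once converted to UTC, and the two commit-free days show up as zeros.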
This shows that Prof Ripley averaged about ten commits a day before and after the release of R 2.5.0, and that he has slowed down ever so slightly since then to end up at around a mere seven commits a day. Every day. For the seven-plus months we looked at. So, anyone for analysing his r-help posting frequencies ? Mon, 09 Jul 2007
Announcing CRANberries
The hope is that this proves helpful for keeping tabs on the amazing growth of CRAN (which is now at over one thousand packages) as well as the number of updates to existing packages. The feed(s) can be consumed standalone, or via the brand new Planet R aggregator that Elijah announced today too. Mon, 02 Jul 2007
More on 'nicer charts'
As some of my points didn't seem to make it across, I will reiterate them more plainly:
Sven also addresses the fact that what we really want is to see the quantiles
of the data set. Quite right, and taking logs makes that easier. Consider
the two charts below which plot the 'package age in days' as an empirical
cumulative distribution function using built-in R functions
While it is close to impossible to find the 25th or 50th percentile on the first
chart, it becomes a lot easier on the second chart because the x-axis is
'stretched' using the log transform. About one quarter of the distribution appears
to have been rebuilt within the last 1.5 months, and about half is younger than four
months (as a quick call to
Improving simple charts
Lucas included a URL to the data. The first nice thing to note is that we can read the data directly from the URL; there is no need to copy the file:
pkgAge <- read.table(file="http://people.debian.org/~lucas/arch-age/arch-age.log", col.names=c("pkg","yyyymmdd"))
This reads the data into a data.frame, to which we have given two column names.
pkgAge[,"date"] <- as.Date(as.character(pkgAge[,"yyyymmdd"]), "%Y%m%d")
pkgAge[,"age"] <- as.numeric(difftime(Sys.Date(), pkgAge[,"date"], units="days"))
pkgAge[,"prop"] <- (1:nrow(pkgAge)) / nrow(pkgAge) * 100
We then create three new columns. First is a date, obtained by parsing the (integer) dates (after first casting them to character strings) with a format in standard C notation: "%Y%m%d" for year, month and day without
any separators or formatters. Now, having the date as an actual date
object inside a real data analysis language, we can do
things such as computing date differences. The difftime function
does just that, using the current date as the other point. We ask for the result
to be in days, and cast this down to a purely numeric vector (instead of a
difftime object). Lastly, we quickly compute each row's proportion (row number over row count) in
percent.
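Because Sys.Date() changes every day, so do the ages computed above. To check the parsing and the date arithmetic in isolation, here is a small sketch with a few made-up yyyymmdd values and a fixed, arbitrary reference date standing in for Sys.Date():

```r
## made-up package dates in the integer yyyymmdd form of the log file
yyyymmdd <- c(20070115L, 20061231L, 20040229L)

## cast to character, then parse with the C-style format
dates <- as.Date(as.character(yyyymmdd), "%Y%m%d")

## fixed stand-in for Sys.Date() so the result is reproducible
ref <- as.Date("2007-03-01")
age <- as.numeric(difftime(ref, dates, units = "days"))
age    # 45 60 1096
```

The leap day 2004-02-29 parses fine and is counted correctly, which is one of the advantages of real date objects over hand-rolled integer arithmetic on yyyymmdd values.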
We can then view the data. Before we plot,
png("packageAges.png", quality=100, width=640, height=480, pointsize=10)
oldpar <- par(mfrow=c(2,2), mar=c(2.5,2.5,3,1))
we direct the charts to a png file of given dimensions, and ask for all
plots in one figure (using mfrow with two rows by two) with
somewhat smaller figure margins using the mar argument to par.
The first chart shows again proportion over date:
with(pkgAge, plot(date, prop, type='l', main="Standard Plot"))
(The with() function simply allows us to refer to the columns by their names without explicit subsetting. plot(pkgAge[,"date"],
pkgAge[,"prop"]) is equivalent, but more cumbersome.)
As it is clear that the data has a fairly long tail in the older dates, we can also try to plot over logarithmic time differences. This doesn't work for dates, but it works for our (positive-valued) age variable:
with(pkgAge, plot(age, prop, type='l', log="x", main="More linear as log(age in days)"))
The very far left tail below 0.5 percent is interesting as the one very old package is clearly an outlier within an outlier region. We use the subset function to take just one portion of the data, use logs, and explicit plotting symbols '+' in a points-and-lines plot:
with(subset(pkgAge, prop<0.5), plot(date, prop, type='b', log="y", pch="+", main="Detail in left tail, up to 0.5%"))
Lastly, the upper quartile is fairly linear:
with(subset(pkgAge, prop>75), plot(date, prop, type='l', pch=".", main="Yet fairly linear in top 25%"))
At the end
par(oldpar)
dev.off()
we restore the graphics parameters and close the device (here the file). All this then yields the following chart:
Updated to correctly display the assignment operator
RGtk2 packages in Debian
For two interesting applications using RGtk2, look at Graham Williams' rattle, a glade-based data-mining user-interface to several R functions, and at John Verzani's PMG (aka Poor Man's GUI) generic GUI for R. Thu, 02 Feb 2006
RGtk2 packages available
RGtk2 is, as the name suggests, an update of the older RGtk package (that I'd been maintaining in Debian as r-omegahat-rgtk) to the 'newer' version 2 of Gtk (aka The GIMP Toolkit). It provides Gtk goodies for the amazing R language and environment many of us dig for its use in statistical computing, visualization, data analysis, estimation and more. RGtk2 is quite an achievement. The source package is huge at 1.9mb, the resulting Debian package even huger at 5.3mb, and it all seems to work fine based on some initial tests of running the demos. Based on the two source packages, I created two packages of RGtk2 (as r-omegahat-rgtk2) and the Cairo device (as r-omegahat-cairodevice) which you can fetch from here. Not sure if I feel the urge to maintain them, but if someone else wants to step forward, let me know. Sources and diffs are in the same directory. Feedback welcome.
Update: By the way, what do we need to build with
First `r-devel' builds leading up to R 2.1.0
I have built an initial set of packages, mostly to convince myself that
nothing too drastic was needed inside the I tend not to install any locales packages on my machines, so I can't really test if the localisation support works -- configure and friends certainly suggest it based on the compile-time messages. So my dear reader, if you are into non-default locales and R, and have a minute, grab the package and try something like
$ LANGUAGE=de LC_all=de LC_MESSAGES=de LANG=de LOCALE=de R
R : Copyright 2005, The R Foundation for Statistical Computing
Version 2.1.0 Under development (unstable) (2005-02-27), ISBN 3-900051-07-0
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for a HTML browser interface to help.
Type 'q()' to quit R.
> Sys.getlocale()
[1] "C"
> q("no")
As is plain from the above, I somehow failed to tell R about 'de' as an
alternative. However, capabilities() does report
TRUE for attribute iconv, so this should work
... Feedback welcome; right now, R has po files for de and it, so there is no need
to test other languages.
Mon, 21 Feb 2005
Nicer charts
In an act of true R evangelism, I bugged Wouter about the gnuplot ugliness in those charts and offered an R script as an alternative. Per his reply, he seemed pleased with the output [1]; click on the png for a nicer pdf:
For anybody interested, the
R code is available. It provides two simple functions. The first actually
creates the 4x1 chart. The second loops over all datafiles ending in
R CMD BATCH debian_bts_chart.R which will create debian_bts_chart.Rout in the current
directory.
Lastly, I should note that I do think that the underlying data is wrongly classified. Counting a bug as 'active' even when it has been closed, merely because it is not yet archived as the 28-day period hasn't passed, is plain wrong. In my book, a closed bug cannot count as an active one. [1] This is my own data, and it shows the drop-off a few weeks ago when I passed maintainership of a good dozen packages on to others. Sun, 20 Feb 2005
Truly random numbers
Mads' service samples atmospheric noise -- see his background essay for more details --
which gets aggregated and can then be had via Corba, HTTP or SOAP. Given how R has such a wonderful (and probably
little known)
> X <- read.table(url(paste("http://www.random.org/cgi-bin/randnum",
"?num=10000&min=-1000000000&max=1000000000&col=2",
sep="")),
header=FALSE)
> plot(X, pch=".")
(The paste() is used to split the overly long line for the full
URL.) The arguments to the interface at random.org are hopefully
self-explanatory. Otherwise, full details are available.
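The query string can also be assembled from its parameters. Below is a small sketch with a hypothetical helper, randomOrgURL, that reproduces the URL used above. It does not fetch anything, nor does it assume this cgi-bin interface is still served. One pitfall worth knowing: paste() may render large whole numbers such as 1e9 in scientific notation, so the sketch formats the values explicitly with sprintf() and %d.

```r
## Hypothetical helper: build the random.org query URL used above
randomOrgURL <- function(num, min, max, col) {
    ## %d with as.integer() keeps plain digits instead of e.g. "1e+09"
    sprintf("http://www.random.org/cgi-bin/randnum?num=%d&min=%d&max=%d&col=%d",
            as.integer(num), as.integer(min), as.integer(max), as.integer(col))
}

u <- randomOrgURL(10000, -1e9, 1e9, 2)
u
```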
As repeated simulations are often rather time-intensive, downloading random
sequences may not be the fastest way to go about things. However, this method
would provide a portable way to seed a pseudo
random number generator on platforms that do not
have an entropy provider under
Three-character patch for Rpy to build under R 2.0.0
--- rpy-0.3.5.orig/setup.py
+++ rpy-0.3.5/setup.py
@@ -54,7 +54,7 @@
RHOME = get_R_HOME()
DEFINE.append(('R_HOME', '"%s"' %RHOME))
-r_libs = os.path.join(RHOME, 'bin')
+r_libs = os.path.join(RHOME, 'lib') # edd 11 Oct 2004: changed to 'lib' for 2.0.0
source_files = ["src/rpymodule.c", "src/R_eval.c",
"src/io.c"]
if sys.platform=='win32':
which may be useful to someone else trying to update RPy.
Update: Greg just told me by email that the fix is in CVS too, so a new RPy will have it. Thu, 07 Oct 2004
R 2.0.0 now in Debian
So after some further testing, packages of R 2.0.0 are now on the Debian
servers and should be in unstable tomorrow. Because of the new internal
interface to R packages, older packages will not load. Users can either call
New 'R in Finance' list up and running
Thanks again to everybody who participated in the finance sessions at the recent useR! 2004 conference. During the discussions, the idea of a mailing list for R and Finance came up. Thanks to Martin, such a list has now been created and can be accessed via the page. Subscriptions are trickling in at a steady rate; let's hope we get this list to be a helpful and lively forum. Wed, 26 May 2004
Slides for my useR! 2004 presentations are up
(And no, I cannot distribute the two packages described in the main talk as they were done at work where open source releases are not yet an accepted methodology. Hopefully one day.) Tue, 30 Dec 2003
Debian R Policy draft released
Accelerating plotOHLC by a few orders of magnitude
--- plotOHLC.R.orig 2003-12-14 12:02:20.000000000 -0600
+++ plotOHLC.R 2003-12-14 12:03:42.000000000 -0600
@@ -21,14 +21,9 @@
ylim <- range(x[is.finite(x)])
plot.new()
plot.window(xlim, ylim, ...)
- for (i in 1:NROW(x)) {
- segments(time.x[i], x[i, "High"], time.x[i], x[i, "Low"],
- col = col[1], bg = bg)
- segments(time.x[i] - dt, x[i, "Open"], time.x[i], x[i,
- "Open"], col = col[1], bg = bg)
- segments(time.x[i], x[i, "Close"], time.x[i] + dt, x[i,
- "Close"], col = col[1], bg = bg)
- }
+ segments(time.x, x[, "High"], time.x, x[, "Low"], col = col[1], bg = bg)
+ segments(time.x - dt, x[, "Open"], time.x, x[, "Open"], col = col[1], bg = bg)
+ segments(time.x, x[, "Close"], time.x + dt, x[, "Close"], col = col[1], bg = bg)
if (ann)
title(main = main, xlab = xlab, ylab = ylab, ...)
if (axes) {
These changes decrease the time spent on a series of ~500 points by a factor of sixty:
> IBM<-get.hist.quote("IBM", "2001-12-14")
trying URL
http://chart.yahoo.com/table.csv?s=IBM&a=11&b=13&c=2001&d=11&e=12&f=2003&g=d&q=$
Content type 'application/octet-stream' length unknown
opened URL
.......... .......... ...
downloaded 23Kb
time series starts 2001-12-12
time series ends 2003-12-11
> system.time(plotOHLC(IBM)) # original
[1] 1.56 0.26 5.11 0.00 0.00
> system.time(fastplotOHLC(IBM)) # patched
[1] 0.02 0.00 0.05 0.00 0.00
Fri, 12 Dec 2003
Small patch for Rpy and recent R versions
To make life easier for everybody, and as the patch is so simple, here it comes:
--- rpy-0.3.1.orig/src/RPy.h
+++ rpy-0.3.1/src/RPy.h
@@ -90,7 +90,8 @@
PyOS_sighandler_t python_sigint;
/* R function for jumping to toplevel context */
-extern void jump_now(void);
+/* extern void jump_now(void); */
+extern void Rf_onintr(void);
/* Global interpreter */
PyInterpreterState *my_interp;
--- rpy-0.3.1.orig/src/R_eval.c
+++ rpy-0.3.1/src/R_eval.c
@@ -65,7 +65,8 @@
void interrupt_R(int signum)
{
interrupted = 1;
- jump_now();
+ /* jump_now(); */
+ Rf_onintr();
}
Thanks to Luke Tierney for the hint regarding
Announcement for useR! 2004 is out