So what happens when we throw the compiler into the mix? Let's first create compiled variants using the new cmpfun() function and then try again:
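The listing that belongs here is not part of this excerpt. As a minimal sketch of what creating a compiled variant looks like, assuming only the compiler package that ships with R from version 2.13.0 on (the function names f and fc are illustrative, and f returns x so that the two variants can be compared):

```r
library(compiler)                 # part of the base R distribution since 2.13.0

# interpreted version of the little loop; returns the final x
f <- function(n, x = 1) {
    for (i in seq_len(n)) x <- 1/(1 + x)
    x
}

# byte-compiled variant of the same function
fc <- cmpfun(f)

# both variants must compute the same value
stopifnot(isTRUE(all.equal(f(1000), fc(1000))))
```

In current R versions the just-in-time compiler byte-compiles closures by default, so a timing gap between f and fc may be much smaller than it was when cmpfun() was new.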
Before I close, two more public service announcements. First, if you use Ubuntu see
this post by Michael on r-sig-debian
announcing his implementation of a suggestion of mine: we now have R alpha/beta/rc builds via his Launchpad PPA. Last Friday, I had the
current R-rc snapshot of R 2.13.0 on my Ubuntu box only about six hours after I (as Debian maintainer for R) uploaded the underlying new
R-rc package build to Debian unstable. This will be nice for testing of upcoming releases. Second, as I mentioned, the
Rcpp workshop
on April 28 preceding
R/Finance 2011 on April 29 and 30 still has a few slots
available, as has the conference itself.
More details are below in the updated Call for Papers. Please feel free to
re-circulate this Call for Papers with colleagues, students and other
associations.
Complete papers or one-page abstracts (in txt or pdf format) are invited to
be submitted for consideration. Academic and practitioner proposals related
to R are encouraged. We welcome submissions for full talks, abbreviated
lightning talks, and for a limited number of pre-conference (longer)
seminar sessions.
Presenters are strongly encouraged to provide working R code to accompany the
presentation/paper. Data sets should also be made public for the purposes of
reproducibility (though we realize this may be limited due to contracts with
data vendors). Preference may be given to presenters who have released R
packages.
The conference will award two $1000 prizes for best paper: one for best
practitioner-oriented paper and one for best academic-oriented paper.
Further, to defray costs for graduate students, two travel and expense grants
of up to $500 each will be awarded to graduate students whose papers are
accepted. To be eligible, a submission must be a full paper; extended
abstracts are not eligible.
The submission deadline is February 15th, 2011. Early submissions may
receive early acceptance and scheduling. The graduate student grant winners
will be notified by February 23rd, 2011.
Submissions will be evaluated and submitters notified via email on a rolling
basis. Determination of whether a presentation will be a long presentation or
a lightning talk will be made once the full list of presenters is known.
The remainder of the weekend was nice too (with the notable exception of the extremely sucky
weather). We got to spend some time at the
Google Summer of Code Mentor Summit which is always a fun
event and a great way to meet other open source folks in person. And we also took one
afternoon off to spend some time with John Chambers discussing further work involving
Rcpp and the new ReferenceClasses that
appeared in the just-released R version 2.12.0. This should be a nice avenue to
further integrate R and C++ in the near future.
Call for Papers:
R/Finance 2011: Applied Finance with R
April 29 and 30, 2011
Chicago, IL, USA
The third annual R/Finance conference for applied finance using R will be
held this spring in Chicago, IL, USA on April 29 and 30, 2011. The two-day
conference will cover topics including portfolio management, time series
analysis, advanced risk tools, high-performance computing, market
microstructure and econometrics. All will be discussed within the context of
using R as a primary tool for financial risk management, portfolio
construction, and trading.
One-page abstracts or complete papers (in txt or pdf format) are invited to
be submitted for consideration. Academic and practitioner proposals related
to R are encouraged. We welcome submissions for full talks, abbreviated
"lightning talks", and for a limited number of pre-conference (longer)
seminar sessions.
Presenters are strongly encouraged to provide working R code to accompany the
presentation/paper. Data sets should also be made public for the purposes of
reproducibility (though we realize this may be limited due to contracts with
data vendors). Preference may be given to presenters who have released R
packages.
Please send submissions to: committee at RinFinance.com.
The submission deadline is February 15th, 2011. Early submissions may receive
early acceptance and scheduling.
Submissions will be evaluated and submitters notified via email on a rolling
basis. Determination of whether a presentation will be a long presentation or
a lightning talk will be made once the full list of presenters is known.
R/Finance
2009 and
2010
included attendees from around the world and featured
keynote presentations from prominent academics and practitioners. 2009-2010
presenters' names and presentations are online at the conference website. We
anticipate another exciting line-up for 2011 including keynote presentations
from John Bollinger, Mebane Faber, Stefano Iacus, and Louis Kates. Additional
details will be announced via the conference website
as they become available.
For the program committee:
Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson,
Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich
So see you in Chicago in April!
/computers/R |
permanent link
Straight, curly, or compiled?
Christian Robert, whose blog I
commented on here once before, had
followed up on a recent set of posts by Radford Neal which had appeared both on Radford's blog and on the
r-devel mailing list.
Now, let me prefix this by saying that I really enjoyed Radford's posts. He obviously put a lot of time into finding a number of (all somewhat
small in isolation) inefficiencies in R which, when taken together, can make a difference in
performance. I already spotted one commit by Duncan in the SVN logs for R so this is being looked at.
Yet Christian, on the other hand, goes a little overboard in bemoaning performance differences somewhere between ten and fifteen percent -- the
difference between curly and straight braces (as noticed in Radford's first post). Maybe he spent too much time waiting for his MCMC runs to
finish to realize the obvious: compiled code is evidently much faster.
And before everybody goes and moans and groans that that is hard, allow me to just interject and note that it is not. It really
doesn't have to be. Here is a quick,
cleaned-up version of Christian's example code, with proper assignment operators and a second variable x. We then get to the
meat and potatoes and load our
Rcpp package as well as
inline to define the same little test function in C++. Throw in
rbenchmark which I am becoming increasingly fond of for these little timing tests,
et voila, we have ourselves a horserace:
# Xian's code, using <- for assignments and passing x down
f <- function(n, x=1) for (i in 1:n) x=1/(1+x)
g <- function(n, x=1) for (i in 1:n) x=(1/(1+x))
h <- function(n, x=1) for (i in 1:n) x=(1+x)^(-1)
j <- function(n, x=1) for (i in 1:n) x={1/{1+x}}
k <- function(n, x=1) for (i in 1:n) x=1/{1+x}
# now load some tools
library(Rcpp)
library(inline)
# and define our version in C++
l <- cxxfunction(signature(ns="integer", xs="numeric"),
                 'int n = as<int>(ns); double x=as<double>(xs);
                  for (int i=0; i<n; i++) x=1/(1+x);
                  return wrap(x); ',
                 plugin="Rcpp")
# more tools
library(rbenchmark)
# now run the benchmark
N <- 1e6
benchmark(f(N, 1), g(N, 1), h(N, 1), j(N, 1), k(N, 1), l(N, 1),
          columns=c("test", "replications", "elapsed", "relative"),
          order="relative", replications=10)
And how does it do? Well, glad you asked. On my i7, with the other three cores standing around and watching, we get an
eighty-fold increase relative to the best interpreted version:
/tmp$ Rscript xian.R
Loading required package: methods
     test replications elapsed relative
6 l(N, 1)           10   0.122    1.000
5 k(N, 1)           10   9.880   80.984
1 f(N, 1)           10   9.978   81.787
4 j(N, 1)           10  11.293   92.566
2 g(N, 1)           10  12.027   98.582
3 h(N, 1)           10  15.372  126.000
/tmp$
So do we really want to spend time arguing about the ten and fifteen percent differences? Moore's law gets you
those gains in a couple of weeks anyway. I'd much rather have a conversation about how we can get people speed increases that are orders of
magnitude, not fractions. Rcpp is one such tool. Let's get more of them.
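One detail about the interpreted functions in the horserace: a for loop in R evaluates to NULL, so f() through k() time the loop but return nothing, whereas the C++ version returns x. A minimal sketch of a variant that does return the final value (the name f_ret is hypothetical and not part of the benchmark) also shows what the loop computes: the iteration x = 1/(1+x) converges to (sqrt(5) - 1)/2.

```r
# Hypothetical variant of f() that returns the last value of x
f_ret <- function(n, x = 1) {
    for (i in seq_len(n)) x <- 1/(1 + x)
    x
}

# fixed point of x = 1/(1+x), i.e. the positive root of x^2 + x - 1 = 0
fixed_point <- (sqrt(5) - 1) / 2

# after 100 iterations the loop sits at the fixed point to machine precision
stopifnot(isTRUE(all.equal(f_ret(100), fixed_point)))
```

Returning x makes it possible to check that all variants, interpreted or compiled, agree on the result before comparing their timings.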
useR 2010 at NIST in Gaithersburg
This past week, the annual R user conference
useR! 2010 took place at the
National Institute of Standards and Technology (NIST) in Gaithersburg, MD
(which is a tad northwest of Washington, DC).
Kate Mullen and her team of local organizers did a truly tremendous job in putting
together a very smooth conference attended by almost 500 people. It is
always nice to meet so many other R contributors and users in person. And
needless to say it's also just plain fun to hang out with these folks.
As at the preceding
useR! 2008 in Dortmund and
useR! 2009 in Rennes, I presented a three-hour
tutorial on high-performance computing with R. This covers
scripting/automation, profiling, vectorisation, interfacing compiled code,
parallel computing and large-memory approaches. The slides, as well as a condensed 2-up version, are now on my
presentations
page.
On Wednesday, Romain and I had a chance
to talk about recent work on Rcpp,
our R and C++ integration. Thursday, we followed up with a presentation on
RProtoBuf --
a project integrating Google's
Protocol Buffers with R which much to our
delight already seems to be in use at Google itself! It was quite fun to do
these two talks jointly with Romain. But my other coauthor Khanh had to be at a
conference related to his actual PhD work. So on Friday it was just me to give
a presentation about
RQuantLib
which brings QuantLib to R.
Slides from all these talks have now been added to my presentations
page. I will also upload them via the conference form so that they can be part
of the conference's collection of presentations which should be forthcoming.
WU Wien presentations
Last week I had the opportunity to spend a few days at the
Institute for Statistics and Mathematics
of the WU Vienna / Wirtschaftsuniversitaet Wien.
On Thursday, I gave a seminar on
Rcpp and
RInside introducing
all the recent work with Romain on making R and C++ integration easier.
Both
(compact) handout and
(full) presentation
slides are now posted alongside the
other presentations.
On Friday, I also gave an informal lecture / tutorial / workshop to some of the
Stats and Finance Ph.D. students, drawing largely from the section on
parallel computing of the most recent
Introduction to High-Performance Computing with R tutorial.
My sincere thanks to Kurt Hornik and Stefan Theussl for the invite -- it was
a great trip, notwithstanding the mostly unseasonably cold and wet weather.
R / Finance 2010 presentations
Last Friday and Saturday the second R / Finance
conference took place in Chicago on the
UIC campus.
As a co-organizer, it was a great pleasure to see so many users of R in Finance---from both industry and academia---come to
Chicago to discuss and share recent work. There is a lot going on, and it is always good to exchange ideas with others sharing the same
infrastructure. Participants appeared to enjoy the conference. My thanks to everybody who helped to put it together, from the
local committee to the helping hands at UIC and of course the sponsors.
I just put my slides from the
Extending and Embedding R with C++
tutorial preceding the conference, as well as the RQuantLib: Interfacing QuantLib from R
presentation (with Khanh Nguyen), up onto my
presentations page. I do have a usb-drive with all conference
presentations and will provide them via the R / Finance site in a few days.
The only truly sour note is the fact that several presenters from Europe had their travel schedules turned
upside down by the disruption to international air travel caused by the Icelandic volcano eruption and the resulting ash clouds.
While we are glad to have had them for a little longer in Chicago, we understand that they are getting eager to return home. I
hope this extended stay in the Windy City does not take away from the overall usefulness of the trip.
Video of UCLA / LA RUG talk on R and C++ integration
Thanks to the efforts of the tireless R User Group organizers Szilard Pafka (in Los Angeles, recording
the talk) and Drew Conway (in New York, converting and organising hosting), there is now
a video and slide combo of my recent
talk
about Rcpp and RInside at UCLA and the Los Angeles R Users Group.
Thanks also to David Smith (at the REvolutions blog) and
Drew Conway (at his blog) for spreading
the word about the presentation video and slides -- quite a few folks have come to my
presentations page to get them.
UCLA and LA RUG talks on R and C++ integration
We spent last week in the LA area and had a generally good time out west. I
was able to sneak in two talks and a group discussion, thanks to the help by
Jan de Leeuw (and everybody at UCLA's Stats department) as well as by Szilard
Pafka representing the LA R User's Group. Pdf files for the slides for the talks
are now on my presentations
page in both a
compact handout and
presentation slide version
(where the content is identical; if in doubt use the first file).
The talks centered around R and C++ integration using both Rcpp and
RInside and
summarise where both projects stand after all the recent work
Romain and I put in over
the last few months. The presentations went fairly well; I received some
favourable comments.
Szilard and the R User Group had also suggested a group discussion about
CRAN, its growth and how to maximise
its usefulness. Given my CRANberries feed,
my work on the CRAN Task Views
for Empirical Finance and
High-Performance
Computing with R as well as our
cran2deb binary package
generator, I had some views and ideas that helped frame the discussion which
turned out to be very useful and informed. So maybe we should do this User
Group thing in Chicago too!
Special thanks to Jan de Leeuw and Szilard Pafka for organising the meeting,
talks and discussion.
R and Sudoku solvers: Plus ca change...
Christian Robert
blogged
about a particularly heavy-handed solution to last Sunday's Sudoku puzzle in
Le Monde. That had my sympathy as I like evolutionary computing
methods, and his chart is rather pretty. From there, this spread on to the
REvolutions blog
where David Smith riffed on it, and showed the actual puzzle. That didn't
stop things as Christian blogged
once more about it, this time welcoming
his post-doc Robin Ryder who
posts a heavy analysis
on all this that is a little much for me at this time of day.
But what everybody seems to be forgetting is that
R has had a Sudoku solver for years,
thanks to the sudoku
package by David Brahm and Greg Snow which was first posted four years
ago. What comes around, goes around.
With that, and about one minute of Emacs editing to get the Le Monde
puzzle into the required ascii-art form, all we need to do is this:
R> library(sudoku)
R> s <- readSudoku("/tmp/sudoku.txt")
R> s
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
 [1,]    8    0    0    0    0    1    2    0    0
 [2,]    0    7    5    0    0    0    0    0    0
 [3,]    0    0    0    0    5    0    0    6    4
 [4,]    0    0    7    0    0    0    0    0    6
 [5,]    9    0    0    7    0    0    0    0    0
 [6,]    5    2    0    0    0    9    0    4    7
 [7,]    2    3    1    0    0    0    0    0    0
 [8,]    0    0    6    0    2    0    1    0    9
 [9,]    0    0    0    0    0    0    0    0    0
R> system.time(solveSudoku(s))
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
 [1,]    8    4    9    6    7    1    2    5    3
 [2,]    6    7    5    2    4    3    9    1    8
 [3,]    3    1    2    9    5    8    7    6    4
 [4,]    1    8    7    4    3    2    5    9    6
 [5,]    9    6    4    7    8    5    3    2    1
 [6,]    5    2    3    1    6    9    8    4    7
 [7,]    2    3    1    8    9    4    6    7    5
 [8,]    4    5    6    3    2    7    1    8    9
 [9,]    7    9    8    5    1    6    4    3    2
   user  system elapsed
  5.288   0.004   5.951
R>
That took all of five seconds while my computer was also compiling a
particularly resource-hungry C++ package....
Just in case we needed another illustration that it is hard to navigate the
riches and wonders that is CRAN...
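For anyone who wants to double-check a solver's output, a small base-R sketch can verify that a filled grid is a valid Sudoku solution. The helper names all_complete and is_valid_sudoku are hypothetical and not part of the sudoku package; the grid is the solution printed above.

```r
# solved grid copied from the solveSudoku() output above
sol <- matrix(c(8,4,9,6,7,1,2,5,3,
                6,7,5,2,4,3,9,1,8,
                3,1,2,9,5,8,7,6,4,
                1,8,7,4,3,2,5,9,6,
                9,6,4,7,8,5,3,2,1,
                5,2,3,1,6,9,8,4,7,
                2,3,1,8,9,4,6,7,5,
                4,5,6,3,2,7,1,8,9,
                7,9,8,5,1,6,4,3,2), nrow = 9, byrow = TRUE)

# TRUE if every vector in the list is a permutation of 1:9
all_complete <- function(groups) {
    all(sapply(groups, function(v) identical(sort(as.integer(v)), 1:9)))
}

# check all nine rows, nine columns and nine 3x3 blocks
is_valid_sudoku <- function(m) {
    rows   <- lapply(1:9, function(i) m[i, ])
    cols   <- lapply(1:9, function(j) m[, j])
    blocks <- lapply(0:8, function(b) {
        rr <- 3 * (b %/% 3) + 1:3      # row indices of block b
        cc <- 3 * (b %% 3) + 1:3       # column indices of block b
        as.vector(m[rr, cc])
    })
    all_complete(c(rows, cols, blocks))
}

stopifnot(is_valid_sudoku(sol))
```

The check is independent of how the grid was produced, so it works equally well on the output of an evolutionary or simulated-annealing solver.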
U of C ACM talk
Fellow GSoC mentor
and local ACM masterminder Borja Sotomayor
had invited me a few months ago to give a talk at the
ACM chapter at the
University of Chicago.
Today was the day, and the slides from the 50-minute talk on
R and extending R with
Rcpp are
now on my presentations page.
R / Finance 2010 Open for Registration
The announcement below went out to R-SIG-Finance earlier today.
More information is, as usual, on the
R / Finance 2010 page:
Now open for registrations:
R / Finance 2010: Applied Finance with R
April 16 and 17, 2010
Chicago, IL, USA
The second annual R / Finance conference for applied finance using R,
the premier free software system for statistical computation and graphics,
will be held this spring in Chicago, IL, USA on Friday April 16 and
Saturday April 17.
Building on the success of the inaugural R / Finance 2009 event, this
two-day conference will cover topics as diverse as portfolio theory,
time-series analysis, as well as advanced risk tools, high-performance
computing, and econometrics. All will be discussed within the context of
using R as a primary tool for financial risk management and trading.
Invited keynote presentations by Bernhard Pfaff, Ralph Vince, Mark Wildi
and Achim Zeileis are complemented by over twenty talks (both full-length
and 'lightning') selected from the submissions. Four optional tutorials
are also offered on Friday April 16.
R / Finance 2010 is organized by a local group of R package authors and
community contributors, and hosted by the International Center for Futures
and Derivatives (ICFD) at the University of Illinois at Chicago.
Conference registration is now open. Special advanced registration pricing is
available, as well as discounted pricing for academic and student
registrations.
More details and registration information can be found at the website at
http://www.RinFinance.com
For the program committee:
Gib Bassett, Peter Carl, Dirk Eddelbuettel, John Miller,
Brian Peterson, Dale Rosenthal, Jeffrey Ryan
See you in Chicago in April!
Review of 'Computational Statistics: An Introduction to R' in JSS
Somehow missed during the end-of-year switchover was the fact that my
review of Guenther Sawitzki's Computational Statistics: An Introduction to R
(CRC / Chapman & Hall, 2009) is now up
on the Journal of Statistical Software website.
Updated slides for 'Introduction to HPC with R' (now with correct URLs)
This is an updated version of yesterday's post with corrected URLs -- by
copy-and-pasting I had still referenced the previous slides from UseR! 2009
in Rennes instead of last Friday's slides from the ISM presentation in Tokyo.
The presentations
page had the correct URLs, and this has been corrected below for this
re-post. My apologies!
As mentioned
yesterday,
I spent a few days last week in Japan as I had an opportunity to present the
Introduction to High-Performance Computing with R tutorial at the
Institute for Statistical Mathematics
in Tachikawa near Tokyo thanks to an invitation by
Junji Nakano.
An updated version of the presentation slides (with a few typos corrected)
is now available as is a
2-up handout version.
Compared to previous versions, and reflecting the fact that this was the
'all-day variant' of almost five hours of lectures, the following changes were made:
- the 'parallel computing' section was expanded further with discussion
of the recent R packages multicore, iterators, foreach, doNWS, doSNOW, doMPI;
- a first discussion of GPU computing using the gputools package was added;
- the section on 'out of memory computing' using ff, bigmemory and biglm (including
an example borrowed from Jay Emerson) reappeared in this longer version;
- minor fixes and polishing throughout.
Comments and suggestions are, as always, appreciated.
R / Finance 2010 Call for Papers
Jeff sent the following while
I had connectivity issues and I hadn't gotten around to posting it here.
So without further ado, and given the success of our initial
R / Finance 2009 conference about
R in Finance, here is the call for papers for next spring:
Call for Papers:
R/Finance 2010: Applied Finance with R
April 16 and 17, 2010
Chicago, IL, USA
The second annual R/Finance conference for applied finance using R
will be held this spring in Chicago, IL, USA on April 16 and 17, 2010.
The two-day conference will cover topics including portfolio
management, time series analysis, advanced risk tools,
high-performance computing, market microstructure and econometrics.
All will be discussed within the context of using R as a primary tool
for financial risk management and trading.
One-page abstracts or complete papers (in txt or pdf format) are
invited for consideration. Academic and practitioner research
proposals related to R are encouraged. We will accept submissions for
full talks, abbreviated "lightning talks", and a limited number of
pre-conference tutorial sessions. Please indicate with your
submission if you would be willing to produce a formal paper (10-15
pages) for a peer-reviewed conference proceedings publication.
Presenters are strongly encouraged to provide working R code to
accompany the presentation/paper. Data sets should also be made
public for the purposes of reproducibility (though we realize this may
be limited due to contracts with data vendors). Preference may be
given to presenters who have released R packages.
Please send submissions to: committee at RinFinance.com
The submission deadline is December 31st, 2009.
Submissions will be evaluated and submitters notified via email on a
rolling basis. Determination of whether a presentation will be a long
presentation or a lightning talk will be made once the full list of
presenters is known.
R/Finance 2009 included keynote presentations by Patrick Burns, Robert
Grossman, David Kane, Roger Koenker, David Ruppert, Diethelm Wuertz,
and Eric Zivot. Attendees included practitioners, academics, and
government officials. We anticipate another exciting line-up for 2010
and will announce details at the conference website
http://www.RinFinance.com as they become available.
For the program committee:
Gib Bassett, Peter Carl, Dirk Eddelbuettel, John Miller,
Brian Peterson, Dale Rosenthal, Jeffrey Ryan
See you in Chicago in April!
R for system administration and scripting
On several occasions, R has suggested
itself as a language for systems scripting. By this I mean random
little administrative tasks such as (re-)moving or maybe renaming files or
directories and the like.
One such case just happened a few minutes ago.
The aforementioned Garmin Forerunner 405
can cooperate quite nicely with Linux using the gant reader for the
ant wireless communication protocol between the usb hardware dongle and the
Garmin 405. (Sources for gant are both
this file and this
git archive.) I had meant
to blog about this tool and the resulting files one of these days anyway, but
today I just want to mention that the default filenames created by the program
were somewhat horrid such as 20.09.2009 101112.TCX to denote the 20th of
September of this year at 10:11h and 12 seconds. As we all know, filenames
with spaces are bad for the environment as well as plain annoying. So I had made
the simple change in the C sources to switch to a saner format such as
20090920-101112.TCX (and I see that the git archive now contains a
similar fix). But that still left me with some 80+ files with the dreaded
names.
There are of course many ways to skin this cat and to rename the files in
bulk. However, I found the following few lines to be fairly succinct
#!/usr/bin/r
files <- dir(".", pattern=".*\\.TCX$")
res <- lapply(files, function(f) {
    pt <- strptime(f, "%d.%m.%Y %H%M%S.TCX")    # parsed time
    ft <- strftime(pt, "%Y%m%d-%H%M%S.TCX")     # formatted time
    file.rename(f, ft)
})
as they show, among other things,
- the access to one of the three (soon four) regexp engines, here as a
simple pattern argument to dir()
- the functional programming nature of the beast: files is a
vector of filenames, and lapply() unrolls the vector one-by-one
calling the anonymous function and passing the current element off as
f
- computing on times is particularly easy as we get strptime and
strftime as any self- and POSIX-respecting language should
- similarly, we get access to file system-level operations natively
avoiding all quoting issues that make files with spaces such fun in the
first place.
- the littler
scripting frontend providing /usr/bin/r rules.
So about five lines and two minutes later, some eighty-ish files were renamed
and sanity was restored. Hm, and it took me five times as long to blog this.
Lastly, I do not mean to imply that Python or Perl or Ruby or (insert
favourite tool here) cannot do it equally well. I simply meant to say that
programmatically creating new filenames is definitely easier in
R than it would have been in shell.
And as an added bonus, we even get fully parsed time objects that I could
have tested for. But then tests and documentation never get written on a
Saturday.
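For the record, such a test is not much work. The following is a minimal sketch of a dry run that computes the new names first and renames only files whose names parsed cleanly. The helper name new_name and the sample file names are made up for illustration; the two format strings are the ones used in the script above.

```r
# Map an old-style name such as "20.09.2009 101112.TCX" to "20090920-101112.TCX";
# yields NA for names that do not match the expected pattern
new_name <- function(f) {
    pt <- strptime(f, "%d.%m.%Y %H%M%S.TCX", tz = "UTC")   # parsed time, NA on mismatch
    format(pt, "%Y%m%d-%H%M%S.TCX")                         # formatted time
}

# illustrative input: one name in the old format, one that does not match
old  <- c("20.09.2009 101112.TCX", "notes.TCX")
plan <- data.frame(old = old, new = new_name(old), stringsAsFactors = FALSE)
print(plan)

# only rename what parsed; left commented out so the sketch touches no files
ok <- !is.na(plan$new)
# file.rename(plan$old[ok], plan$new[ok])
```

Printing the plan before renaming gives a chance to spot names that would collide or that failed to parse.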
State of the Art in Parallel Computing with R: Now published
Our
survey paper on the current state of the art in parallel computing with R,
previously mentioned here as a
technical report
is now out as Vol 31, Issue 1
of the all-electronic
Journal of Statistical Software.
cran2deb: Would you like 1700+ new Debian / R packages ?
As I mentioned in my
quick write-up of UseR 2009, one of my talks was about cran2deb: a system
to turn (essentially) all CRAN packages into
directly apt-get-able binary packages.
This is essentially a '2.0' version of earlier work with Steffen Moeller and David
Vernazobres which we had presented in 2007. Then, the approach was top-down
and monolithic which started to show its limits. This time, the idea was to
borrow the successful bottom-up approach of my
CRANberries feed.
The bulk of the work was done by Charles Blundell as part of his Google Summer of Code
2008 project which I had suggested and mentored. After that project had
concluded, we both felt we should continue with it and bring it to
'production'. The CRAN hosts provided us with a (virtual Xen) machine to build
on, and we are now ready to more publicly announce the availability of the
repositories for i386 and amd64:
deb http://debian.cran.r-project.org/cran2deb/debian-i386 testing/
and
deb http://debian.cran.r-project.org/cran2deb/debian-amd64 testing/
A few more details are provided in
our presentation slides.
We look forward to hearing from folks using them; the r-sig-debian list
may be a good venue for this.
useR 2009 in Rennes: Recap and slides
I spent most of last week in Rennes, the capital of Brittany in France,
as it was time for
UseR! 2009,
the annual R conference. Francois Husson, Aline Legrand and others at the
Agrocampus Ouest had put together a really well-run conference, and it was a
pleasure to reconnect with so many people. It was also pretty nice to walk
around town and see bits and pieces of the
Tombees de la nuit festival
which happened at the same time.
As last year (and again at the BoC in December), I presented a three-hour
tutorial on high-performance computing with R. This covers profiling,
vectorisation, interfacing compiled code, debugging, parallel computing, as
well as scripting and automation. Slides, and a 2-up version, are now on my
presentations
page.
I also gave two regular conference presentations. The first was on my
Rcpp and
RInside packages
which facilitate interfacing R and C++. The second talk, based on joint work
with Charles Blundell, describes our cran2deb system for creating Debian
packages of essentially all CRAN
packages. I will try to follow up on this with another post. Slides from
these talks are also on my
presentations
page.
R 2.9.1, CRANberries outage, and missing Java support
Just a short note that version
2.9.1
of R was released yesterday. And a
corresponding Debian release
went out as usual on the
same day. One sour note: as the Java toolchain is
currently broken,
I had to disable compile-time support for Java. Just run R CMD javareconf once installed if you need it.
Speaking of broken, I had neither noticed that this R version now returns an
additional field (for the repository) in the per-package metadata via
available.packages(), nor that this change had broken my oh-so-useful
and increasingly popular
CRANberries
html and rss summaries of CRAN changes. So with the
usual beta and rc releases
of R 2.9.1 in Debian starting a week prior, CRANberries had been silent for six days
from Friday the 21st to last Thursday. I rectified it once I noticed, and
changed the code to no longer fall on its nose at that spot. Sorry for the
few days without service.
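The CRANberries code itself is not shown here, so the following is only an illustration of the general failure mode, using a made-up stand-in for the character matrix that available.packages() returns. Selecting columns by name keeps working when a new column such as Repository appears, while code that depends on the number or position of columns may not. The package names, versions and URL are invented.

```r
# Mock of the matrix returned by available.packages(); contents are invented
old <- matrix(c("pkgA", "1.0",
                "pkgB", "0.2"), ncol = 2, byrow = TRUE,
              dimnames = list(c("pkgA", "pkgB"), c("Package", "Version")))

# the same metadata with an additional trailing column
new <- cbind(old, Repository = "http://cran.example/src/contrib")

# name-based selection is unaffected by the extra column
pick <- function(ap) ap[, c("Package", "Version"), drop = FALSE]
stopifnot(identical(pick(old), pick(new)))
```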
Slides from most recent R and HPC tutorial
A little earlier I put the
slides
from my Introduction to High-Performance Computing with R tutorial
at the R /
Finance conference last week onto my
talks /
presentation page. Other tutorials, talks and keynotes are also being
posted on the conference program page.
This tutorial was a shorter format of just an hour which did not allow for any
parallel computing with R. However, parallel computing with R via MPI, snow,
nws, ... is covered in the slides from
December's
workshop at the BoC.
Review of 'Analysis of Integrated and Cointegrated Time Series with R (2nd ed)' in JSS
A few weeks ago I wrote up a short review of Bernhard Pfaff's nice (but
somewhat dry) Analysis of Integrated and Cointegrated Time Series with R
(2nd ed) on unit root and cointegration modeling with R. This is now online at the Journal of Statistical
Software.
R / Finance 2009
Our inaugural R / Finance conference,
mentioned
here
twice
is now over.
We were fortunate to get seven outstanding invited keynote speakers, as well as
eleven excellent presentations. This was preceded by four short tutorials (and
I'll post slides from my Introduction to High-Performance Computing with
R soon). With about 150 registered participants, plus keynoters,
presenters, committee members, representatives from the sponsors (a quick
shout of Thanks! to them), some folks from UIC (especially Holly
without whom few things would have happened), we were probably around 200 people gathered at
UIC. And then there was an extended social program at
Jaks which is rather appropriate as we had numerous
important committee meetings there over the preceding months. All in all it
seems like a successful event. We may even do it again.
Short introduction to R in Finance
Adam Gehr of DePaul University's Finance Department had organized a panel
session about R in Finance at the Midwest Finance Association's 58th Annual
Meeting which is happening this week here in Chicago.
I just posted my slides on my presentations
page. The slides give a brief overview of R, the CRAN network and the by now over 1600
packages, mention the Finance Task
View, briefly present four different packages (or package sets) and of
course beat the drum for our upcoming
R/Finance conference that will take place here in Chicago at the end of
next month.
Review of 'Applied Econometrics in R' in JSS
A short review of Kleiber and Zeileis' excellent Applied Econometrics
with R is now out at the (online)
Journal of Statistical Software.
R/Finance conference in Chicago in April: Registration now open
Regarding the
aforementioned
R/Finance conference that will take place at the end of April here in
Chicago, we announced earlier today that the
conference website
is now available.
It provides information about the program, speakers and other details as well
as a link to registration
details.
See you in Chicago in April!
Correct Datetime / POSIXct behaviour for R and kdb+
We have started to look into kdb+ as a possible
high-performance column-store backend. Kx offers
free trials
-- and so I have played with
this for a day or two: the general system, data loads and dumps, and in particular
the interface to R.
Based on the few files (one C source with interface
code, one R file to access the C code, one object file to link against, one header
file and a simple Makefile), it took just a couple of minutes to turn this
into a proper CRAN-style R
package.
Anyway, the reason for this post was that the R / kdb+ glue code works well
... but not for datetimes. I really like to be able to pass date/time objects
natively between systems as easily as, say, numbers or strings (and see
e.g. my Rcpp package
for doing this with R and C++) and I was a bit annoyed when the millisecond
timestamps didn't move smoothly. Turns out that the basic converter function in the
code had a number of problems: it converted to integer, only covered a
single scalar rather than vectorised mode, and erroneously reduced a
reference count. A better version, in my view, is as follows:
static SEXP from_datetime_kobject(K x)
{
    SEXP result;
    int i, length = x->n;
    /* 10957 days lie between 1970-01-01 and 2000-01-01; one day has 86400 seconds */
    if (scalar(x)) {
        result = PROTECT(allocVector(REALSXP, 1));
        REAL(result)[0] = (kF(x)[0] + 10957) * 86400;
    } else {
        result = PROTECT(allocVector(REALSXP, length));
        for(i = 0; i < length; i++) {
            REAL(result)[i] = (kF(x)[i] + 10957) * 86400;
        }
    }
    /* set the class attribute so that R treats the doubles as date-times */
    SEXP datetimeclass = PROTECT(allocVector(STRSXP,2));
    SET_STRING_ELT(datetimeclass, 0, mkChar("POSIXt"));
    SET_STRING_ELT(datetimeclass, 1, mkChar("POSIXct"));
    setAttrib(result, R_ClassSymbol, datetimeclass);
    UNPROTECT(2);
    return result;
}
This deals with vectors as well as scalars, converts Kdb's 'fractional days
since Jan 1, 2000' to the Unix standard of seconds since the epoch --
including the R extension of fractional seconds -- and as importantly, sets
the class attributes to POSIXt POSIXct as needed by R. With
that, a simple select max datetime from table does just that,
and vectors of timestamped records of trades or quotes or whatever also
come with proper POSIXct behaviour into R. Note that it needs TZ to be set to UTC, though,
or you get a timezone offset you may not want.
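The arithmetic in the converter can be checked from the R side. What follows is only a minimal sketch and not part of the glue code: the helper name kdb_days_to_posixct is made up for illustration, and the input is assumed to be in the 'fractional days since Jan 1, 2000' form described above.

```r
# Hypothetical helper (not part of the kdb+ interface): same arithmetic as the C code
kdb_days_to_posixct <- function(days) {
  # 10957 days separate 1970-01-01 from 2000-01-01; one day has 86400 seconds
  secs <- (days + 10957) * 86400
  # build the POSIXct in UTC explicitly to avoid a local timezone offset
  as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
}

# zero days, half a day and one and a quarter days after the kdb+ origin
x <- kdb_days_to_posixct(c(0, 0.5, 1.25))
print(x)
```

Passing tz = "UTC" here plays the same role as setting TZ to UTC for the C version.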
/computers/R |
permanent link
State-of-the-art in parallel computing with R: New paper
A few weeks ago, we finished a paper that surveys the current state of
parallel computing with R. The paper was led by Markus Schmidberger and
written while he was visiting the Fred
Hutchinson Cancer Research Center in Seattle. The co-authors are Martin
Morgan, myself, Hao Yu, Luke Tierney and Ulrich Mansmann. The paper is now
available as a technical
report from LMU Munich via open access, and also from my papers page.
/computers/R |
permanent link
New CRAN Task View on HPC
A while back, I suggested to Achim to add a new
CRAN Task View
for High Performance Computing with
R. And as of a day or two ago,
we now have the new
CRAN Task View for High Performance Computing with R providing an
overview of available packages, grouped thematically, with a focus on the
various parallel computing applications. I have already received a few great
comments that even led to an entire new section on applications. Keep 'em coming!
/computers/R |
permanent link
R featured in New York Times article
Today's New York Times carries a
decent article
about R. Predictably, this led to
one (short),
two (longest),
three (short)
threads on the main R mailing list.
One aspect merits further highlighting. The reporter asked whether
R would pose a threat to SAS:
"I think it addresses a niche market for high-end data analysts that want free, readily available code,"
said Anne H. Milley, director of technology product marketing at SAS. She adds, "We have customers who build
engines for aircraft. I am happy they are not using freeware when I get on a jet."
That's silly on so many levels. A concise and rather appropriate follow-up came in early from
Frank Harrell,
a long-time S and R advocate:
This is great to see. It's interesting that SAS Institute feels that
non-peer-reviewed software with hidden implementations of analytic
methods that cannot be reproduced by others should be trusted when
building aircraft engines.
Achim already added this (and two more posts
from the aforementioned threads) to the
fortunes package that collects such choice quotes.
R in Finance (the topic of
our upcoming conference)
gets mentioned as well. Now, as editor of the
Finance task view, I find the second half of
The financial services community has demonstrated a particular affinity for R;
dozens of packages exist for derivatives analysis alone.
to be a little off the mark. But that's minor as the article is broadly sympathetic, and mostly "gets
it" where it matters. Recommended.
/computers/R |
permanent link
R/Finance conference in Chicago in April: Call for Papers
The following went out to the
R-announce
and
R-SIG-Finance
mailing lists a few days ago. The conference already has a very strong
lineup of invited speakers, and we are now asking R / Finance users from both
academia and industry to submit suitable one-page abstracts:
Call for Papers
The Finance Department of the University of Illinois at Chicago (UIC),
the International Center for Futures and Derivatives at UIC, and
members of the R finance community are pleased to announce
R/Finance 2009: Applied Finance with R
on April 24 and 25, 2009, in Chicago, IL, USA
Confirmed keynote speakers include:
Patrick Burns (Burns Statistics)
David Kane (Kane Capital)
Roger Koenker (U of Illinois at Urbana/Champaign)
David Ruppert (Cornell)
Diethelm Wuertz (ETH Zuerich)
Eric Zivot (U of Washington)
We invite all users of R in Finance to submit one-page abstracts or
complete papers (in txt/pdf/doc format). We encourage papers both on
academic research topics and related to use of R by Finance practitioners.
Presenters are strongly encouraged to provide working R code to accompany
the presentation/paper. Datasets need not be made public.
Please send submissions to committee@RinFinance.com.
The submission deadline is January 31st, 2009.
Submissions will be evaluated and submitters notified via email
on a rolling basis.
Additional details about the conference will be announced as available.
For the program committee:
Gib Bassett, Peter Carl, Dirk Eddelbuettel, John Miller,
Brian Peterson, Dale Rosenthal, Jeffrey Ryan
See you in Chicago in April!
/computers/R |
permanent link
Updated 'Introduction to High-Performance Computing with R'
Fellow
R user Paul Gilbert had invited
me to come to Ottawa and the
Bank of Canada
to give a presentation/workshop on 'high-performance computing with R'
similar to the
UseR 2008 tutorial and
talk.
I just posted the
updated slides
from this talk, and there is also an updated live cdrom on the Alioth server. Also, it looks like the tutorial will be held again at
UseR 2009 in Rennes, see
here for a brief synopsis.
It was nice to get back to Canada, even if it was a 24 hour whirlwind trip. Ottawa looked quite pretty
in all the snow. And it seems that I got rather lucky with the travel dates as both the days before and after
my trip had a large number of flight cancellations and delays due to snow storms.
/computers/R |
permanent link
CRANberries prettified
Judging from my html logs, a fair number of folks go to the
html version
of my CRANberries feed (which was originally announced
here)
of new or updated packages for R.
So I quickly put together some simple css formatting to make it look a little
better than the default blosxom theme it
sported previously. That said, you probably should read the
rss version
(more about rss
here) anyway!
Update: Oops. And it even works with a correct path to the css file. Now fixed.
/computers/R |
permanent link
UseR! 2008 talk
Besides the
slides from the tutorial
at
UseR! 2008
that were
mentioned here previously,
I also gave a short talk on
scripting with R in high-performance computing using our
littler frontend to
R.
The talk introduces and extends an example related to some of the
material from the
tutorial
itself.
The slides from the talk
are a little rough as the talk was somewhat ad-hoc: As session chair, I was confronted with a
fairly last-minute cancellation and a 15 minute hole, and thought this would make a good little talk.
It does show a nice trick for using
littler
with
Open MPI (via
snow)
under the powerful
slurm
resource manager and batch/queue engine.
/computers/R |
permanent link
UseR! 2008 tutorial
Earlier today, I presented a 3 1/2 hour tutorial
Introduction to high-performance R
(here is a brief
description of the talk)
at the UseR! 2008
conference at the TU Dortmund.
In a nutshell, the tutorial covered
how to measure / profile R performance for
speed and memory use, how to accelerate R using vectorised expressions and
tools like Ra / jit, how to add compiled code to R using either
the .C or .Call interface and using the
inline and Rcpp packages, how to use R code in
parallel (explicitly using NWS, Rmpi or
snow as well as implicitly using pnmath / OpenMP),
and how to script / automate R using littler, Rscript
or RPy.
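As a small illustration of the first two points (measuring, then vectorising), here is a minimal sketch that is not taken from the tutorial slides. The function names are made up, and the timings will vary by machine and R version.

```r
# Sum of squares computed two ways: an explicit loop and a vectorised expression
n <- 1e6
x <- seq_len(n) / n

loop_sumsq <- function(v) {
  total <- 0
  for (i in seq_along(v)) total <- total + v[i]^2
  total
}
vec_sumsq <- function(v) sum(v^2)

# system.time() is the simplest way to measure; Rprof() gives a finer profile
t_loop <- system.time(r_loop <- loop_sumsq(x))
t_vec  <- system.time(r_vec  <- vec_sumsq(x))

# user, system and elapsed times side by side
print(rbind(loop = t_loop, vectorised = t_vec)[, 1:3])
```

Both variants return the same value up to floating-point rounding; only the time spent differs.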
The final version of the slides is now available via my presentations page, and the live
cdrom with software support for all the software used is at Alioth.
Update: Corrected link to presentations page thanks to
heads-up by Charles. Thanks!
/computers/R |
permanent link
CRANberries updated
The good folks at CRAN, the
R package network which by now contains over
thirteen hundred packages, have reorganised the website slightly. As far as I can tell,
the changes are all for the better: nicer URLs, lots more per-package information,
and other changes such as an updated 'task view' format (which is the part I knew
via my maintenance of the Finance task view).
But these changes also affected my CRANberries
(see the html or better yet
rss view) summaries of new packages
as some of the source information moved. So I just updated the (surprisingly short at 189 lines including plenty
of whitespace and comments) script, and things should work now come the next update.
While updating the 'more info' link for new and updated posts to point to the
new-style entry at CRAN, I also took the opportunity to update the format of the `blog' entry for
updates where we now show title and description along with the diffstat output.
I also manually copied in two of the recent entries: the new package
emu where CRANberries had fallen over as we
could not find the package description (in the new spot), and the existing package
GEOmap where diffstat failed
as we somehow didn't have a proper tarball of the previous sources.
/computers/R |
permanent link
The amazing Prof. Ripley (cont'ed)
A little mini-meme got started on August 1 when
Ben Bolker
posted the following code
to the
r-devel list (and here I substituted the more standard '<-' assignment
operator for the less standard though now-permitted '='):
x <- readLines("http://developer.r-project.org/R.svnlog.2007")
rx <- x[grep("^r",x)]
who <- gsub(" ","",sapply(strsplit(rx,"\\|"),"[",2))
twho <- table(who)
twho["ripley"]/sum(twho)
In five lines (that could be shortened to three at the expense of some
readability), the SVN log for R is
downloaded directly from the website, the revision authors are extracted and then
tabulated by submitter. The relative percentage of Brian Ripley is found
to be a staggering 74.8% -- or about three times as much as the other fifteen
committers combined. Smokes.
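Since the log URL may not stay around, the same steps can be tried on a few made-up lines laid out like 'svn log' output. This is only a sketch: the revision numbers, names and dates below are invented, and the pattern is slightly stricter than Ben's "^r" so that a commit message starting with the letter r is not mistaken for a revision header.

```r
# Made-up lines in the layout of 'svn log' output (illustration only)
x <- c("------------------------------------------------------------------------",
       "r40001 | alice | 2007-01-02 08:15:00 +0000 (Tue, 02 Jan 2007) | 1 line",
       "",
       "remove stale file",
       "------------------------------------------------------------------------",
       "r40002 | bob | 2007-01-02 09:30:00 +0100 (Tue, 02 Jan 2007) | 1 line",
       "",
       "fix typo in docs",
       "------------------------------------------------------------------------",
       "r40003 | alice | 2007-01-03 07:05:00 +0000 (Wed, 03 Jan 2007) | 1 line",
       "",
       "update NEWS",
       "------------------------------------------------------------------------",
       "r40004 | alice | 2007-01-03 11:45:00 +0000 (Wed, 03 Jan 2007) | 1 line",
       "",
       "tweak tests")

# keep only the revision header lines
rx <- x[grep("^r[0-9]+ \\|", x)]
# second '|'-separated field is the committer; strip the padding blanks
who <- gsub(" ", "", sapply(strsplit(rx, "\\|"), "[", 2))
# tabulate and turn counts into shares
twho <- table(who)
share <- twho / sum(twho)
print(share)
```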
[ Oh, and for those who don't know him, he's also got a day job which presumably
entails looking after his graduate students at Oxford. Who knows, he may even
teach. Kidding aside, he's actually one of the nicest persons you'll ever
meet in real life. ]
Now yesterday, Simon Jackman who
had at first simply repeated Ben's analysis on his own blog followed up with a nice
analysis (albeit typeset in a way that rendered the code inoperational, which
has now been fixed) that creates both a histogram and a dotplot of commits
per hour of the day. Omitting Ben's code which Simon reuses, we have the
following for histogram and dotchart:
tod <- unlist(sapply(rx,function(x)strsplit(x,split=" ")[[1]][6]))
tod <- tod[who=="ripley"]
tz <- sub(pattern=".*(-[0-9]{4}).*",replacement="\\1",x=rx)
tz <- tz[who=="ripley"]
tz <- as.numeric(tz)/100
offset <- 3600*tz
z <- strptime(tod,format="%H:%M:%S")
hist(z,"hours",main="Ripley Commit Times in SVN TZ")
h <- z - offset
h <- format(h,format="%H")
h <- factor(as.numeric(h), levels=0:23)
dotchart(table(h), main="Ripley Commit Times, By Hour in GMT",
labels=paste(0:23,1:24,sep=":"))
This extracts the commit times, subsets to the ones by Prof. Ripley, extracts
the timezone component (as strptime seemingly doesn't do that
which is a pain), extracts the tz-less time via strptime into a
variable 'z' for which the histogram is drawn. He then corrects the times by
the tz offset expressed in seconds, formats it as hour of the day and turns
it into a 'factor' (an R data type for qualitative variables which may be
ordered as is the case here) and draws a dotplot. This results in the
following chart:
Now, nobody has looked at the time series. So we correct this and add the
following:
## zoo supplies zoo(), index() and rollmedian() used below
library(zoo)
## rather extract both date and time
dat <- unlist(sapply(rx, function(x) {
    txt <- strsplit(x,split=" ")[[1]]
    paste(txt[5], txt[6])
}))
## subset on Prof Ripley
dat <- dat[who == "ripley"]
## and convert to POSIXct, correcting by tz as well
datpt <- as.POSIXct(strptime(dat,format="%Y-%m-%d %H:%M:%S")) - offset
## turn into zoo -- we use a constant series of ones as each
## commit is taken as a timestamped event
datzoo <- zoo(1, order.by=datpt)
## and use zoo to aggregate into commits per date
daily <- aggregate(datzoo, as.Date(index(datzoo)), sum)
## now plot as grey bars
plot(daily, col='darkgrey', type='h', lwd=2,
     ylab="Nb of SVN commits, three-week median",
     xlab="R release dates 2.5.0 and 2.5.1 shown in orange",
     main="The amazing Prof. Ripley")
## mark the two R releases of 2007
abline(v=c(as.Date("2007-04-24"),as.Date("2007-06-28")),col='orange',lwd=1.5)
## and do a quick centered rolling median
lines(rollmedian(daily, 21, align="center"), lwd=3)
This extracts both date and time, creates a proper R time object (a so-called
POSIXct type) from it, fills a zoo ('the' magic class for time series) object
with it, uses zoo to aggregate commits per day and plots those in a
barchart-alike (I know, I know, ...) plot to which we add the two releases as
well as a rolling and centered three-week median (as a real quick hack rather
than a proper smooth).
This shows that Prof Ripley averaged about ten
commits a day before and after the release of R 2.5.0, and that he has slowed
down ever so slightly since then to end up at around a mere seven commits a
day. Every day. For the seven-plus months we looked at.
So, anyone for analysing his r-help posting frequencies ?
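For readers without zoo installed, the aggregate-then-smooth step can be approximated in base R. This is a sketch on simulated timestamps rather than the SVN data, and it swaps in runmed() for rollmedian(), which treats the two ends of the series differently; days without any event are kept as zeros here.

```r
set.seed(123)
# Simulated event timestamps in UTC spread over 60 days (illustration only)
start  <- as.POSIXct("2007-01-01 00:00:00", tz = "UTC")
stamps <- start + sort(runif(400, min = 0, max = 60 * 86400))

# Count events per calendar day, keeping days with zero events
days   <- seq(as.Date("2007-01-01"), by = "day", length.out = 60)
daily  <- table(factor(as.character(as.Date(stamps, tz = "UTC")),
                       levels = as.character(days)))
counts <- as.numeric(daily)

# Centred running median over 21 days (base R stand-in for the rolling median)
smooth <- runmed(counts, k = 21)

# Draw to a null pdf device so that no file is written
pdf(NULL)
plot(days, counts, type = "h", col = "darkgrey", lwd = 2,
     xlab = "", ylab = "events per day")
lines(days, smooth, lwd = 3)
invisible(dev.off())
```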
/computers/R |
permanent link
Announcing CRANberries
Earlier today I sent an
announcement to the r-packages list. It describes CRANberries, two simple RSS feeds
that summarize both 'new' and 'updated' packages at CRAN, the archive network for R.
I cooked this up rather quickly using a few lines of R, a small
SQLite db backend and the old Blosxom blog engine.
A tip of the hat to Barry Rowlingson who almost
immediately suggested to use
the
lol format instead.
The hope is that this proves helpful for keeping tabs on the amazing growth
of CRAN (which is now at over one
thousand packages) as well as the number of updates to existing packages.
The feed(s) can be consumed standalone, or via the brand new Planet R aggregator that Elijah announced
today too.
/computers/R |
permanent link
More on 'nicer charts'
Via the Planet Debian aggregator and
his blog, Sven
followed
up on my post
regarding
Lucas' plot of the package age distribution.
As some of my points didn't seem to make it across, I will reiterate them
more plainly:
- GNUplot, while easy to use, creates charts that aren't terribly
pretty;
- Lucas' original chart had, to paraphrase an expression by Tufte, a poor
'ink to paper ratio': the data is too concentrated in the last
quartile;
- for that very reason, taking logs is a good thing here
Sven also addresses the fact that what we really want is to see the quantiles
of the data set. Quite right, and taking logs makes that easier. Consider
the two charts below which plot the 'package age in days' as an empirical
cumulative distribution function using built-in R functions ecdf
and plot.stepfun (rather than
redoing it ad-hoc as I had done), and also explicitly add quantiles. The two
charts use the exact same instructions; however the second chart transforms
the x-axis to a logarithmic scale.
While it is close to impossible to find the 25 or 50 percentile on the first
chart, it becomes a lot easier on the second chart because the x-axis is
'stretched' using the log transform. About one quarter of the distribution appears
to have been rebuilt within the last 1.5 months, and about half is younger than four
months (as a quick call to summary(pkgAge) confirms). Reading
these proportions off the original chart, or the non-log chart, is much more difficult.
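The comparison can be reproduced along the following lines. This is a sketch on simulated, right-skewed ages rather than the actual package data, and it draws the step function from ecdf() by hand so that the same call works on both a linear and a logarithmic x-axis.

```r
set.seed(42)
# Simulated right-skewed 'ages in days' (illustration only, not the Debian data)
age <- rlnorm(1000, meanlog = log(120), sdlog = 1.2)

Fn <- ecdf(age)                           # empirical cumulative distribution function
q  <- quantile(age, c(0.25, 0.50, 0.75))  # the quartiles to mark on the chart
xs <- sort(age)

# Draw to a null pdf device so that no file is written
pdf(NULL)
op <- par(mfrow = c(1, 2))
for (lg in c("", "x")) {
  # same instructions twice; only the log argument differs
  plot(xs, Fn(xs), type = "s", log = lg,
       xlab = "age in days", ylab = "Fn(x)",
       main = if (lg == "x") "log x-axis" else "linear x-axis")
  abline(v = q, lty = 2)
}
par(op)
invisible(dev.off())
print(q)
```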
/computers/R |
permanent link
Improving simple charts
Earlier today and via Planet Debian,
Lucas blogged about the 'age distribution' of Debian
packages, defined as the time since the last (re-)compilation. He
illustrated his findings with an, umm, rather ugly chart. Having climbed onto the soap box once
before, I would like to point out how easy it can be to create
simple, informative, and, at least to me, prettier charts using R.
Lucas included a URL to the data. The first nice thing to note is that we can
read the data directly from the URL -- no need to copy the file:
pkgAge <- read.table(file="http://people.debian.org/~lucas/arch-age/arch-age.log", col.names=c("pkg","yyyymmdd"))
reads the data into a data.frame to which we have given two column names.
pkgAge[,"date"] <- as.Date(as.character(pkgAge[,"yyyymmdd"]), "%Y%m%d")
pkgAge[,"age"] <- as.numeric(difftime(Sys.Date(), pkgAge[,"date"], units="day"))
pkgAge[,"prop"] <- (1:nrow(pkgAge)) / nrow(pkgAge) * 100
We then create three new columns. First is a date, by parsing the (integer)
dates (after first casting them into characters) by supplying the format in
standard C notation: "%Y%m%d" for year, month and day without
any separators or formatters. Now, having the date as an actual date
object inside a real data analysis language we can do
things such as computing date differences. The difftime function
does just that, using the current date as the other point. We ask for the return
to be in days, and cast this down to a purely numeric vector (instead of
a difftime object). Lastly, we quickly compute the cumulative proportion in
percentages.
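The same three steps on a tiny made-up data set make the results easy to check by hand. This is only a sketch: the package names and dates are invented, and a fixed reference date stands in for Sys.Date() so that the ages do not change from day to day.

```r
# Three made-up rows in the same layout as the real file (illustration only)
toy <- data.frame(pkg = c("a", "b", "c"),
                  yyyymmdd = c(20060101L, 20060115L, 20060131L))

# integer -> character -> Date, using the C-style format string
toy[,"date"] <- as.Date(as.character(toy[,"yyyymmdd"]), "%Y%m%d")

# age in days relative to a fixed reference date, as a plain numeric vector
ref <- as.Date("2006-02-01")
toy[,"age"] <- as.numeric(difftime(ref, toy[,"date"], units="days"))

# running share of rows in percent (rows are assumed to be sorted by date)
toy[,"prop"] <- (1:nrow(toy)) / nrow(toy) * 100
print(toy)
```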
We can then view the data. Before we plot,
png("packageAges.png", quality=100, width=640, height=480, pointsize=10)
oldpar <- par(mfrow=c(2,2), mar=c(2.5,2.5,3,1))
we direct the charts to a png file of given dimensions, and ask for all
plots in one figure (using mfrow with two rows by two) with
somewhat smaller figure margins using the mar argument to par.
The first chart shows again proportion over date:
with(pkgAge, plot(date, prop, type='l', main="Standard Plot"))
(The with() function simply allows us to refer to the columns by their names
without explicit subsetting. plot(pkgAge[,"date"],
pkgAge[,"prop"]) is equivalent, but more cumbersome.)
As it is clear that the data has a fairly long tail in the older dates, we can
also try to plot the proportion over logarithmic time differences. This doesn't
work for dates, but it works for our (positive-valued) age variable:
with(pkgAge, plot(age, prop, type='l', log="x", main="More linear as log(age in days)"))
The very far left tail below 0.5 percent is interesting as the one very old
package is clearly an outlier within an outlier region. We use the subset
function to take just one portion of the data, use logs, and explicit
plotting symbols '+' in a points-and-lines plot:
with(subset(pkgAge, prop<0.5), plot(date, prop, type='b', log="y", pch="+", main="Detail in left tail, up to 0.5%"))
Lastly, the upper quartile is fairly linear.
with(subset(pkgAge, prop>75), plot(date, prop, type='l', pch=".", main="Yet fairly linear in top 25%"))
At the end
par(oldpar)
dev.off()
we restore the graphics parameters and close the device (here the file). All
this then yields the following chart:
Updated to correctly display the assignment operator <-
/computers/R |
permanent link
RGtk2 packages in Debian
As previously mentioned,
I had built local packages of Michael Lawrence's RGtk2
port of the Gtk v2 widgets / toolkit to the
R language and environment. This package, along with the
related cairoDevice package, is now in Debian's main archive. RGtk2 extends and replaces the
older r-omegahat-rgtk
package.
For two interesting applications using RGtk2, look at Graham Williams'
rattle, a glade-based data-mining
user-interface to several R functions, and at John Verzani's
PMG (aka Poor Man's GUI) generic GUI for R.
/computers/R |
permanent link
RGtk2 packages available
Michael Lawrence announced the release of RGtk2 yesterday on both r-packages
and r-sig-gui.
This follows his impressive presentation at DSC 2005 and finally makes
the code available.
RGtk2 is, as the name suggests, an
update of the older RGtk package
(that I'd been maintaining in Debian as r-omegahat-rgtk)
to the 'newer' version 2 of Gtk (aka The
GIMP Toolkit). It provides Gtk goodies for
the amazing R language and environment
many of us dig for its use in statistical computing, visualization, data
analysis, estimation and more.
RGtk2 is quite an achievement. The
source package is huge at 1.9mb, the resulting Debian package even huger at
5.3mb, and it all seems to work fine based on some initial tests of running
the demos. Based on the two source packages, I created two packages of RGtk2
(as r-omegahat-rgtk2) and the Cairo device (as r-omegahat-cairodevice) which
you can fetch from
here.
Not sure if I feel the urge to maintain them, but if someone else wants to
step forward, let me know. Sources and diffs are in the same
directory. Feedback welcome.
Update: By the way, what do we need to build with
gtkmozembed? I didn't find a proper Build-Depends: for this.
/computers/R |
permanent link
First `r-devel' builds leading up to R 2.1.0
As usual, the next GNU R release
(expected in April) will bring yet another set of changes. Among the very
user-visible changes are initial localisation support, and a disappearance of
the (never completed) Gnome front-end.
I have built an initial set of packages, mostly to convince myself that
nothing too drastic was needed inside the debian/ directory.
These packages are for now here on my box but I
guess I could upload them to Debian's experimental distribution for which the
changelog entry is tagged.
I tend not to install any locales packages on my machines, so I can't really
test if the localisation support works -- configure and friends certainly
suggest it based on the compile-time messages. So my dear reader, if you are
into non-default locales and R, and have a minute, grab the package and try
something like
$ LANGUAGE=de LC_all=de LC_MESSAGES=de LANG=de LOCALE=de R
R : Copyright 2005, The R Foundation for Statistical Computing
Version 2.1.0 Under development (unstable) (2005-02-27), ISBN 3-900051-07-0
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for a HTML browser interface to help.
Type 'q()' to quit R.
> Sys.getlocale()
[1] "C"
> q("no")
As is plain from the above, I somehow failed to tell R about 'de' as an
alternative. However, capabilities() does report
TRUE for attribute iconv, so this should work
... Feedback welcome -- right now, R has po files for 'de' and 'it', so no need
to test other languages.
/computers/R |
permanent link
Nicer charts
Wouter had blogged about his
charts
based on
Takuo's per-maintainer
statistics from the
Debian BTS.
In an effort of true R evangelism, I
bugged Wouter about the gnuplot ugliness in those charts and offered an R
script as an alternate. Per his reply, he seemed pleased with the output [1]
-- click on the png for a nicer pdf:
For anybody interested, the
R code is available. It provides two simple functions. The first actually
creates the 4x1 chart. The second loops over all datafiles ending in
.csv in a given directory, assuming the filename before the
.csv ending provides the unique identifier (here the maintainer
name and email as per Takuo's and Wouter's setup). On a per file basis, data
is loaded, and a pdf and png are produced. This can be called as in
R CMD BATCH debian_bts_chart.R
which will create debian_bts_chart.Rout in the current
directory.
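The loop-over-files part has roughly the following shape. This is a simplified sketch and not the actual debian_bts_chart.R: the function names chart_one and chart_all are made up, only a single-panel pdf is written per file, and the first two columns of each csv file are assumed to hold x and y values.

```r
# Hypothetical sketch: one chart per .csv file in a directory
chart_one <- function(dat, id, file) {
  pdf(file)
  on.exit(dev.off())    # close the device even if plotting fails
  plot(dat[[1]], dat[[2]], type = "l", main = id,
       xlab = names(dat)[1], ylab = names(dat)[2])
}

chart_all <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  # the part of the filename before '.csv' serves as the identifier
  ids <- sub("\\.csv$", "", basename(files))
  for (i in seq_along(files)) {
    dat <- read.csv(files[i])
    chart_one(dat, ids[i], file.path(dir, paste0(ids[i], ".pdf")))
  }
  invisible(ids)
}

# Usage with made-up data in a temporary directory
d <- file.path(tempdir(), "bts_demo")
dir.create(d, showWarnings = FALSE)
write.csv(data.frame(week = 1:10, open = c(12, 11, 13, 12, 10, 9, 9, 8, 4, 3)),
          file.path(d, "someone@example.org.csv"), row.names = FALSE)
ids <- chart_all(d)
```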
Lastly, I should note that I do think that the underlying data is wrongly
classified. Counting a bug as 'active' even when it has been closed, but is
not yet archived merely because the 28 days period hasn't passed is plain
wrong. In my book, a closed bug cannot count as an active one.
[1] This is my own data, and it shows the drop-off a few weeks ago when I
passed maintainership of a good dozen packages on to others.
/computers/R |
permanent link
Truly random numbers
The neat random.org service started by Mads Haahr a few years ago supplies
truly random numbers. This is not the place to get into details
about why pseudo
and quasi (aka
low-discrepancy sequences) random numbers are only 'random'.
Mads' service samples atmospheric noise -- see his background essay for more details --
which gets aggregated and can then be had via Corba, HTTP or SOAP. Given how R has such a wonderful (and probably
little known) url() function to acquire data over the web, I
figured it might be worthwhile to show how R can acquire truly random
numbers. This snippet downloads a 10,000 x 2 data frame, and plots it:
> X <- read.table(url(paste("http://www.random.org/cgi-bin/randnum",
"?num=10000&min=-1000000000&max=1000000000&col=2",
sep="")),
header=FALSE)
> plot(X, pch=".")
(The paste() is used to split the overly long line for the full
URL.) The arguments to the interface at random.org are hopefully
self-explanatory. Otherwise, full details are available.
As repeated simulations are often rather time-intensive, downloading random
sequences may not be the fastest way to go about things. However, this method
would provide a portable way to seed a pseudo
random number generator on platforms that do not
have an entropy provider under /dev.
/computers/R |
permanent link
Three-character patch for Rpy to build under R 2.0.0
As we
mentioned here before, R 2.0.0 is out and in Debian. Graham reminded
us about the need for a fresh version of Rpy. It turns out that a minor patch
is needed to adjust for the new location of libR.so:
--- rpy-0.3.5.orig/setup.py
+++ rpy-0.3.5/setup.py
@@ -54,7 +54,7 @@
RHOME = get_R_HOME()
DEFINE.append(('R_HOME', '"%s"' %RHOME))
-r_libs = os.path.join(RHOME, 'bin')
+r_libs = os.path.join(RHOME, 'lib') # edd 11 Oct 2004: changed to 'lib' for 2.0.0
source_files = ["src/rpymodule.c", "src/R_eval.c",
"src/io.c"]
if sys.platform=='win32':
which may be useful to someone else trying to update RPy.
Update: Greg just told me by email that the fix is in CVS too, so a
new RPy will have it.
/computers/R |
permanent link
R 2.0.0 now in Debian
It's this time of year again, and the bi-annual GNU R release brings in a wider than
usual list of
changes as indicated by the new major release version 2.0.0. One change
in particular, lazy loading, promises to be most useful, if only for faster
startup times. Unfortunately, the internal code changes also triggered two
ugly bugs that seemed to appear only in the pbuilder chroot builds I
made. After a couple of frustrating days spent trying different things, Brian
Ripley kindly supplied a patch earlier today. This allows us to build R, and the
core 'recommended' packages, in the chroot environment used by the Debian
autobuilders.
So after some further testing, packages of R 2.0.0 are now on the Debian
servers and should be in unstable tomorrow. Because of the new internal
interface to R packages, older packages will not load. Users can either call
update.packages() directly, or wait a few days until we (as in
Chris, Doug, Rafael, Steffen and myself) get the various
r-(cran|bioc|omegahat|other|noncran)-* packages rebuilt under 2.0.0. We will
try our best to have them all updated by Sunday.
/computers/R |
permanent link
New 'R in Finance' list up and running
Yesterday, I sent this note
to the r-help list:
Thanks again to everybody who participated in the finance sessions at the
recent useR! 2004 conference. During the discussions, the idea of a mailing
list for R and Finance came up. Thanks to Martin, such a list has now been
created and can be accessed via the page
https://www.stat.math.ethz.ch/mailman/listinfo/r-sig-finance
from which subscription requests can be made using the usual confirmation
system employed by the mailman software. Everybody interested in 'Finance'
(we will try not to be too picky regarding definitions) and R is cordially
invited to subscribe. Also feel free to forward this message to interested
colleagues.
Subscriptions are trickling in at a steady rate, let's hope we get this list to
be a helpful and lively forum.
/computers/R |
permanent link
Slides for my useR! 2004 presentations are up
Now back from the excellent
useR! 2004,
the first
international R User Conference in Vienna, I have put
the slides for my talk Programming
with financial data: Connecting R to Lim and Bloomberg, as well as the two
shorter ones Quantian:
A single-system image scientific cluster programming environment, and
R on Debian:
Past, Present, Future (joint with Doug Bates and Albrecht Gebhardt) up on
my website.
(And no, I cannot distribute the two packages described in the main talk as
they were done at work where open source releases are not yet an accepted
methodology. Hopefully one day.)
/computers/R |
permanent link
Debian R Policy draft released
We just posted a draft with requests for comments to the Debian and R
developer lists. Comments welcome!
/computers/R |
permanent link
Accelerating plotOHLC by a few orders of magnitude
The plotOHLC function in the tseries package is useful to plot timeseries of
various financial assets with open/high/low/close data. I had often
wondered if it could be made to run a little faster. It turns out that the
following patch does
--- plotOHLC.R.orig 2003-12-14 12:02:20.000000000 -0600
+++ plotOHLC.R 2003-12-14 12:03:42.000000000 -0600
@@ -21,14 +21,9 @@
ylim <- range(x[is.finite(x)])
plot.new()
plot.window(xlim, ylim, ...)
- for (i in 1:NROW(x)) {
- segments(time.x[i], x[i, "High"], time.x[i], x[i, "Low"],
- col = col[1], bg = bg)
- segments(time.x[i] - dt, x[i, "Open"], time.x[i], x[i,
- "Open"], col = col[1], bg = bg)
- segments(time.x[i], x[i, "Close"], time.x[i] + dt, x[i,
- "Close"], col = col[1], bg = bg)
- }
+ segments(time.x, x[, "High"], time.x, x[, "Low"], col = col[1], bg = bg)
+ segments(time.x - dt, x[, "Open"], time.x, x[, "Open"], col = col[1], bg =$
+ segments(time.x, x[, "Close"], time.x + dt, x[, "Close"], col = col[1], bg$
if (ann)
title(main = main, xlab = xlab, ylab = ylab, ...)
if (axes) {
decrease the time spent on a series of ~500 points by well over an order of magnitude (about 78x in user time and 100x in elapsed time per the timings below):
> IBM<-get.hist.quote("IBM", "2001-12-14")
trying URL
http://chart.yahoo.com/table.csv?s=IBM&a=11&b=13&c=2001&d=11&e=12&f=2003&g=d&q=$
Content type application/octet-stream' length unknown
opened URL
.......... .......... ...
downloaded 23Kb
time series starts 2001-12-12
time series ends 2003-12-11
> system.time(plotOHLC(IBM)) # original
[1] 1.56 0.26 5.11 0.00 0.00
> system.time(fastplotOHLC(IBM)) # patched
[1] 0.02 0.00 0.05 0.00 0.00
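The effect does not depend on tseries and can be tried in isolation. The following is a sketch on simulated open/high/low/close data with made-up function names, not the patched plotOHLC itself; it drops the col and bg arguments and only contrasts one segments() call per bar with one vectorised call per bar element.

```r
set.seed(1)
# Simulated OHLC matrix with 500 rows (illustration only)
n  <- 500
cl <- 100 + cumsum(rnorm(n))
op <- cl + rnorm(n, sd = 0.5)
x  <- cbind(Open = op, High = pmax(op, cl) + 1, Low = pmin(op, cl) - 1, Close = cl)
time.x <- seq_len(n)
dt <- 0.4

new_canvas <- function() {
  plot.new()
  plot.window(range(time.x) + c(-dt, dt), range(x))
}

# one segments() call per element and per row, as in the original loop
draw_loop <- function() {
  new_canvas()
  for (i in seq_len(n)) {
    segments(time.x[i], x[i, "High"], time.x[i], x[i, "Low"])
    segments(time.x[i] - dt, x[i, "Open"], time.x[i], x[i, "Open"])
    segments(time.x[i], x[i, "Close"], time.x[i] + dt, x[i, "Close"])
  }
}

# three vectorised segments() calls in total, as in the patch
draw_vectorised <- function() {
  new_canvas()
  segments(time.x, x[, "High"], time.x, x[, "Low"])
  segments(time.x - dt, x[, "Open"], time.x, x[, "Open"])
  segments(time.x, x[, "Close"], time.x + dt, x[, "Close"])
}

# Draw to a null pdf device so that no file is written
pdf(NULL)
t_loop <- system.time(draw_loop())
t_vec  <- system.time(draw_vectorised())
invisible(dev.off())
print(rbind(loop = t_loop, vectorised = t_vec)[, 1:3])
```

The size of the gap depends on the graphics device and the R version, so the ratio seen here need not match the timings above.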
/computers/R |
permanent link
Small patch for Rpy and recent R versions
After I
asked on R-devel
about the failure of Rpy to build with more recent
versions of GNU R, I every now and then get a
request for the patch mentioned in my
follow-up on
R-devel.
To make life easier for everybody, and as the patch is so simple, here it comes:
--- rpy-0.3.1.orig/src/RPy.h
+++ rpy-0.3.1/src/RPy.h
@@ -90,7 +90,8 @@
PyOS_sighandler_t python_sigint;
/* R function for jumping to toplevel context */
-extern void jump_now(void);
+/* extern void jump_now(void); */
+extern void Rf_onintr(void);
/* Global interpreter */
PyInterpreterState *my_interp;
--- rpy-0.3.1.orig/src/R_eval.c
+++ rpy-0.3.1/src/R_eval.c
@@ -65,7 +65,8 @@
void interrupt_R(int signum)
{
interrupted = 1;
- jump_now();
+ /* jump_now(); */
+ Rf_onintr();
}
Thanks to Luke Tierney for the hint regarding Rf_onintr(). If you're into Debian, I
also have a local package I could send you. I'd upload it, but I don't really want to be stuck
maintaining it when Rpy looks as if it has been orphaned by its original author.
/computers/R |
permanent link