Dirk Eddelbuettel Thinking inside the box
 
Tue, 02 Dec 2008

Rcpp 0.6.2
Hot on the heels of the announcement for Rcpp I just made, version 0.6.2 follows with a small but important fix for the default linker flag output by the helper function Rcpp:::RcppLdFlags().

I also added a download section to the Rcpp page here where you can find the new version. CRAN should start propagating it by tomorrow.

/computers/linux/debian/packages | permanent link

Rcpp relaunched with versions 0.6.0 and 0.6.1
I just announced Rcpp 0.6.0 and 0.6.1 on the low-volume R-packages list. Rcpp provides C++ classes that greatly facilitate interfacing C or C++ code in R packages using the .Call() interface provided by R.

Rcpp provides matching C++ classes for a large number of basic R data types. Hence, a package author can keep his data in normal R data structures without having to worry about translation or transfer to C++. At the same time, the data structures can be accessed as easily at the C++ level, and used in the normal manner.

The mapping of data types works in both directions. It is as straightforward to pass data from R to C++ as it is to return data from C++ to R.

Rcpp was initially written by Dominick Samperi in the context of the RQuantLib package and later released on its own, but had not seen any releases in twenty-four months. I have substantially expanded the documentation, simplified the build structure yet made it easier to use Rcpp from other packages, and started to add some new classes (notably microsecond time types). Rcpp is supported on Windows, Linux and Mac OS X (with special thanks to Simon for some extended help).

More information for Rcpp can be found at the package homepage, the R-forge repository or the package CRAN page.

/computers/linux/debian/packages | permanent link

Mon, 01 Dec 2008

CRANberries prettified
Judging from my html logs, a fair number of folks go to the html version of my CRANberries feed (which was originally announced here) of new or updated packages for R.

So I quickly put together some simple css formatting to make it look a little better than the default blosxom theme it sported previously. That said, you probably should read the rss version (more about rss here) anyway!

Update: Oops. And it even works with a correct path to the css file. Now fixed.

/computers/R | permanent link

Mon, 03 Nov 2008

Multiseat update under Ubuntu 08.10
The aforementioned multi-seat setup allowing two kids with two screens/keyboards/mice connected to one computer required a minor update under the new Ubuntu release. The Xephyr 'x11-server inside an x11-server' that is used to display two distinct sessions on two distinct monitors didn't seem to recognise the mice and keyboards anymore.

Somehow not explicitly specifying them helped. I.e. the calls to the script /usr/sbin/Xephyr-path.sh from /etc/gdm/gdm.conf now read

[server-Xephyr1]
name=Xephyr1
command=/usr/sbin/Xephyr-path.sh -display :0 -br -dpi 100 -xauthority /var/lib/gdm/:0.Xauth -screen 1280x1024
handled=true
flexible=false

[server-Xephyr2]
name=Xephyr2
command=/usr/sbin/Xephyr-path.sh -display :0 -br -dpi 100 -xauthority /var/lib/gdm/:0.Xauth -screen 1280x1024+1280+0
handled=true
flexible=false
Otherwise, the tutorial referenced in my earlier post still applies. And the kids are very impressed with new eye candy in KDE 4.1.

/computers/hardware | permanent link

Tue, 28 Oct 2008

Google Summer of Code 2008 Mentors Summit
Spent last weekend in Mountain View where Google had invited a number of mentors for the Summer of Code project that Google once again graciously sponsored. A rather impressive list of projects sent up to two people each, giving a probably unparalleled sample of major Open Source projects.

I had a blast. Chris, Leslie and the rest of Google's Open Source Programs Office facilitated a really nice unconference that spawned a few really nice sessions, and they took very good care of us. And just about everybody met a number of folks in person that were previously known only via email or irc. As the saying goes: nothing like the bandwidth of a face-to-face meeting...

Last but not least I should issue a health warning. Sharing a room with the fearless Debian DPL is not for the faint of heart: His snoring is truly world-class.

/computers/misc | permanent link

Tue, 14 Oct 2008

RPostgreSQL 0.1.0
As part of the Google Summer of Code program for 2008 (which I mentioned here and here), Sameer and I are happy to announce that RPostgreSQL is now on the CRAN mirror network for R. RPostgreSQL provides a (DBI-compliant) interface between R and the PostgreSQL database system. I also just sent this short announcement to the r-packages list.

/computers/misc | permanent link

Thu, 09 Oct 2008

More running data visualization
A few years into this running hobby, I realized that my times were getting better. But I had no feel for by how much, or whether that was a constant rate of improvement etc pp. Long story short, I started to plot some of the data. What seemed natural was to record the date, the distance in miles as well as in a qualitative variable, and finally the average pace. Additionally, I played with groupings into just three categories 'short', 'mid' and 'long'.

This leads to a natural 'one-factor' model of pace as a function of race date grouped by race distance. And given how easy it is to do conditional plots in R, I quickly arrived at something that already resembled the following chart:

(pace by date given group lattice chart)

At first, some of the groups had too few data points to actually reliably construct regression lines, let alone non-parametric smoothers. But over time more and more data points were added as I kept running races. Including for example the somewhat disappointing result from last year's Chicago marathon in record heat that resulted in the outlier in the last panel. It actually made the smooth fit turn upwards! Luckily, the subsequent times in New York last fall, London in April, and of course in Berlin last month helped to dampen the effect of the one outlier, resulting in a more normal straight line for marathon performance that is comparable to the other four race lengths.

All in all I am now quite happy with the chart. The combination of the non-parametric loess smoother and the robust linear regression (using rlm from the MASS package for R) shows that most groups exhibit very little non-linearity as both regression curves are very close to each other. The curvature in the '10m' group is probably mostly a small-sample effect. And I am obviously happy with the fact that three of the five panels show their respective last race as a PR :)

The R script containing the data and code is available here but requires some familiarity with the lattice package for R (as the lattice book would provide).

/sports/running | permanent link

Sun, 05 Oct 2008

World Marathon Majors
Last Sunday's Berlin Marathon was my fifth and final piece in completing the World Marathon Majors during 2007/2008: after running Boston, Chicago and New York in 2007, and then London and now Berlin in 2008, the set is complete.

The idea was born after having run Chicago a few times, qualifying for Boston and winning a New York lottery entry. With friends to visit in New York, London and Berlin, it became feasible.

It's been a great experience to run these famous courses in front of large crowds. Conditions ranged from cold, windy and rainy in Boston to way too hot in Chicago, had mixed conditions including a solid rain shower in London and were just perfect in both New York and Berlin. The crowds were awesome in all five places. All in all, these races were a blast -- if you're into long-distance running, give each or all of them a shot.

/sports/running | permanent link

Wed, 01 Oct 2008

Berlin Marathon 2008
Last Sunday was the 35th Berlin Marathon. I had flown over to Berlin on Thursday after work, and had Friday and Saturday to 'chill'. The weather was already pretty nice before the race, and truly gorgeous on Sunday: sunny yet not too warm, blue skies, no wind. As has been widely reported, Haile Gebrselassie set a new world record breaking his own mark set the year before and becoming the first man to finish under two hours and four minutes. Truly impressive.

My race was pretty good too. I shaved over four and a half minutes off my own personal record (which was set in early 2006 at Sunburst) and finished in 3:13:09. That's a pace of 7:22 min/mile (or 4:35 min/km) which I am rather happy with. I held a fairly steady pace of under 7:30 almost all the way but had to fight off the onset of cramps with some short walks with less than two miles to go.

Coming back to Berlin after all those years is always a charm. The city has obviously changed a lot in some very visible areas. Yet it still recalls the Berlin of those years. The course was really nice, covering numerous neighbourhoods and starting and ending in Tiergarten.

Lastly, it was also good to see old friends who have now been there since the mid- to late 1980s. And I managed to pack a quick visit to my parents in as they are just a good 80 minute ICE train ride away. All in all a very nice trip even though the travel from Chicago (without a direct flight!) is a bit of a hike.

/sports/running | permanent link

Sun, 14 Sep 2008

Chicago Half Marathon 2008
Interesting conditions today for the 2008 edition of the Chicago Half Marathon, a race I have now done in 2003, 2004, 2005, 2006, and 2007 which is a personal record in itself.

While the weather story of the weekend is obviously the aftermath of hurricane Ike in Texas and neighbouring states where millions of people are still without power, we were also hit in a surprisingly hard way here in northeastern Illinois. According to the Tribune all of Chicago had a rain record day and the Chicago River crested causing evacuations. Not pretty.

The new race organisers (who had acquired the race since the 2007 event) were standing steadfast and guaranteeing the race 'come rain or shine'. Participation looked decent -- word was of a record turnout of sixteen thousand runners though I am sure some stayed home given yesterday's rain and the forecast for today. Given all that, it turned out to be not that bad. While we had steady rain the whole time, it rarely rained that hard. Shoes and socks did get wet towards the end, but it was tolerable overall. I had been worried about the gross humidity we had yesterday --- but today was much better with temperatures in the sixties and little wind.

As for the race, I went out somewhat fast but managed to hang on. The Garmin had every mile split below 7:00 min/mile, and I came in at a new personal record of 1:30:51.52. My GPS, an old Garmin 201, also showed the course long at 13.4 miles; a few other runners I talked to had it as correct or long by a lesser amount. That leaves the pace at 6:56 min/mile (or, for Christian, at 4:19 min/km :-) if the half marathon course length was in fact correct, and at 6:47 min/mile (4:13 min/km) if my Garmin had it right.
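The two per-mile paces follow directly from the finish time and the two candidate course lengths. A minimal sketch of that arithmetic, assuming the official half marathon distance of 13.1094 miles (21.0975 km) and rounding to the nearest second; the function names are made up for illustration:

```python
HALF_MARATHON_MILES = 13.1094  # official half marathon distance, 21.0975 km
GPS_MILES = 13.4               # course length as measured by the Garmin


def pace(total_seconds, distance):
    """Return pace as (minutes, seconds) per unit of distance, rounded to the second."""
    per_unit = round(total_seconds / distance)
    return divmod(per_unit, 60)


def fmt(p):
    """Format a (minutes, seconds) pair as m:ss."""
    return f"{p[0]}:{p[1]:02d}"


finish = 1 * 3600 + 30 * 60 + 51.52  # 1:30:51.52 expressed in seconds

for label, miles in [("official", HALF_MARATHON_MILES), ("GPS", GPS_MILES)]:
    print(f"{label:8s} {miles:7.4f} miles  {fmt(pace(finish, miles))} min/mile")
```

The extra 0.29 miles on the GPS is worth about nine seconds per mile, which is why the choice of course length matters for the reported pace.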

And from now on it's all tapering for the next big one in two weeks!

/sports/running | permanent link

Wed, 10 Sep 2008

RDieHarder 0.1.0 released
I just rolled up version 0.1.0 of RDieHarder, an R package providing an interface between GNU R and the DieHarder battery of tests for random number generators developed by Robert G. Brown. See the RDieHarder page for some introductory material and links to the talk at UseR! 2007.

Version 0.1.0 extends the functionality of the dieharder function quite substantially and catches up to a number of recent changes in DieHarder. In particular:

  • dieharder() generator selection changes along the same line as in the DieHarder release: ids 1 to 200 are reserved for GNU GSL generators, ids 201 to 400 for Dieharder, 401 to 500 for GNU R, 501 to 600 are hardware-based and ids over 600 are for user-contributed generators
  • dieharder() now supports new arguments 'inputfile' (for file_input and file_input_raw) and 'ntuple' (for tests with variable bit length)
  • dieharder() now also supports the 'rgb', 'sts' and 'user' tests
  • dieharder() now returns multiple Kuiper KS p-values for those tests that generate multiple p-values
  • dieharderGenerators() now returns a data.frame with two columns 'names' and 'ids' and generators can be selected via either a name (as e.g. 'mt19937') or a numeric id (e.g. 16)
  • dieharderTests() was added and also returns a data.frame with names and ids permitting a similar selection via test name or via test id.
  • Some misc. code organisation, a cleanup removing more files, updated vignette output files, and the actual test sources updated to DieHarder 2.8.1
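The id ranges in the first item amount to a simple lookup. As a rough sketch (in Python rather than R, with a made-up function name, and assuming ids below 1 are simply invalid), the mapping from a numeric id to its generator family could be written as:

```python
def generator_family(gen_id):
    """Map a numeric generator id to the family named in the release notes."""
    if gen_id < 1:
        # Assumption: the notes start the ranges at 1, so anything lower is rejected
        raise ValueError(f"generator id must be at least 1, got {gen_id}")
    if gen_id <= 200:
        return "GNU GSL"
    if gen_id <= 400:
        return "DieHarder"
    if gen_id <= 500:
        return "GNU R"
    if gen_id <= 600:
        return "hardware"
    return "user-contributed"


for gen_id in (16, 201, 450, 555, 601):
    print(gen_id, generator_family(gen_id))
```

This is not code from RDieHarder itself; it only restates the ranges listed above.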

This new version should show up at CRAN and its mirrors in due course; in the meantime sources are also on the RDieHarder page.

/computers/linux/debian/packages | permanent link

Mon, 01 Sep 2008

Easy multi-seat (two screens, keyboards, and mice off one computer) setup
This is 'back to school' season around here. So on Saturday we went and set up two desks for our two kids. And with that I finally converted 'their' computer to a working multi-seat setup. At first, I had fiddled with two distinct (ATI Radeon) graphics cards. Somehow I never got x11 to recognise both cards properly.

But two cards are not needed. As the machine is running a standard Kubuntu setup, I just followed this excellent three-part tutorial for Ubuntu multi-seat setup which describes the process using nothing but standard Ubuntu software. From setting up a 'big desktop' spanning two screens (which is easy enough using one card via the vga and dvi outputs), it is fairly straightforward to modify the gdm.conf setup to spawn two gdm greeter instances using the Xephyr nesting xserver.

So far, all is well. We'll see what possible shortcomings we will find. The GL extensions are not supported, so some eye-candy will be unavailable.

/computers/hardware | permanent link

Wed, 27 Aug 2008

littler 0.1.1 released
The new release 0.1.1 of r (pronounced littler) was just rolled up.

The only new feature is due to a suggestion by Paul Gilbert: r now reports the value of the optional status variable when calling q() at the end of a script:

$ r -e'q(status=42)'; echo $?
42
This can be very useful to signal exit codes and branching on those in other scripts or Makefiles. We also applied a patch to the manual page which adds some examples there (thanks, Seb!) and made some small changes to tests and examples.
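Branching on such an exit code from a calling program might look like the following sketch. It is written in Python purely for illustration; the helper names are hypothetical, and when littler is not installed under the name r a stand-in child process that exits with the same status is used instead:

```python
import shutil
import subprocess
import sys


def exit_status(command):
    """Run a command given as a list of arguments and return its exit status."""
    return subprocess.run(command).returncode


def describe(status):
    """Branch on an exit status the way a calling script or Makefile rule might."""
    return "ok" if status == 0 else f"failed with status {status}"


if shutil.which("r"):
    # Only attempted when a program named 'r' (assumed to be littler) is on the PATH
    print(describe(exit_status(["r", "-e", "q(status=42)"])))
else:
    # Stand-in: a child process that exits with the same status as the example
    print(describe(exit_status([sys.executable, "-c", "raise SystemExit(42)"])))
```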

As usual, our code is in our svn archive, on my r page, and in the local directory here. A fresh package is in Debian's incoming queue, and Jeff's littler page at Vanderbilt should reflect the new release soon too.

/computers/linux/debian/packages | permanent link

Tue, 19 Aug 2008

UseR! 2008 talk
Besides the slides from the tutorial at UseR! 2008 that were mentioned here previously, I also gave a short talk on scripting with R in high-performance computing using our littler frontend to R.

The talk introduces and extends an example related to some of the material from the tutorial itself. The slides from the talk are a little rough as the talk was somewhat ad-hoc: As session chair, I was confronted with a fairly last-minute cancellation and a 15 minute hole, and thought this would make a good little talk. It does show a nice trick for using littler with Open MPI (via snow) under the powerful slurm resource manager and batch/queue engine.

/computers/R | permanent link

Tue, 12 Aug 2008

UseR! 2008 tutorial
Earlier today, I presented a 3 1/2 hour tutorial Introduction to high-performance R (here is a brief description of the talk) at the UseR! 2008 conference at the TU Dortmund.

In a nutshell, the tutorial covered how to measure / profile R performance for speed and memory use, how to accelerate R using vectorised expressions and tools like Ra / jit, how to add compiled code to R using either the .C or .Call interface and using the inline and Rcpp packages, how to use R code in parallel (explicitly using NWS, Rmpi or snow as well as implicitly using pnmath / OpenMP), and how to script / automate R using littler, Rscript or RPy.

The final version of the slides is now available via my presentations page, and the live cdrom with software support for all the software used is at Alioth.

Update: Corrected link to presentations page thanks to heads-up by Charles. Thanks!

/computers/R | permanent link

Sat, 09 Aug 2008

RQuantLib 0.2.9
As version 0.9.6 of QuantLib, which was released a couple of days ago, is now in Debian, I just uploaded an updated version of RQuantLib. Only minor API changes to src/curves.cpp were needed. This new version 0.2.9 is currently in the queue at R's master CRAN host and should hit the CRAN mirrors shortly; likewise the Debian package has been uploaded and should also propagate to Debian mirrors in due course. As usual, sources are also available locally on my site. Lastly, RQuantLib is hosted on R-Forge and potential contributors are encouraged to register at R-Forge and to get in touch -- this is a great way to learn how to combine C++ and R.

/computers/linux/debian/packages | permanent link

Sat, 07 Jun 2008

Into the sunset
I finally got around to dropping four old computers off for recycling. Triton, a community college nearby, had a recycling event where students volunteered, which made it easy to part with two generations of old machines.

Old computers, I hear you ask, well how old? Real old. The older two were from an age where the bios didn't yet boot off cdroms -- circa 1995. We had bought those in Kingston just off the Queen's campus. These were respectively a pentium 90 and a pentium 100, which still have traces on the web as miles.econ.queensu.ca (e.g. in a number of Debian changelogs) and rosebud.sps.queensu.ca which was of course Lisa's office machine and for a while the only internet address showing SPS.

The next two were purchased around 1999 in Toronto on College St just north of U of T's main St George campus. Those, an AMD k6-2 300 and a Celeron overclocked to 450 MHz (woot :) lived happily in the basement of our Toronto home, forming the first lan I built. If I recall they were initially connected using a crossed ethernet cable and a second nic to the ISP. Oh boy.

At least those latter two still boot off Knoppix. And do they ever feel slow. To think now just how many Debian packages I must have built on at least three of these over the years... And we must have gotten at least five decent years of usage out of each machine. One of the second generation computers eventually morphed into the kids' play computer but even retired from that a while ago.

In any event, it was good to have them recycled, and also good to have been able to do so without paying a fee as is increasingly common. So cheers to Triton. I may be back in a few years as there are still a few computers spread across the house.

/computers/hardware | permanent link

Fri, 06 Jun 2008

Wayne Shorter at the CSO
Just got home from the 'An Evening with Wayne Shorter' concert at the CSO, part of this year's tour apropos his 75th birthday. The man is a legend and one of my favourite musicians for both his own Blue Note work from the 60s and of course his participation in the legendary Miles Davis Quintet of the same period.

Shorter (ts, as) was playing with his quartet of recent years: Danilo Perez (p), John Patitucci (b) and Brian Blade (dr). And playing they did. Shorter has such a soft lyrical tone, which accentuates both the rhythmic and harmonic quality of the side men. Very enjoyable concert, fairly 'modern' and free in style. And no standards or old material. Oddly enough, not one spoken word: neither greeting nor good byes or just an introduction of the band. Recommended.

/music/jazz/live | permanent link

Thu, 05 Jun 2008

Adventures with Comcast: Part ohbynowIhavelostcount in an ongoing series
Regular readers of this blog (ed: oxymoron alert) may recall tales of woe with our beloved (ha!) cable internet provider such as this; then there are of course minor tales like this or this or this or the other stories on this page but I am probably forgetting others.

Anyway, yesterday's highlight was initiated with a mail, seemingly sent to all customers, informing me that

ACTION REQUIRED: Comcast has determined that your computer(s) have been used to send unsolicited email ("spam"), which is generally an indicator of a virus. For your own protection and that of other Comcast customers, we have taken steps to prevent further transmission of spam from your computer(s).
and the email went on to recommend some Windows anti-spam measures, including a reference to a page I could only open with IE at work and one URL to a page that doesn't exist. Nice. Not. Needless to say, there are no Windows computers sending mail (via Comcast) here (as the lone Windows box, my wife's work laptop, goes straight to her university webmail).

And obviously, they blocked port 25, so no more mail sending from home. So I grumpily logged a complaint after having been on hold and in telephony menu hell for fifteen or twenty minutes. I was promised to hear back in 72 hours. Hasn't happened yet, naturally, but we're only half way through...

Anyway, to make a long story short and this post constructive: Here is what you do on a Debian or Ubuntu system running exim as your mail transport:

  • sudo editor /etc/exim4/conf.d/transport/30_exim4-config_remote_smtp_smarthost and add a line port = submission in the remote_smtp_smarthost block (assuming you have the split configuration chosen for the exim4-config package). Setting port to 'submission' switches from plain old SMTP to the authenticated version running on port 587; submission is mapped to 587 in /etc/services.
  • sudo editor /etc/exim4/passwd.client and add your user id and password, e.g. those of your Comcast web login
  • sudo update-exim4.conf to update the configuration
  • sudo /etc/init.d/exim4 restart to restart exim
And it may pay to check /var/log/exim4/mainlog for any irregularities. Barring those, you should now be sending mail to your smarthost using authenticated transfer over port 587.
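The name-to-port mapping that makes port = submission work can be checked from the system's services database. A small sketch in Python, assuming a standard /etc/services; the fallback value of 587 for systems without such an entry is my assumption, not something exim does:

```python
import socket


def submission_port():
    """Look up the TCP port registered for the 'submission' service name."""
    try:
        return socket.getservbyname("submission", "tcp")
    except OSError:
        # Assumption: fall back to the well-known port if the name is not listed
        return 587


print("submission maps to port", submission_port())
```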

In the meantime, it looks like they unblocked port 25 at some point today...

/computers/broadband | permanent link

Sat, 31 May 2008

Accelerated R in Debian
A few months ago, Stephen Milborrow started releasing a patched version of R that performs just-in-time compilation -- see his Ra page for some details and further pointers.

In a nutshell, Ra provides a modified R engine so that code preceded by a jit(1) function call, using his jit package from the CRAN archive, will run faster due to just-in-time compilation of loops and arithmetic expressions.

Ra offers to pick the low-hanging fruit for users as loops can be a bottleneck. Of course, as shown in Stephen's case study, using appropriate vectorised expressions will often be faster still. That said, for a certain class of problems, Ra should offer a decent speed boost.

Debian users can now just say

    sudo apt-get install r-base-core-ra r-cran-jit
as the Ra and jit packages are now in Debian's unstable distribution (and in the case of jit, even in testing).

Lastly, version 1.1.0 of Ra was released by Stephen yesterday and is now also in Debian unstable.

/computers/linux/debian/packages | permanent link

Sun, 25 May 2008

Bike The Drive 2008
Memorial Day weekend, so time for the annual Bike The Drive in Chicago. Got the whole family up bright and early, and was it ever nice -- 60-some degrees, sunny blue skies and no wind. Perfect conditions. And the Chicagoist blog has some pictures up.

/sports/cycling | permanent link

smtm bug fix release 1.6.10
A new version of smtm just went to Debian and CPAN. Perl 5.10 required a small change in how we test whether certain arrays do, or do not, contain elements. No other changes were made.

/computers/linux/debian/packages | permanent link

Thu, 22 May 2008

JPM Chase Corporate Challenge 2008
Just got back a little earlier from running the 2008 edition of the JP Morgan Chase Corporate Challenge. And again a record crowd of now just over 23000 in Chicago -- announced to be bigger than those at the JP Morgan Chase races in Boston, San Francisco or New York! This year the weather wasn't quite as stunning as it has often been in the past. But at least, temperatures in the high 40s and an overcast sky make for good running conditions.

This time, two colleagues and I tried to make it close enough to the starting line to not waste too much time 'surfing' around slower runners who for whatever reason think they have to be up at the front. And that seems to have worked: despite a still crowded start, I ran even, steady and fast enough to beat the PR from 2005 by a decent margin with a (hand-stopped) time of 20 minutes and 46.65 seconds. That yields 5:56.19 min/mile (or for Christian, 3:41.32 min/km) which seems too fast given the splits I saw at miles two and three. Oh well -- same course as the other five times that I've run this, so I trust the course is USATF certified.
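For the curious, here is a sketch of how the pace figures come out of the hand-stopped time. The post does not state the race distance; the 3.5 miles used below is the standard Corporate Challenge distance and should be read as an assumption, as should the helper name:

```python
MILE_KM = 1.609344      # kilometres per statute mile
RACE_MILES = 3.5        # assumption: standard Corporate Challenge distance


def pace_with_hundredths(total_seconds, distance):
    """Format the pace per unit of distance as m:ss.hh (hundredths of a second)."""
    minutes, seconds = divmod(total_seconds / distance, 60)
    return f"{int(minutes)}:{seconds:05.2f}"


finish = 20 * 60 + 46.65  # 20 minutes and 46.65 seconds

print(pace_with_hundredths(finish, RACE_MILES), "min/mile")
print(pace_with_hundredths(finish, RACE_MILES * MILE_KM), "min/km")
```

Keeping the hundredths is of course false precision for a hand-stopped time, but it makes the conversion easy to check.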

And as always, good to hang with folks from work for a cold one or two afterwards. Given the temperatures, I didn't last very long though.

/sports/running | permanent link

Sat, 10 May 2008

Quarryman Challenge 2008
This morning was the 2008 edition of Quarryman Challenge, a 5km and 10mile race in Lemont, which is southwest of Chicago along the Illinois-Michigan Canal.

Three of us ran the 10 mile race, which was nicely organised. But is it ever friggin' hilly there: the race course takes three turns from the lower levels near the canal up towards those hills. As the elevation chart (that I cut out of this pdf file with the course map) shows, it is not so much the total elevation but rather how steep the incline is.

Quarryman Challenge elevation profile

That said, I did okay: even though the legs were really tired throughout from those inclines, I finished in 1:12:08 for a pace of 7:13. And given the reasonably small field, that yielded 34th place overall and third in my age group.

/sports/running | permanent link

Fri, 09 May 2008

On modes of transportation
Something I never really mentioned was the purchase of the foldable bike: a Dahon with a reasonably lightweight aluminum frame, a seven-speed hub and high-pressure tires. It's great fun in the city for the rides to and from the commuter train, or across downtown for occasional errands after work.

I have had this foldable bike for nearly two years, and used it almost (work-)daily, even in the Chicago winters. 'Almost' because I did suffer from broken parts on a few occasions: a pedal broke (easy replacement), the axis in the front wheel broke (a good week for a new and inexpensive wheel) but the bummer was that a part of the frame-folding mechanism broke last fall. Given that the bike, which I bought used via craigslist, is a few years old, the part was no longer standard and so we waited for it to be shipped from the manufacturer. And waited and waited some more until Dan's decided to give me a matching part from a bike in their inventory. But apart from that episode, and the occasional problem with conductors on the Metra commuter trains, it has been a smooth ride. Highly recommended, and I do see a few more foldable bikes downtown.

Trek and Dahon bikes
But what is new now is that I finally gave in and bought a road bike, once again off craigslist. My daily commute is about ten miles one-way, which works out to about 35 to 40 minutes of cycling, plus a few minutes of locking/un-locking, changing, etc. I had used my trusted (yet heavier) touring bike with its steel frame a number of times, but felt that a road bike may make for a faster ride. While it saves a few minutes, it is not really a time saver as the bike-train-bike commute also takes around 40 to 50 minutes. That said, riding is simply a nice way to clear the head before or after work. I am back on the schedule I tried for a few weeks last summer / fall: running on Tuesday and Thursday leaves Monday, Wednesday and Friday for the bike commute. So far, I am 8 for 9 over the last three weeks. On the downside: one rather wet ride home, and already two minor flats that (luckily) still allowed riding home. The hardest part is meeting up with some other riders at 6:00am meaning that I am now getting up at 5:00am whether I am running or not. But all told a nice way to get some exercise in outside of running.

/sports/cycling | permanent link

Sun, 04 May 2008

On soccer, promises and hair cuts
Both my daughters have been playing soccer for a while now. And for another little while, I had been promising that if they ever scored three goals in a game, I'd shave my head.

As the attentive reader may have guessed by now, that day finally came. This weekend saw a suburban tournament in nearby Oak Brook, and lo and behold Anna scored three goals in the first game! So home we went, out came the tool and she rather professionally separated me from my hair. So today on day two of the new look, a friend took this picture of me (scaled down from 2.4mb to around 80kb) at the same tournament:

Dirk on 4 May 2008 with a new look

They actually played just about the best soccer I have seen them play, won their group (with three shutouts!) and lost a hard-fought and well-played final 2:4. And today the weather even cooperated as one can see from the photo. Nice weekend, all told. And yes, the head feels kinda nice ;-)

/misc | permanent link

Sat, 03 May 2008

getopt support for littler
Practically ever since Jeff and I released littler to add easy scripting for R, questions popped up about how to support getopt-alike command-line parsing.

And as of today, a new package r-cran-getopt is in Debian. It provides Allen Day's recently released package getopt from CRAN which provides a new function getopt.

Given a suitable data structure that provides long and short-form command-line option names, whether arguments are mandatory, optional or not required (as for flags), and a data-type, getopt munges the command-line arguments supplied by the user and fills a new variable opt accordingly. If a fifth column with help text is provided, a usage string can be generated as well.
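To make the shape of that data structure concrete, here is a loose analog in Python. It is not the R package's API: the names SPEC, parse and usage are made up, the numeric argument codes (0 for a flag, 1 for a mandatory argument, 2 for an optional one) are an assumption, and only the simplest option forms are handled:

```python
# One row per option: long name, short flag, argument code, type, help text
SPEC = [
    ("verbose", "v", 0, bool, "print extra output"),
    ("count", "c", 1, int, "number of draws"),
    ("mean", "m", 2, float, "mean of the draws"),
]


def parse(spec, argv):
    """Fill a dictionary 'opt' from argv according to the option table."""
    by_name = {}
    for long_name, short, mask, kind, _help in spec:
        by_name["--" + long_name] = (long_name, mask, kind)
        by_name["-" + short] = (long_name, mask, kind)
    opt = {}
    i = 0
    while i < len(argv):
        if argv[i] not in by_name:
            raise ValueError(f"unknown option {argv[i]}")
        long_name, mask, kind = by_name[argv[i]]
        has_value = i + 1 < len(argv) and argv[i + 1] not in by_name
        if mask == 0:
            opt[long_name] = True              # plain flag
        elif has_value:
            opt[long_name] = kind(argv[i + 1])  # convert to the declared type
            i += 1
        elif mask == 1:
            raise ValueError(f"option {argv[i]} needs a value")
        else:
            opt[long_name] = True              # optional argument left out
        i += 1
    return opt


def usage(spec, prog):
    """Build a usage string from the help text in the fifth column."""
    lines = [f"usage: {prog} [options]"]
    for long_name, short, _mask, _kind, help_text in spec:
        lines.append(f"  -{short}, --{long_name:<10} {help_text}")
    return "\n".join(lines)


print(parse(SPEC, ["-v", "--count", "5", "--mean", "1.5"]))
print(usage(SPEC, "demo"))
```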

Thanks to Allen for writing getopt, for accepting a quick two-line patch extending support from Rscript to littler, and for fixing one or two minor bugs. Thanks also to the Debian ftpmasters for adding r-cran-getopt within a few days.

/computers/linux/debian/packages | permanent link

Fri, 25 Apr 2008

Google Summer of Code 2008 projects assigned
As mentioned a few weeks ago, I had submitted two entries as possible topics for this year's version of the Google Summer of Code. And lo and behold, the response was very, very good and both applications were slotted!

For the topic 'create a PostgreSQL package for R that uses the standard DBI interface', a number of interested students contacted me, and a total of three applications were submitted. And while the R Foundation was only able to allocate four topics among a number of really good applications, Sameer Prayaga was our pick for this topic. It would be nice to fill this gap among the existing database connection methods for R, and I feel that Sameer can pull this off.

For the second topic of 'create a cran2deb tool for converting CRAN sources into Debian packages' which I had submitted within Debian, Charles Blundell wrote an excellent application. In a way, this topic is a '2.0' version of our previous attempts of a 'top-down' set of tools in the pkg-bioc project on Alioth. This time, we will try something smaller, maybe more modular and lighter and see how far we get there if we try it 'bottom-up'.

And as we are currently in the community bonding phase, say Hi to Sameer or Charles when you come across them these days.

Lastly, I'd like to thank everybody who submitted an entry at Debian or R, or who contacted me about one of the topics I posted. The response was very humbling, many of you were eminently qualified and seemingly very motivated -- but even Google's pockets can only pay for a finite number of projects. Sorry if yours did not get picked.

/computers/misc | permanent link

Thu, 24 Apr 2008

smtm maintenance release 1.6.9.1
A new version of smtm went to Debian yesterday. And after some pondering, I also just uploaded it to CPAN even though it only contains minor Debian packaging fixes rather than genuine code changes. In order to signal that the core Perl code hasn't changed, I simply added a new minor to move the version from 1.6.9 to 1.6.9.1.

/computers/linux/debian/packages | permanent link

Thu, 17 Apr 2008

Updated marathon pace geekery
A few months ago, I had shown a chart with per-5k-segment paces of my prior marathon runs (or at least those for which organisers had 5k splits). Since then, I ran two more: New York in November and London last Sunday. I have updated the charts to show both of these:

(updated pace comparison chart)
A few things stand out:
  • New York (in orange with small squares) clearly was my steadiest run, with very little variation and that nice burst at the end where the pace dropped below 7min/mile for the finish;
  • London (in yellow with open circles) was all right, but with a noticeable drop around kilometers 30, 35 and 40 -- but then the second-best finish;
  • I really cannot pace myself at the start: The two most recent races had the fastest start, including the hot and humid Chicago 2007 marathon (in light blue with circles)

In case anybody is interested (C'est bien toi, Christian, non?), the R script is available and I will be taking questions by email as R may not be obvious at first if you haven't used it.

/sports/running | permanent link

Tue, 15 Apr 2008

London Marathon 2008
Sunday was the 100th anniversary of the modern marathon: It was during the London 1908 Olympics that the marathon was run for the first time over the 26 miles 385 yards, or 42.195 km, distance. And it so happens that I was there to run the 2008 edition of the London Marathon.

Pretty nice weather at the start and finish: sunny, not too hot, occasional clouds. But being London, we still managed to get drenched for about 30 minutes. Overall, the conditions were good -- or else Martin Lel would have had a hard time for a new course record.

The course is pretty nice, right from the start in Greenwich all the way to the spectacular finish in Westminster. Crowd support was good too, if only a little uneven. But the second half of the race, and particularly the last miles on the way back from Canary Wharf through the City to Westminster past Parliament and Buckingham Palace were awesome. Lots of people, lots of noise.

Oh, and I even saw the leaders in a group of five, including Lel and Hall as the course had parallel 'out' and 'back' tracks around miles 13 for me and 22 for them.

I had planned to take it easy and not try to run too hard, and aimed for a time of around 3:25 and then finished at 3:24:41. A little less 'even' than I had hoped, but still a very satisfying result. And for once the legs aren't all that shot afterwards :)

And the whole weekend was nice as I got to stay with friends in the southwest of London. Staying up late on both ends of the trip suppressed the jet lag fairly well. At least that's what I keep telling myself.

/sports/running | permanent link

Mon, 17 Mar 2008

Google Summer of Code 2008 projects are up
For the past few years, Google has been running their Summer of Code events where Google offers $5000 for college students who are asked to code for a summer for the benefit of various Open Source efforts.

And just like in 2006 and 2007, I put up proposals and offered to act as a mentor. The first one is up at both Debian and R: an opportunity to help with the ongoing efforts of 'turning more CRAN package into Debian packages'. The second one is only at the R page: a proposal to fill the missing link of DBI database interface modules with a matching one for PostgreSQL. More details for either idea are at the respective pages. Anybody interested should ping me by email.

/computers/misc | permanent link

2008 March Madness Half Marathon in Cary
Yesterday was the annual March Madness Half Marathon in Cary, IL -- and this year marked the 30th anniversary of the race. The race is getting more and more popular as the start to the running season, and an early race for those in Spring marathon training. This year, it sold out in a matter of days.

As for the race, I wasn't running it all that well. My legs already felt heavy when I was doing a casual four-miler the day before with our local running group. Similarly, I didn't feel all that loose yesterday. By mile four or five I was getting into a decent rhythm, and I was then running fairly steady 7:15s until about mile 11 when I ran out of gas and had to slow down. Final time was 1:36:38.15 -- not only several minutes slower than last year's but also slower than two years ago.

/sports/running | permanent link

Sat, 15 Mar 2008

SFJAZZ Collective at CSO
Went to the CSO yesterday as a nice way to end a frantic workweek: first a beer or two after work, and then off for some Jazz.

Yesterday's program was the SFJAZZ Collective: eight individuals, all noted in their own right, coming together for a few weeks each year to play as an ensemble. The program generally consists of two halves: one with material by a modern composer -- Wayne Shorter is this year's pick -- and new original compositions by the band members.

This was a special treat as Wayne Shorter's compositions from the 1960s, both from the bands he led and as a member of the legendary Miles Davis Quintet, have always been some of my most favourite modern pieces. At the same time, it gave me a chance to finally see Joe Lovano on ts and Stefon Harris on vb. Other band members were equally impressive: Dave Douglas tp, Miguel Zenon as, Robin Eubanks tb, Renee Rosnes p, Matt Penman b, Eric Harland dr. Favourite new composition of the night: 'Angel's Shares' by Penman.

All in all a nice evening out to cap off a busy week.

/music/jazz/live | permanent link

Mon, 10 Mar 2008

PGApack 1.1: Almost as good as new
PGAPack is a rather nice and fairly small library for 'parallel' optimisation via genetic algorithms using the MPI message passing protocol. PGAPack 1.0 was written by David Levine while doing graduate work during the mid-1990s at Argonne Labs / University of Chicago.

PGAPack has also been in or around Debian for a rather long time, but it suffered from benign neglect in the last few years. Some of this came to the fore in this bugreport which led to my offer to the then-maintainer Andreas to help on the relicencing request. After all, Argonne Labs is just a few miles from where I live, and I had already spent a little bit of time polishing and upgrading the package for my own exploratory use.

So I called Rusty Lusk, head of the Mathematics and Computer Science section at Argonne to try to sort this out. He was sympathetic and put me in email contact with David Levine. As we are all somewhat busy, this dragged on for a little longer than we thought --- but as of today, about one and a half years later, we have a new and shiny PGAPack 1.1 release, or around twelve years after the initial 1.0 version came out.

I have done a fair amount of polishing: there are now two library packages for serial use (i.e. for debugging) as well as parallel use via MPI. We use Open MPI where available and LAM where not. All open Debian bugs have been addressed. One minor issue in the postscript documentation remains as David can no longer locate his LaTeX sources; I may just have to extract the text and re-latex this from scratch to update it. One day.

Anyway, for full reference, the changelog entry is below. The package is currently in the NEW queue (as the new sub-packages require manual inspection and approval) but should hit mirrors in a couple of days.

My thanks to the two previous Debian maintainers; to Rusty Lusk for helping from the MCS department's end at Argonne Labs and for suggesting the rather liberal and easy MPICH2 license (and he happens to be one of the MPICH2 authors); and of course to David Levine for writing PGAPack in the first place, for agreeing to relicense it and giving valuable feedback on my repackaging of what is now version 1.1 on the MCS ftp server at Argonne --- this library has held up really well over the years; let's hope it will find more good use going forward.

pgapack (1.1-1) unstable; urgency=low

  * Really good news:  The MCS divsion of Argonne National Laboratories has
    agreed to relicense pgapack using the MPICH2 license. So pgapack
    is now Free Software and can move into Debian's main archive!
  
    Our thanks go to David Levine and Rusty Lusk to make this possible.
  
  * New maintainer, following Andreas' offer dated 2006-10-04 in #379388

  * debian/control: Change section to math		(Closes: #379388)
  
  * Added new brinary packages libpgapack-mpi1 and libpgapack-serial1
  * The MPI package is configured using Open MPI where available and LAM
    where not. 
  * debian/control: Changed Build-Depends: to use OpenMPI where available, 
    and LAM otherwise.
 
  * Finally acknowledges old NMUs 		(Closes: #379168,#359549)

  * source/integer.c: Apply patch for one-off error 	(Closes: #333381)

  * source/report.c:  Do not unconditionally print at generation 1
  
  * debian/rules: Remove a bashism 			(Closes: #379168)
  * debian/rules: Install examples directly 		(Closes: #134331)
  * debian/control: libpgapack-lam1 Depends on lam4 	(Closes: #60376)
  * debian/rules: Rewritten using debhelper
  * debian/control: Added Build-Depends: section for debhelper

  * No longer install mpi.h in /usr/include		(Closes: #404027)
    
  * debian/control: Updated Standards-Version: to current version

  * man/man1/PGAGetCharacterAllele.1: fix whatis entry 		(lintian)
  
 -- Dirk Eddelbuettel   Mon, 10 Mar 2008 18:03:34 -0500

/computers/linux/debian/packages | permanent link

Thu, 21 Feb 2008

Time flies ...
... when you're having fun. This blog turned five a few days ago.

/computers/www/blogging | permanent link

Sat, 16 Feb 2008

CRANberries updated
The good folks at CRAN, the R package network which by now contains over thirteen hundred packages, have reorganised the website slightly. As far as I can tell, the changes are all for the better: nicer URLs, lots more per-package information, and other changes such as an updated 'task view' format (which is the part I knew via my maintenance of the Finance task view).

But these changes also affected my CRANberries (see the html or better yet rss view) summaries of new packages as some of the source information moved. So I just updated the (surprisingly short at 189 lines including plenty of whitespace and comments) script, and things should work now come the next update.

While updating the 'more info' link for new and updated posts to point to the new-style entry at CRAN, I also took the opportunity to update the format of the 'blog' entry for updates where we now show title and description along with the diffstat output.

I also manually copied in two of the recent entries: the new package emu where CRANberries had fallen over as we could not find the package description (in the new spot), and the existing package GEOmap where diffstat failed as we somehow didn't have a proper tarball of the previous sources.

/computers/R | permanent link

Fri, 15 Feb 2008

Nice one
A post entitled Saved by SPAM just appeared on the Journal's website. Yes, it's only barely ironic, but heck, my blog is geeky enough for a lame joke about spam and it is Friday afternoon before a long weekend ...

/otherblogs | permanent link

Mon, 14 Jan 2008

littler 0.1.0 released
I've just rolled up release 0.1.0 of r (pronounced littler). A few changes made it into this release:

  • A new option -l (with long form --packages) to load packages into the R session. This is useful when you have data-processing one-liners as it does away with the explicit library(foo) inside your actual R expression. As an added bonus, library() is wrapped in a suppressMessages() call.
  • The --no-restore option is now passed to the embedded R session.
  • Argument handling was improved / corrected for a corner case.
  • Output shown by --help was improved.
  • The manual page and README were updated accordingly.
  • The datasets package will now be autoloaded.
  • Two minor fixes went into bootstrap, the wrapper around autotools and friends.
  • The cache-clearing part of the R package updater example update.r was improved.
  • We added some comparisons to Rscript in the timing tests.

As usual, our code is in our svn archive, on my r page, and in the local directory here. A fresh package is in Debian's incoming queue, and Jeff's littler page at Vanderbilt should reflect the new release soon too.

/computers/linux/debian/packages | permanent link

Sun, 06 Jan 2008

NAS'ed
I had been eyeing an inexpensive network-storage device -- for the non-geeks: think of a hard disk with an ethernet port and some controlling software -- for some time. I was aware of some of the hacking efforts around a few of these but somehow nothing really appealed. I was sort-of looking for nslu2-with-a-disk, and preferably not too expensive.

Lo and behold, that's what I saw today in my techbargains feed: a Buffalo LinkStation Live which contains a 500gb SATA disk for $199 after rebates. Some quick googling led to these wiki pages which looked promising: anything from enhancing the stock Linux setup by enabling a few more services to a custom Linux distro (similar to my wrt54 router running linux) to reportedly some work-in-progress for a native Debian installation. Nice!

So off I went, ordered the thingie for local pickup for an additional 5% off and picked it up a little later at the local Circuit City (where visits are seemingly a recurring event these days). The documentation is very brief and insists that you install something on Windows -- just to find that the little box autoconfigures itself just fine. Presumably some network discovery is going to find the assigned dhcp address which is needed for the web interface. A few minutes later, a new (fixed) IP address was assigned, ntp was enabled and that was about that.

After dinner, I quickly followed this tutorial to get the box a bit more Unixy without going too far (yet) from the default: start up telnet via a simple Java command-line tool, login, then enable ssh, set it up in /etc/init.d, add some extra binaries. All very quick and simple [ with the caveat that the addons.tar didn't want to get there via the Java tool, so manual scp once 'inside' did the trick ].

NFS, which I like for shuffling files around, appears to be a little trickier for this ARM-based LinkStation Live. So at least for now I am content with simple rsync'ing of my backup directories on the few machines here. Much better than the current setup with mutual backups between workstations and semi-permanently being out of space.

All in all a rather pleasant gadget and recommended at the price. The extra $100 in rebates are valid from today to the 12th.

/computers/hardware | permanent link

Sat, 05 Jan 2008

RQuantLib 0.2.8
Version 0.9.0 of QuantLib was released a couple of days ago on Christmas Eve; as usual, Debian packages for QuantLib were uploaded right away following a few earlier pre-releases.

As QuantLib is approaching its 1.0 release, a few API changes required updates to basically all of RQuantLib's C++ source files. Luckily most changes were minor. At the same time, we also generalised the Binary (aka Digital) option pricer to allow for a 'binType' argument (with values 'cash', 'asset' or 'gap' for CashOrNothing, AssetOrNothing or Gap digitals) as well as an 'excType' argument to switch between European and American exercise. Dominick made a small change to the DiscountCurve object to seamlessly pass a switch variable indicating whether we have 'flat' curves or not.

Another change was the addition of formal unit tests using the RUnit package from CRAN (which we happened to have added to Debian recently in the wave of new RMetrics packages). We use the scheme initially proposed by Gregor Gorjanc and extended by Martin Maechler for RMetrics that allows the unit tests to be run from both the source and the installed package which is nice. As QuantLib itself has a massive amount of unit tests in its code, I am hoping to add more and more of those into RQuantLib itself as we add more functionality.

On that front, more exciting news: RQuantLib is now hosted on R-Forge. Potential contributors are encouraged to register at R-Forge and to get in touch -- this is a great way to learn about combining C++ and R.

To wrap up, the new version 0.2.8 is currently in the queue at R's master CRAN host and should hit the CRAN mirrors shortly; likewise the Debian package has been uploaded and should also propagate to Debian mirrors in due course. As usual, sources are also available locally on my site.

/computers/linux/debian/packages | permanent link

Thu, 27 Dec 2007

Internet NON-Service Provider: Yet another Comcast saga
About a week ago, our internet connection appears to have dropped around midnight. Not a biggie, one thinks. So on Friday morning between shower, breakfast, and leaving for work, I power-cycled the cable modem and router a few times which 'usually' helps.

Not this time. Still no signal by the afternoon, and when Lisa called the help line, they confirmed that they could not see our cable modem. That could have given it away, but I didn't click.

This being this time of year, we were actually out overnight on Friday so that I couldn't get to inspect matters at home. Also, friends and neighbours were out the next day so I couldn't get my hands on another cable modem to see if it was the line (my suspicion at the time) or the modem. All I could do was call, go once more over all possibilities with the tier-1 help person -- and schedule a technician to swing by on Monday afternoon, i.e. on Christmas Eve, or about 48 hours later (!!). So I made do over the weekend with two trips to the local library to consume some of their wireless signal to catch up on things.

The big surprise came on Monday. The technician, who was on time and rather friendly and knowledgeable, checked the signal strength at the box outside, and on two cable outlets in the house. All fairly well. So during the second call to Comcast, we turned our attention to the cable modem. A few years ago I returned the 'leased' modem and bought an inexpensive 3com cable modem. Only after checking that it was supported, of course.

Well now it seems that Comcast decided that this (old) modem can only talk DOCSIS 1.0. And instead of telling me in advance, they just fscking dropped it cold. Unbefriggable. I must be getting two fliers a month informing me how great Comcast's so-called (and IMHO rather overpriced) 'Triple Play' is. You'd think that they use that mail-out infrastructure to let me know about the service change. Or use email, after all they are my ISP. Naaah. Rather just drop the service cold right before the holidays. That's the spirit.

To clarify and repeat, I do not mind service updates. I do not mind improving standards and improved throughput. And as I was quite happy to buy a new modem on the spot on Monday afternoon -- yes, Christmas Eve, because I then have nothing better to do than to troll in the mall to buy a new Motorola cable modem at full retail cost rather than somewhat more cheaply at Amazon or other places -- I could easily have done better if only they had told me in advance. I could go and use some choice terms, but as we're still in the holiday season I better stop... Maybe I should just go back to DSL and save a few bucks.

/computers/broadband | permanent link

Tue, 20 Nov 2007

Several new Rmetrics packages
Debian has provided Rmetrics packages for financial engineering and computational finance since the Rmetrics release for R 1.9.0 in the summer of 2004. Over the years, Rmetrics has gotten more granular and changed from a handful of packages to two handfuls --- and the most recent release extended this trend even further to almost two dozen packages as shown in this chart.

Dependency chart for Rmetrics packages

Rmetrics now comprises over twenty individual packages. Eleven new packages were added in the 260.72 release for R 2.6.0, and they required eight other new packages from CRAN. While I would have preferred a more spread-out approach than the shotgun approach of having to introduce all these new packages at once (which took the last four weeks), I am in support of the reorganisation which should make maintenance easier going forward.

So to get all of these packages onto a Debian box, a quick sudo apt-get install r-cran-rmetrics is all it takes. Currently supported only in the always-fresh unstable flavour, but hopefully soon in testing too.

A big Thank You goes to the Debian FTPmasters. Of the 20-some packages that I added to Debian during this Rmetrics expansion, many were added within a day or two.

Lastly, thanks also to Florian Hahne, Robert Gentleman and Elijah Wright for much appreciated help with R's Rgraphviz and graph packages to create the chart above. It only takes a handful of lines to create the basic graph, and another few lines for the colours and titles. The code is available on request, of course, but you need the current development versions of the BioConductor packages Rgraphviz and graph (which are not in Debian yet).

/computers/linux/debian/packages | permanent link

Tue, 06 Nov 2007

New York Marathon 2007
I was in New York last weekend to run the 37th New York Marathon which reportedly currently stands as the largest marathon ever run in terms of starters and finishers. And it couldn't have happened on a nicer day: sunny at the start with temperatures in the 50s, no wind, and some clouds in the second half prevented it from getting too hot.

Large crowds at most parts of the course, a decent number of bands, and a generally very excited atmosphere. And of course a nice course across the five boroughs finishing in Central Park.

For once, I managed to run the race steadily and yet fairly fast, ending up with a time of 3:18:47 (and thus a 7:35 pace). This is about a minute slower than my PR from Sunburst 2006, and just two seconds faster than my best Chicago Marathon result from 2006, yet much better than this year's times from Boston 2007 where it was too cold, and Chicago 2007 where it was way too hot. Given that the NY course is somewhat hillier, and that it was definitely busier and more crowded than the other races, I'm quite happy with the time, and the way I ran, getting through without any walking breaks. Not quite negative splits at around 1:38 and 1:40 for the two halves. With enough energy left at the end, I finished the remaining 2+ km after the 40 km with a sub-7:00 min/mile pace which felt great. And it is a nice feeling to have completed the Boston, Chicago and now New York marathons in the same year.
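For those who want to check the pace arithmetic, a minimal Python sketch follows. It assumes the standard marathon distance of 26 miles 385 yards and rounds to the nearest second; the function names are made up for this illustration.

```python
# Marathon distance in miles: 26 miles plus 385 yards (1760 yards per mile).
MARATHON_MILES = 26 + 385 / 1760


def to_seconds(hms):
    """Convert an 'h:mm:ss' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def pace_per_mile(finish_time, miles=MARATHON_MILES):
    """Average pace as 'm:ss' per mile, rounded to the nearest second."""
    seconds_per_mile = round(to_seconds(finish_time) / miles)
    return f"{seconds_per_mile // 60}:{seconds_per_mile % 60:02d}"


print(pace_per_mile("3:18:47"))
```

Run on the finishing time of 3:18:47 quoted above, this reproduces the 7:35 min/mile pace.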

Last but not least, the weekend was a general blast as I was staying with a friend I hadn't seen in a decade which made for lots of stories, and even more beers...

/sports/running | permanent link

Sun, 21 Oct 2007

Frank Lloyd Wright Race 2007
Today was the 31st annual Frank Lloyd Wright Race in Oak Park, next door to where I live. This is a really nice small race I've run in 2006, 2004 and 2003.

Conditions today were almost ideal -- we're having unseasonably warm weather, and it was sunny and in the high 60s without any noticeable winds around the start time making it feel like a nice late summer morning, rather than the middle of fall. I wasn't aiming for something really fast as I am still in training for a longer race and had done some reasonably fast one-mile laps yesterday morning leaving me not exactly well rested. But for once, I actually managed to run a steady pace, and then ended up within seconds of last year's PR for the distance at 41:15 for a pace of around 6:39, so I'm quite pleased.

/sports/running | permanent link

Mon, 15 Oct 2007

Some marathon data analysis
A few days after the Chicago Marathon in record-heat, I was wondering just how much more I had been slowing down, and exactly when during the course of the race this occurred.

Bigger races like Chicago provide the so-called 'splits' for every five kilometer segment. Helpfully, they also keep these with the archived results which are all accessible via the web. So it was easy enough to collect a sequence of times such as '0:22:34', '0:45:30', ... and so on. And thanks to R, my data slicing and dicing and visualizing tool of choice, it was just a handful of lines to produce the chart below.
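The step from cumulative splits to per-segment paces can be sketched as follows. This is a Python illustration rather than the R script behind the chart; it uses only the two example times quoted above, and the function names are assumptions for this sketch.

```python
KM_PER_MILE = 1.609344


def to_seconds(hms):
    """Convert an 'h:mm:ss' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def segment_paces(cumulative_splits, segment_km=5.0):
    """Seconds per mile for each segment, given cumulative split times."""
    times = [0] + [to_seconds(t) for t in cumulative_splits]
    segment_miles = segment_km / KM_PER_MILE
    # Differences between consecutive cumulative times give the segment times.
    return [(later - earlier) / segment_miles
            for earlier, later in zip(times, times[1:])]


def format_pace(seconds_per_mile):
    """Format seconds per mile as 'm:ss', rounded to the nearest second."""
    total = round(seconds_per_mile)
    return f"{total // 60}:{total % 60:02d}"


splits = ["0:22:34", "0:45:30"]   # the two example times from the text
for km, pace in zip((5, 10), segment_paces(splits)):
    print(f"km {km:2d}: {format_pace(pace)} min/mile")
```

Plotting these per-segment values against the distance, one line per race, gives the kind of comparison shown in the chart.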

The chart covers my four Chicago marathons (2004, 2005, 2006 and 2007) as well as Boston (2007) --- the infamous '27.2 mile marathon' seems to have dropped off the net, and the smaller Sunburst does not have 5k split times.

(pace comparison chart)
The chart shows a few things quite nicely:
  • The first marathon (dark blue squares) shows a marked slowdown after km 25.
  • The following year (medium blue circles) a similar yet more moderate slowdown appeared 5 km later.
  • The next year (blue triangles) the pace was remarkably stable without noticeable slowdowns -- yielding the 'BQ' time.
  • The Boston race (smaller green circles) was more prudent due to weather conditions, and a lingering cold, and I managed to finish much stronger than I ran the middle part.
  • Chicago this year (orange circles) was a different story: too fast at the beginning, a cautious middle similar to the Boston race but two fairly significant slowdowns after 25 and 35 km, followed by some acceleration right before the finish.

While it was a tough race, I clearly ran a lot slower than previously, in particular between 25 km and 40 km. So there.

/sports/running | permanent link

Sun, 14 Oct 2007

Tunes
Following Adam's suggestion about Pandora, I spent the rest of the afternoon, evening and this morning listening to music old and new. Pandora's suggestions around Madeleine Peyroux were particularly nice as they led for example to Sam Phillips which was new to me. I can already see this leading to increased spending on new cds... Recommended.

/otherblogs | permanent link

Sat, 13 Oct 2007

Makin' Tracks 2007
This morning was the annual Makin' Tracks 5k. As this starts three blocks from my house, supports the track at Concordia, a small college up the street from me, and runs right through my immediate neighbourhood, it's hard not to participate. This year was of course not even a week after the somewhat eventful Chicago Marathon in record-heat so I wasn't sure how well I'd do. Once again, I started way too fast and ran two miles at a 6:00 min/mile pace that I couldn't sustain just a week after a (arguably slow but still full distance) marathon, so I slowed down and finished at 19:41.7 or so for a pace of 6:23, or about eight seconds slower than my best time from last year (when we ran before the marathon). Onto the Frank Lloyd Wright 5k/10k next week, then!

/sports/running | permanent link

Tue, 09 Oct 2007

Chicago Marathon 2007
Last Sunday, the 30th Chicago Marathon took place. This was meant to be the year to make this marathon even bigger: the field had increased substantially to 45,000 registrations yet it still sold out way earlier than in previous years.

As it turns out, the weather did have its own surprises in store. Earlier in the preceding week, the forecast changed from overcast and rainy to sunny. And sunny it was. While we had a bit of cloud cover at the start at 8:00am, temperatures were already in the 70s and kept increasing. This year's marathon is now on the books as the hottest ever: the clouds dissipated and it was a scorcher.

Needless to say, it was a rather challenging race. I finished in 3:41:39, which is my slowest time by some margin for the by now seven marathons I've run. It was not the day for running fast. Of course, I didn't quite grok that at the start and ran ten miles reasonably hard in a quick pace, but then paid for it, and then paid some more. That said, apparently around 10,000 registered entrants didn't even start, and another 10,000 did not make it to the finish. With the heat, several hundred were treated by the medical teams. Worse still, one 35-year old runner from Michigan died (though the autopsy claims he had a heart condition; other reports say that alone cannot have been lethal). The race itself was aborted, and those who had not reached the half-way point by 12:00pm were diverted to the finish and urged to walk rather than run.

According to the (by now fairly extensive) news coverage, this whole experience left quite a few people mad and bewildered. As of today, a good 48 hours after the race, the City seems to be in some sort of crisis management mode to prevent damage for the oh-so-important bid for the 2016 Olympics.

For some flavour of the news coverage, see an early report, some stunning pictures and some suggestions to prevent another one like this, all from the Chicagoist blog.

/sports/running | permanent link

Wed, 03 Oct 2007

Beancounter minor bug fix release 0.8.8
For some odd reason, the trusted Perl module Date::Manip now wants the "approx" argument to format date differences. Adding this in the five calls across beancounter and BeanCounter.pm is the main change in version 0.8.8 which is now in Debian's incoming area, uploaded to CPAN and onto the beancounter page here.

/computers/linux/debian/packages | permanent link

Wed, 26 Sep 2007

Dear Samsung,
Your ML-3051N is a lovely little 'network printer'. Hopefully, it will serve us well in the next few years as a replacement for the thirteen year old HP Laserjet ML we bought (for about five times the price: a native postscript printer was rather expensive in 1994) a long time ago. And you even mention that it can talk to Linux! But still, I have a few things to talk to you about:

  1. as you mention Linux, it's a tad odd that you enumerate various flavours (RH 8 to 9; FC 1 to 3; Mandrake 9 to 10.2; SuSE 8.2 to 9.2) but happen to exclude the two variants that run around here: Debian and its cousin Ubuntu. Besides, the versions of those other flavours are a tad old, no?
  2. as you ship Linux software on a cdrom, it is even odder that the install shell script fails (note: spaces between #! and /bin/sh may not be a good idea). Fails not once, but twice: you say /bin/sh, but you meant /bin/bash as you use features of the latter. Oh well, I guess we all made that mistake in our youth. Regardless, the software never installed. Which is a pity as this would be about the first 'household appliance' I bought with native Linux support. So close, and yet so far.
  3. the control panel is a cumbersome way to set a static IP and to turn the odd protocol off.
  4. did I really have to install all this windoze stuff on my wife's laptop just to learn that the printer does in fact have a nice web interface? You could have mentioned that with Linux info, no? It is in fact a rather nice and complete web interface.
  5. as you supply networking support, do you really have to enable every possible protocol under the sun? I mean slp, snmp, multicast dns, dynamic dns, raw tcp, ipp, ethertalk, netware ...? I still see my cable modem go down 'seemingly randomly' right after I print even though I now disabled just about everything but raw tcp to port 9100 as well as ipp to 631. Still, dropping the cable connection just because the printer wakes up is odd, isn't it?
  6. Cups, foomatic and all the other printing goodies do not yet know the ML-3050 family, so we are making do with the ML-2151NPS settings for postscript. Works fine, so I won't bother copying the ppd file from the cdrom to the handful of computers around here.
Kind regards, Dirk

Updated to fix a markup error. And the shebang space is standard, I am told.

Postscriptum: Turns out it was 'just' the power surge. Putting the cable modem and the router onto a different wall outlet, and a surge-protector and battery backup ups to boot, fixed the issue.

/computers/hardware | permanent link

Mon, 10 Sep 2007

Overdue smtm bug fix release 1.6.9
Some time ago, Yahoo! Finance made some changes at their backend for the provisioning of png images which are also displayed by smtm. Version 1.6.9, which just went to Debian's master site and will go to CPAN in a moment, finally updates the code for that. My apologies for taking so long, and thanks to Michael Kurlin, Clinton R Lambe, Ian Seow and possibly others whose mail I haven't kept who sent me patches and heads-ups.

/computers/linux/debian/packages | permanent link

Another option for MythTV schedules
Yesterday I mentioned how straightforward it was to set up an account with SchedulesDirect to obtain tv schedule data for the MythTV 'software pvr'.

For completeness, a 'free as in beer' screen-scraping alternative is provided by the zap2xml Perl script together with a (free) registration at the tvlistings.zap2it.com site. Instructions are on the zap2xml page, and all that is left to do is to combine the calls to zap2xml.pl and mythfilldatabase in a single shell script -- which is easy enough to do too.

/computers/linux/mythtv | permanent link

Sun, 09 Sep 2007

MythTV data service switch
One of several things I never got around to blogging about is my experience with MythTV (which, in case you don't know, turns your pc into a tivo, given a suitable tv decoder card and the mythtv software). To make up for the lack of a more detailed post, let's just recap: I had set it up a little over two years ago, on what was then a 'possibly temporary' Kubuntu box. Well, the box is still there and still running Kubuntu, four releases later, and MythTV only got easier as Ubuntu now includes the decoder drivers. So far no worries.

TV schedule information had always been provided via the 'free as in beer if you answer a question or two' zap2it subsidiary of the for-profit Tribune publishing company (i.e. the folks who own the Chicago Tribune, Cubs baseball team, WGN TV station around here and what have you, and are in the process of being sold to Sam Zell). As had been noted in other places, that 'free' service is no more, and a seemingly cooperatively-run 'small fee' service has been set up by a few volunteers at schedulesdirect.

So I just switched, and for the benefit of anybody sitting on the fence about this, the switch is trivial:

  1. If need be, add 'feisty-proposed' to apt's sources.list file to get the new MythTV 0.20.2 release directly from Ubuntu.
  2. Install MythTV 0.20.2 which takes as little as wajig update; wajig dist-upgrade
  3. Follow the instructions in this email, in particular point 4) about re-creating lineup info
  4. sit back and enjoy
Including the signup at schedulesdirect, the whole thing took maybe ten minutes. Not bad. Now I only have to find some spare time so that the backlog of unwatched episodes of the Daily Show does not grow much beyond six months... Oh, and last but not least, I'll have to pony up the fifteen bucks for the subscription once my trial is up next Sunday.

/computers/linux/mythtv | permanent link

Chicago Half Marathon 2007
This morning was this year's Chicago Half Marathon, the race I often refer to as 'my favourite' after having participated in 2006, 2005, 2004 and 2003. But this year I somehow managed to convince myself I had already registered, when I had not. Doh. But due to a friend's unlucky injury, I could re-use his registration and still race. Weather was once again gorgeous, but maybe a little on the hot side. I tried not to push myself too hard given the two big races coming up in the next seven weeks and ran a decent 7:12 min/mile pace for a chip time of 1:34:20.

/sports/running | permanent link

Sun, 02 Sep 2007

Another Herbie Hancock concert
Herbie Hancock was in town to open this year's Chicago Jazz Festival with a concert at the CSO last Thursday. This time he came with a mostly-electric setup featuring Nathan East on bass, Vinnie Colaiuta on drums and Lionel Loueke on guitar. I found it at times a little heavy on the synthesizers, but it was evident that Hancock and the band enjoyed themselves during the two-hour set. A nice concert, all in all, and again very different from his last two concerts.

/music/jazz/live | permanent link

Sat, 11 Aug 2007

The amazing Prof. Ripley (cont'ed)
A little mini-meme got started on August 1 when Ben Bolker posted the following code to the r-devel list (and here I substituted the more standard '<-' assignment operator for the less standard though now-permitted '='):

x <- readLines("http://developer.r-project.org/R.svnlog.2007")
rx <- x[grep("^r",x)]
who <- gsub(" ","",sapply(strsplit(rx,"\\|"),"[",2))
twho <- table(who)
twho["ripley"]/sum(twho)
In five lines (that could be shortened to three at the expense of some readability), the SVN log for R is downloaded directly from the website, the revision authors are extracted and then tabulated by submitter. The relative percentage of Brian Ripley is found to be a staggering 74.8% -- or about three times as much as the other fifteen committers combined. Smokes.
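To make the parsing step a little more concrete, here is a minimal sketch that runs the same idea on a handful of made-up lines laid out the way 'svn log' prints them. The revision numbers, the committer name 'other' and the messages are invented for illustration. The one liberty taken is a slightly stricter pattern than "^r": it is my own variation, so that a commit message which happens to start with an 'r' is not mistaken for a header line.

```r
## Made-up lines in the usual 'svn log' layout (not real R commits)
x <- c("------------------------------------------------------------------------",
       "r1001 | ripley | 2007-03-01 08:15:00 -0500 (Thu, 01 Mar 2007) | 1 line",
       "",
       "tweak docs",
       "------------------------------------------------------------------------",
       "r1002 | other | 2007-03-01 09:30:00 -0500 (Thu, 01 Mar 2007) | 1 line",
       "",
       "remove stray file",
       "------------------------------------------------------------------------",
       "r1003 | ripley | 2007-03-02 02:45:10 -0500 (Fri, 02 Mar 2007) | 1 line",
       "",
       "update NEWS",
       "------------------------------------------------------------------------",
       "r1004 | ripley | 2007-03-02 11:05:30 -0500 (Fri, 02 Mar 2007) | 1 line",
       "",
       "fix typo")

## keep only the header lines; stricter than "^r" (an assumption of this
## sketch) so that "remove stray file" is not picked up as a header
rx <- x[grep("^r[0-9]+ \\|", x)]

## the second '|'-separated field is the committer; strip padding blanks
who <- gsub(" ", "", sapply(strsplit(rx, "\\|"), "[", 2))

## tabulate and turn counts into shares
twho <- table(who)
share <- twho / sum(twho)
print(round(share, 2))
```

With the plain "^r" pattern the message line 'remove stray file' would also be selected and would then yield a missing committer, which is harmless for the table but worth knowing about when the vectors are reused for subsetting later on.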

[ Oh, and for those who don't know him, he's also got a day job which presumably entails looking after his graduate students at Oxford. Who knows, he may even teach. Kidding aside, he's actually one of the nicest persons you'll ever meet in real life. ]

Now yesterday, Simon Jackman, who had at first simply repeated Ben's analysis on his own blog, followed up with a nice analysis (albeit typeset in a way that rendered the code inoperable, which has now been fixed) that creates both a histogram and a dotplot of commits per hour of the day. Omitting Ben's code which Simon reuses, we have the following for histogram and dotchart:

tod <- unlist(sapply(rx,function(x)strsplit(x,split=" ")[[1]][6]))
tod <- tod[who=="ripley"]

tz <- sub(pattern=".*(-[0-9]{4}).*",replacement="\\1",x=rx)
tz <- tz[who=="ripley"]
tz <- as.numeric(tz)/100
offset <- 3600*tz

z <- strptime(tod,format="%H:%M:%S")
hist(z,"hours",main="Ripley Commit Times in SVN TZ")

h <- z - offset
h <- format(h,format="%H")
h <- factor(as.numeric(h), levels=0:23)
dotchart(table(h), main="Ripley Commit Times, By Hour in GMT",
         labels=paste(0:23,1:24,sep=":"))
This extracts the commit times, subsets to the ones by Prof. Ripley, extracts the timezone component (as strptime seemingly doesn't do that, which is a pain), and parses the tz-less time via strptime into a variable 'z' for which the histogram is drawn. He then corrects the times by the tz offset expressed in seconds, formats the result as hour of the day, turns it into a 'factor' (an R data type for qualitative variables which may be ordered as is the case here) and draws a dotplot. This results in the following chart:
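The offset correction is the fiddly bit, so here is a small self-contained sketch of it on three invented header lines. It is not Simon's code: as far as I can tell his pattern only picks up negative offsets and the division by 100 assumes whole hours, so this variant (my assumption of what a more general version could look like) reads the sign, hours and minutes separately.

```r
## Made-up header lines in 'svn log' layout (not real commits)
rx <- c("r1001 | ripley | 2007-03-01 08:15:00 -0500 (Thu, 01 Mar 2007) | 1 line",
        "r1003 | ripley | 2007-03-02 22:45:10 -0500 (Fri, 02 Mar 2007) | 1 line",
        "r1004 | ripley | 2007-07-02 11:05:30 +0100 (Mon, 02 Jul 2007) | 1 line")

## blank-separated fields 5, 6 and 7 are date, time and UTC offset
fld <- strsplit(rx, " ")
stamp <- sapply(fld, function(f) paste(f[5], f[6]))
tz <- sapply(fld, "[", 7)

## offset in seconds from sign, hours and minutes
sgn <- ifelse(substr(tz, 1, 1) == "-", -1, 1)
offset <- sgn * (as.numeric(substr(tz, 2, 3)) * 3600 +
                 as.numeric(substr(tz, 4, 5)) * 60)

## parse the wall-clock time as if it were UTC, then remove the offset
local <- as.POSIXct(strptime(stamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))
utc <- local - offset

## hour of day as a factor with all 24 levels, ready for table()
hour <- factor(as.numeric(format(utc, "%H", tz = "UTC")), levels = 0:23)
print(table(hour))
```

Keeping all 24 levels in the factor means that hours without any commit still show up with a zero count in the table, which is what makes the dotchart line up by hour.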

Simon Jackman's per-hour charts of Brian Ripley's commit patterns

Now, nobody has looked at the time series. So we correct this and add the following:

## rather extract both  date and time
dat <- unlist(sapply(rx, function(x) {
  txt <- strsplit(x,split=" ")[[1]]
  paste(txt[5], txt[6])
}))
## subset on Prof Ripley
dat <- dat[who == "ripley"]
## and convert to POSIXct, correcting by tz as well
datpt <- as.POSIXct(strptime(dat,format="%Y-%m-%d %H:%M:%S")) - offset

## turn into zoo -- we use a constant series of ones as each
## commit is taken as a timestamped event
library(zoo)
datzoo <- zoo(1, order.by=datpt)
## and use zoo to aggregate into commits per date
daily <- aggregate(datzoo, as.Date(index(datzoo)), sum)

## now plot as grey bars
plot(daily, col='darkgrey', type='h', lwd=2,
     ylab="Nb of SVN commits, three-week median",
     xlab="R release dates 2.5.0 and 2.5.1 shown in orange",
     main="The amazing Prof. Ripley")
## mark the two R releases of 2007
abline(v=c(as.Date("2007-04-24"),as.Date("2007-06-28")),col='orange',lwd=1.5)
## and do a quick centered rolling median
lines(rollmedian(daily, 21, align="center"), lwd=3)
This extracts both date and time, creates a proper R time object (a so-called POSIXct type) from it, fills a zoo ('the' magic class for time series) object with it, uses zoo to aggregate commits per day and plots those in a barchart-alike (I know, I know, ...) plot to which we add the two releases as well as a rolling and centered three-week median (as a real quick hack rather than a proper smooth).
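For anyone who has not used zoo before, the aggregation and the rolling median can be tried on a toy series first. The timestamps below are invented, and a three-observation window stands in for the 21 used above so that the result can be checked by hand; the zoo package needs to be installed.

```r
library(zoo)

## Made-up commit timestamps in UTC (not real data)
stamps <- as.POSIXct(c("2007-03-01 08:15:00", "2007-03-01 13:00:00",
                       "2007-03-02 09:30:00",
                       "2007-03-03 10:00:00", "2007-03-03 11:00:00",
                       "2007-03-03 12:00:00",
                       "2007-03-04 07:45:00",
                       "2007-03-05 16:20:00", "2007-03-05 18:05:00"),
                     tz = "UTC")

## one event per commit: a series of ones indexed by timestamp
ev <- zoo(rep(1, length(stamps)), order.by = stamps)

## sum the ones per calendar date to get commits per day
daily <- aggregate(ev, as.Date(index(ev)), sum)
print(daily)

## centered rolling median over three observations
smooth <- rollmedian(daily, 3, align = "center")
print(smooth)
```

One thing to keep in mind with this approach: a day without any commit simply does not appear in the aggregated series, so the window counts observed days rather than calendar days. For a committer who commits every day that makes no difference.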

Timeseries of Brian Ripley's commit patterns

This shows that Prof Ripley averaged about ten commits a day before and after the release of R 2.5.0, and that he has slowed down ever so slightly since then to end up at around a mere seven commits a day. Every day. For the seven-plus months we looked at.

So, anyone for analysing his r-help posting frequencies?

/computers/R | permanent link

UseR! 2007: Two talks and a new R package 'RDieHarder'
The first UseR! conference in North America ended yesterday. I gave two talks and updated my presentations page accordingly.

One talk was joint work with Steffen Moeller (who had also presented our work in Italy in June, and I added that presentation too), David Vernazobres and Albrecht Gebhard and concerns automated building of around two thousand (!!) new Debian source packages for all CRAN and BioConductor packages for GNU R. I plan to send something to debian-devel on that in a day or two as well because the time is right for some feedback on this.

The other talk was about RDieHarder. This is joint work with Robert G. Brown and uses his DieHarder library for random number testing (that I've added to Debian a few months back). It allows R to both run these tests, and to further analyse and visualize the test results. I finally uploaded RDieHarder to CRAN a few days ago -- in fact, my CRANberries rss feed of new CRAN packages had it show up the morning of the presentation. And now that I've added a webpage about RDieHarder I can finally say it's been released.

/misc | permanent link

Sat, 21 Jul 2007

Dead disks, and lvm woes
As posted earlier on the rather recently announced CRANberries feed (that is generated and hosted on my box) covering CRAN package updates, my main server bit the dust. The hard disk no longer wants to talk to any of its lvm partitions, leaving me without /usr, /var/, /var/local/, /home, /srv. Fortunately, most of this was backed up but now I have to go through reconfiguring the replacement machine I already set up. Not fun.

If anybody has tips on recovering the lvm partitions, I'm all ears. /etc/lvm/ seems fine if that helps as a starting point.

/computers/misc | permanent link

Mon, 09 Jul 2007

Announcing CRANberries
Earlier today I sent an announcement to the r-packages list. It describes CRANberries, two simple RSS feeds that summarize both 'new' and 'updated' packages at CRAN, the archive network for R. I cooked this up rather quickly using a few lines of R, a small SQLite db backend and the old Blosxom blog engine. A tip of the hat to Barry Rowlingson who almost immediately suggested using the lol format instead.

The hope is that this proves helpful for keeping tabs on the amazing growth of CRAN (which is now at over one thousand packages) as well as the number of updates to existing packages. The feed(s) can be consumed standalone, or via the brand new Planet R aggregator that Elijah announced today too.
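The core of telling 'new' from 'updated' is just a comparison of two snapshots of package versions. The following is a rough sketch of that idea only, not the actual CRANberries code: the package names and versions are invented, and plain named vectors stand in for what could be the output of available.packages() on one side and a small database table on the other.

```r
## Hypothetical snapshots of package name -> version (made-up values);
## the previous one would come from storage, the current one from CRAN
old <- c(pkgA = "0.6.1", pkgB = "1.3-1", pkgC = "0.2.7")
new <- c(pkgA = "0.6.2", pkgB = "1.3-1", pkgC = "0.2.7", pkgD = "0.1.0")

## 'new' packages: present now, absent in the previous snapshot
new_pkgs <- setdiff(names(new), names(old))

## 'updated' packages: present in both, but with a different version
common <- intersect(names(new), names(old))
updated <- common[new[common] != old[common]]

cat("new:    ", new_pkgs, "\n")
cat("updated:", updated, "\n")
```

Each entry in the two result vectors would then become one item in the corresponding feed, and the current snapshot replaces the stored one for the next run.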

/computers/R | permanent link