Dirk Eddelbuettel Thinking inside the box
 
Sat, 07 Jun 2008

Into the sunset
I finally got around to dropping four old computers off for recycling. Triton, a community college nearby, had a recycling event where students volunteered, which made it easy to part with two generations of old machines.

Old computers, I hear you ask, well how old? Real old. The older two were from an age where the bios didn't yet boot off cdroms -- circa 1995. We had bought those in Kingston just off the Queen's campus. These were respectively a pentium 90 and a pentium 100, which still have traces on the web as miles.econ.queensu.ca (e.g. in a number of Debian changelogs) and rosebud.sps.queensu.ca which was of course Lisa's office machine and for a while the only internet address showing SPS.

The next two were purchased around 1999 in Toronto on College St just north of U of T's main St George campus. Those, an AMD k6-2 300 and a Celeron overclocked to 450 MHz (woot :) lived happily in the basement of our Toronto home, forming the first lan I built. If I recall they were initially connected using a crossed ethernet cable and a second nic to the ISP. Oh boy.

At least those latter two still boot off Knoppix. And do they ever feel slow. To think now just how many Debian packages I must have built on at least three of these over the years... And we must have gotten at least five decent years of usage out of each machine. One of the second generation computers eventually morphed into the kids' play computer but even retired from that a while ago.

In any event, it was good to have them recycled, and also good to have been able to do so without paying a fee as is increasingly common. So cheers to Triton. I may be back in a few years as there are still a few computers spread across the house.

/computers/hardware | permanent link

Fri, 06 Jun 2008

Wayne Shorter at the CSO
Just got home from the 'An Evening with Wayne Shorter' concert at the CSO, part of this year's tour apropos his 75th birthday. The man is a legend and one of my favourite musicians for both his own Blue Note work from the 60s and of course his participation in the legendary Miles Davis Quintet of the same period.

Shorter (ts, as) was playing with his quartet of recent years: Danilo Perez (p), John Patitucci (b) and Brian Blade (dr). And playing they did. Shorter has such a soft lyrical tone, which accentuates both the rhythmic and harmonic quality of the side men. Very enjoyable concert, fairly 'modern' and free in style. And no standards or old material. Oddly enough, not one spoken word: neither greeting nor good byes or just an introduction of the band. Recommended.

/music/jazz/live | permanent link

Thu, 05 Jun 2008

Adventures with Comcast: Part ohbynowIhavelostcount in an ongoing series
Regular readers of this blog (ed: oxymoron alert) may recall tales of woe with our beloved (ha!) cable internet provider such as this; then there are of course minor tales like this or this or this or the other stories on this page but I am probably forgetting others.

Anyway, yesterday's highlight was initiated with a mail, seemingly sent to all customers, informing me that

ACTION REQUIRED: Comcast has determined that your computer(s) have been used to send unsolicited email ("spam"), which is generally an indicator of a virus. For your own protection and that of other Comcast customers, we have taken steps to prevent further transmission of spam from your computer(s).
and the email went on to recommend some Windows anti-spam measures, including a reference to a page I could only open with IE at work and one URL to a page that doesn't exist. Nice. Not. Needless to say, there are no Windows computers sending mail (via Comcast) here (as the lone Windows box, my wife's work laptop, goes straight to her university webmail).

And obviously, they blocked port 25, so no more mail sending from home. So I grumpily logged a complaint after having been on hold and in telephony menu hell for fifteen or twenty minutes. I was promised to hear back in 72 hours. Hasn't happened yet, naturally, but we're only half way through...

Anyway, to make a long story short and this post constructive: Here is what you do on a Debian or Ubuntu system running exim as your mail transport:

  • sudo editor /etc/exim4/conf.d/transport/30_exim4-config_remote_smtp_smarthost and add a line port = submission in the remote_smtp_smarthost block (assuming you have the split configuration chosen for the exim4-config package). Setting port to 'submission' switches from plain old SMTP to the authenticated version running on port 587; submission is mapped to 587 in /etc/services.
  • sudo editor /etc/exim4/passwd.client and add your user id and password, e.g. the ones you use for the Comcast web login
  • sudo update-exim4.conf to update the configuration
  • sudo /etc/init.d/exim4 restart to restart exim
And it may pay to check /var/log/exim4/mainlog for any irregularities. Barring those, you should now be sending mail to your smarthost using authenticated transfer over port 587.
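To illustrate the name-to-port mapping that the 'submission' setting relies on, here is a minimal Python sketch that looks up a service in text formatted like /etc/services. The sample excerpt and the helper name service_port are assumptions made for illustration; a real /etc/services file has many more entries, and none of this is part of exim itself.

```python
# Sample lines in the format of /etc/services: name, port/protocol, aliases.
# This excerpt is illustrative only; a real file has many more entries.
SAMPLE_SERVICES = """\
smtp            25/tcp          mail
submission      587/tcp                         # Submission [RFC4409]
submission      587/udp
"""


def service_port(services_text, name, protocol="tcp"):
    """Return the port for a service name or alias, or None if not listed."""
    for line in services_text.splitlines():
        # Drop trailing comments, then split the rest on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        port, proto = fields[1].split("/")
        if proto == protocol and name in [fields[0]] + fields[2:]:
            return int(port)
    return None


print(service_port(SAMPLE_SERVICES, "submission"))  # 587
print(service_port(SAMPLE_SERVICES, "smtp"))        # 25
```

On a live system, Python's socket.getservbyname("submission", "tcp") performs the same lookup against the system's own services database.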

In the meantime, it looks like they unblocked port 25 at some point today...

/computers/broadband | permanent link

Sat, 31 May 2008

Accelerated R in Debian
A few months ago, Stephen Milborrow started releasing a patched version of R that performs just-in-time compilation -- see his Ra page for some details and further pointers.

In a nutshell, Ra provides a modified R engine so that code preceded by a jit(1) function call, using his jit package from the CRAN archive, will run faster due to just-in-time compilation of loops and arithmetic expressions.

Ra offers to pick the low-hanging fruit for users as loops can be a bottleneck. Of course, as shown in Stephen's case study, using appropriate vectorised expressions will often be faster still. That said, for a certain class of problems, Ra should offer a decent speed boost.

Debian users can now just say

    sudo apt-get install r-base-core-ra r-cran-jit
as the Ra and jit packages are in Debian's unstable distribution (and in the case of jit, even in testing).

Lastly, version 1.1.0 of Ra was released by Stephen yesterday and is now also in Debian unstable.

/computers/linux/debian/packages | permanent link

Sun, 25 May 2008

Bike The Drive 2008
Memorial Day weekend, so time for the annual Bike The Drive in Chicago. Got the whole family up bright and early, and was it ever nice -- 60-some degrees, sunny blue skies and no wind. Perfect conditions. And the Chicagoist blog has some pictures up.

/sports/cycling | permanent link

smtm bug fix release 1.6.10
A new version of smtm just went to Debian and CPAN. Perl 5.10 required a small change in how we test whether certain arrays do, or do not, contain elements. No other changes were made.

/computers/linux/debian/packages | permanent link

Thu, 22 May 2008

JPM Chase Corporate Challenge 2008
Just got back a little earlier from running the 2008 edition of the JP Morgan Chase Corporate Challenge. And again a record crowd of now just over 23000 in Chicago -- announced to be bigger than those at the JP Morgan Chase races in Boston, San Francisco or New York! This year the weather wasn't quite as stunning as it has often been in the past. But at least, temperatures in the high 40s and an overcast sky make for good running conditions.

This time, two colleagues and I tried to make it close enough to the starting line to not waste too much time 'surfing' around slower runners who for whatever reason think they have to be up at the front. And that seems to have worked: despite a still crowded start, I ran even, steady and fast enough to beat the PR from 2005 by a decent margin with a (hand-stopped) time of 20 minutes and 46.65 seconds. That yields 5:56.19 min/mile (or for Christian, 3:41.32 min/km) which seems too fast given the splits I saw at miles two and three. Oh well -- same course as the other five times that I've run this, so I trust the course is USATF certified.
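For anyone wanting to redo the pace arithmetic, here is a small Python sketch. The 3.5 mile course length is an assumption (it is the usual Corporate Challenge distance but is not stated above), and the helper names pace and fmt are made up for this example.

```python
MILE_IN_KM = 1.609344  # one international mile in kilometres


def pace(total_seconds, distance):
    """Return the pace as (minutes, seconds) per unit of distance."""
    seconds_per_unit = total_seconds / distance
    minutes = int(seconds_per_unit // 60)
    return minutes, seconds_per_unit - 60 * minutes


def fmt(p):
    """Format a (minutes, seconds) pair as m:ss.ss."""
    minutes, seconds = p
    return f"{minutes}:{seconds:05.2f}"


race_seconds = 20 * 60 + 46.65  # hand-stopped time of 20:46.65
race_miles = 3.5                # assumed course length, see above

print(fmt(pace(race_seconds, race_miles)), "min/mile")             # 5:56.19 min/mile
print(fmt(pace(race_seconds, race_miles * MILE_IN_KM)), "min/km")  # 3:41.32 min/km
```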

And as always, good to hang with folks from work for a cold one or two afterwards. Given the temperatures, I didn't last very long though.

/sports/running | permanent link

Sat, 10 May 2008

Quarryman Challenge 2008
This morning was the 2008 edition of Quarryman Challenge, a 5km and 10mile race in Lemont, which is southwest of Chicago along the Illinois-Michigan Canal.

Three of us ran the 10 mile race, which was nicely organised. But is it ever friggin' hilly there: the race course takes three turns from the lower levels near the canal up towards those hills. As the elevation chart (that I cut out of this pdf file with the course map) shows, it is not so much the total elevation but rather how steep the incline is.

Quarryman Challenge elevation profile

That said, I did okay: even though the legs were really tired throughout from those inclines, I finished in 1:12:08 for a pace of 7:13. And given the reasonably small field, that yielded 34th place overall and third in my age group.

/sports/running | permanent link

Fri, 09 May 2008

On modes of transportation
Something I never really mentioned was the purchase of the foldable bike: a Dahon with a reasonably lightweight aluminum frame, a seven-speed hub and high-pressure tires. It's great fun in the city for the rides to and from the commuter train, or across downtown for occasional errands after work.

I have had this foldable bike for nearly two years, and used it almost (work-)daily, even in the Chicago winters. 'Almost' because I did suffer from broken parts on a few occasions: a pedal broke (easy replacement), the axle in the front wheel broke (a good week for a new and inexpensive wheel) but the bummer was that a part of the frame-folding mechanism broke last fall. Given that the bike, which I bought used via craigslist, is a few years old, the part was no longer standard and so we waited for it to be shipped from the manufacturer. And waited and waited some more until Dan's decided to give me a matching part from a bike in their inventory. But apart from that episode, and the occasional problem with conductors on the Metra commuter trains, it has been a smooth ride. Highly recommended, and I do see a few more foldable bikes downtown.

Trek and Dahon bikes
But what is new now is that I finally gave in and bought a road bike, once again off craigslist. My daily commute is about ten miles one-way, which works out to about 35 to 40 minutes of cycling, plus a few minutes of locking/un-locking, changing, etc. I had used my trusted (yet heavier) touring bike with its steel frame a number of times, but felt that a road bike may make for a faster ride. While it saves a few minutes, it is not really a time saver as the bike-train-bike commute also takes around 40 to 50 minutes. That said, riding is simply a nice way to clear the head before or after work. I am back on the schedule I tried for a few weeks last summer / fall: running on Tuesday and Thursday leaves Monday, Wednesday and Friday for the bike commute. So far, I am 8 for 9 over the last three weeks. On the downside: one rather wet ride home, and already two minor flats that (luckily) still allowed riding home. The hardest part is meeting up with some other riders at 6:00am meaning that I am now getting up at 5:00am whether I am running or not. But all told a nice way to get some exercise in outside of running.

/sports/cycling | permanent link

Sun, 04 May 2008

On soccer, promises and hair cuts
Both my daughters have been playing soccer for a while now. And for another little while, I had been promising that if they ever scored three goals in a game, I'd shave my head.

As the attentive reader may have guessed by now, that day finally came. This weekend saw a suburban tournament in nearby Oak Brook, and lo and behold Anna scored three goals in the first game! So home we went, out came the tool and she rather professionally separated me from my hair. So today on day two of the new look, a friend took this picture of me (scaled down from 2.4mb to around 80kb) at the same tournament:

Dirk on 4 May 2008 with a new look

They actually played just about the best soccer I have seen them play, won their group (with three shutouts!) and lost a hard-fought and well-played final 2:4. And today the weather even cooperated as one can see from the photo. Nice weekend, all told. And yes, the head feels kinda nice ;-)

/misc | permanent link

Sat, 03 May 2008

getopt support for littler
Practically ever since Jeff and I released littler to add easy scripting for R, questions popped up about how to support getopt-alike command-line parsing.

And as of today, a new package r-cran-getopt is in Debian. It provides Allen Day's recently released package getopt from CRAN which provides a new function getopt.

Given a suitable data structure that provides long and short-form command-line option names, whether arguments are mandatory, optional or not required (as for flags), and a data-type, getopt munges the command-line arguments supplied by the user and fills a new variable opt accordingly. If a fifth column with help text is provided, a usage string can be generated as well.
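To make the idea of a spec-driven parser concrete, here is a minimal Python sketch of the same pattern. It is an analogue written for illustration, not the R getopt API: the SPEC table, the 0/1/2 encoding of the argument column, and the functions parse and usage are all assumptions of this example.

```python
# Hypothetical spec table with the columns described above: long name,
# short flag, argument mask (0 = flag, 1 = mandatory, 2 = optional),
# data type, and an optional fifth column holding help text.
SPEC = [
    ("verbose", "v", 0, bool,  "print extra output"),
    ("count",   "c", 1, int,   "number of draws"),
    ("mean",    "m", 2, float, "mean of the draws"),
]


def parse(argv, spec):
    """Fill and return a dict of options from command-line tokens."""
    lookup = {}
    for long_name, short, mask, kind, *_ in spec:
        lookup["--" + long_name] = lookup["-" + short] = (long_name, mask, kind)
    opt, i = {}, 0
    while i < len(argv):
        token = argv[i]
        if token not in lookup:
            raise ValueError(f"unknown option: {token}")
        name, mask, kind = lookup[token]
        value_follows = i + 1 < len(argv) and argv[i + 1] not in lookup
        if mask != 0 and value_follows:
            opt[name] = kind(argv[i + 1])  # convert to the declared type
            i += 1
        elif mask == 1:
            raise ValueError(f"option {token} requires an argument")
        else:
            opt[name] = True               # plain flag, or optional value omitted
        i += 1
    return opt


def usage(program, spec):
    """Build a usage string from the optional fifth (help text) column."""
    lines = [f"Usage: {program} [options]"]
    for long_name, short, _mask, _kind, *help_text in spec:
        text = help_text[0] if help_text else ""
        lines.append(f"  -{short}, --{long_name}  {text}".rstrip())
    return "\n".join(lines)


opt = parse(["-v", "--count", "5", "-m"], SPEC)
print(opt)
print(usage("demo.py", SPEC))
```

The point of the design is that one table drives both the parsing and the help output, so adding an option means adding a single row.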

Thanks to Allen for writing getopt, for accepting a quick two-line patch extending support from Rscript to littler, and for fixing one or two minor bugs. Thanks also to the Debian ftpmasters for adding r-cran-getopt within a few days.

/computers/linux/debian/packages | permanent link

Fri, 25 Apr 2008

Google Summer of Code 2008 projects assigned
As mentioned a few weeks ago, I had submitted two entries as possible topics for this year's version of the Google Summer of Code. And lo and behold, the response was very, very good and both applications were slotted!

For the topic 'create a PostgreSQL package for R that uses the standard DBI interface', a number of interested students contacted me, and a total of three applications were submitted. And while the R Foundation was only able to allocate four topics among a number of really good applications, Sameer Prayaga was our pick for this topic. It would be nice to fill this gap among the existing database connection methods for R, and I feel that Sameer can pull this off.

For the second topic of 'create a cran2deb tool for converting CRAN sources into Debian packages' which I had submitted within Debian, Charles Blundell wrote an excellent application. In a way, this topic is a '2.0' version of our previous attempts of a 'top-down' set of tools in the pkg-bioc project on Alioth. This time, we will try something smaller, maybe more modular and lighter and see how far we get there if we try it 'bottom-up'.

And as we are currently in the community bonding phase, say Hi to Sameer or Charles when you come across them these days.

Lastly, I'd like to thank everybody who submitted an entry at Debian or R, or who contacted me about one of the topics I posted. The response was very humbling, many of you were eminently qualified and seemingly very motivated -- but even Google's pockets can only pay for a finite number of projects. Sorry if yours did not get picked.

/computers/misc | permanent link

Thu, 24 Apr 2008

smtm maintenance release 1.6.9.1
A new version of smtm went to Debian yesterday. And after some pondering, I also just uploaded it to CPAN even though it only contains minor Debian packaging fixes rather than genuine code changes. In order to signal that the core Perl code hasn't changed, I simply added a new minor to move the version from 1.6.9 to 1.6.9.1.

/computers/linux/debian/packages | permanent link

Thu, 17 Apr 2008

Updated marathon pace geekery
A few months ago, I had shown a chart with per-5k-segment paces of my prior marathon runs (or at least those for which organisers had 5k splits). Since then, I ran two more: New York in November and London last Sunday. I have updated the charts to show both of these:

(updated pace comparison chart)
A few things stand out:
  • New York (in orange with small squares) clearly was my steadiest run, with very little variation and that nice burst at the end where the pace dropped below 7min/mile for the finish;
  • London (in yellow with open circles) was all right, but with a noticeable drop around kilometers 30, 35 and 40 -- but then the second-best finish;
  • I really cannot pace myself at the start: The two most recent races had the fastest start, including the hot and humid Chicago 2007 marathon (in light blue with circles).

In case anybody is interested (C'est bien toi, Christian, non?), the R script is available and I will be taking questions by email as R may not be obvious at first if you haven't used it.

/sports/running | permanent link

Tue, 15 Apr 2008

London Marathon 2008
Sunday was the 100th anniversary of the modern marathon: It was during the London 1908 Olympics that the marathon was run for the first time over the 26 miles 385 yards, or 42.195 km, distance. And it so happens that I was there to run the 2008 edition of the London Marathon.

Pretty nice weather at the start and finish: sunny, not too hot, occasional clouds. But being London, we still managed to get drenched for about 30 minutes. Overall, the conditions were good -- or else Martin Lel would have had a hard time for a new course record.

The course is pretty nice, right from the start in Greenwich all the way to the spectacular finish in Westminster. Crowd support was good too, if only a little uneven. But the second half of the race, and particularly the last miles back from Canary Wharf through the City to Westminster past Parliament and Buckingham Palace were awesome. Lots of people, lots of noise.

Oh, and I even saw the leaders in a group of five, including Lel and Hall as the course had parallel 'out' and 'back' tracks around miles 13 for me and 22 for them.

I had planned to take it easy and not try to run too hard, and aimed for a time of around 3:25 and then finished at 3:24:41. A little less 'even' than I had hoped, but still a very satisfying result. And for once the legs aren't all that shot afterwards :)

And the whole weekend was nice as I got to stay with friends in the southwest of London. Staying up late on both ends of the trip suppressed the jet lag fairly well. At least that's what I keep telling myself.

/sports/running | permanent link

Mon, 17 Mar 2008

Google Summer of Code 2008 projects are up
For the past few years, Google has been running their Summer of Code events where Google offers $5000 for college students who are asked to code for a summer for the benefit of various Open Source efforts.

And just like in 2006 and 2007, I put up proposals and offered to act as a mentor. The first one is up at both Debian and R: an opportunity to help with the ongoing efforts of 'turning more CRAN packages into Debian packages'. The second one is only at the R page: a proposal to fill the missing link of DBI database interface modules with a matching one for PostgreSQL. More details for either idea are at the respective pages. Anybody interested should ping me by email.

/computers/misc | permanent link

2008 March Madness Half Marathon in Cary
Yesterday was the annual March Madness Half Marathon in Cary, IL -- and this year marked the 30th anniversary of the race. The race is getting more and more popular as the start to the running season, and an early race for those in Spring marathon training. This year, it sold out in a matter of days.

As for the race, I wasn't running it all that well. My legs already felt heavy when I was doing a casual four-miler the day before with our local running group. Similarly, I didn't feel all that loose yesterday. By mile four or five I was getting into a decent rhythm, and I was then running fairly steady 7:15s until about mile 11 when I ran out of gas and had to slow down. Final time was 1:36:38.15 -- not only several minutes slower than last year's but also slower than two years ago.

/sports/running | permanent link

Sat, 15 Mar 2008

SFJAZZ Collective at CSO
Went to the CSO yesterday as a nice way to end a frantic workweek: first a beer or two after work, and then off for some Jazz.

Yesterday's program was the SFJAZZ Collective: eight individuals, all noted in their own right, coming together for a few weeks each year to play as an ensemble. The program generally consists of two halves: one with material by a modern composer -- Wayne Shorter is this year's pick -- and one with new original compositions by the band members.

This was a special treat as Wayne Shorter's compositions from the 1960s, both from the bands he led and as a member of the legendary Miles Davis Quintet, have always been some of my most favourite modern pieces. At the same time, it gave me a chance to finally see Joe Lovano on ts and Stefon Harris on vb. Other band members were equally impressive: Dave Douglas tp, Miguel Zenon as, Robin Eubanks tb, Renee Rosnes p, Matt Penman b, Eric Harland dr. Favourite new composition of the night: 'Angel's Shares' by Penman.

All in all a nice evening out to cap off a busy week.

/music/jazz/live | permanent link

Mon, 10 Mar 2008

PGApack 1.1: Almost as good as new
PGAPack is a rather nice and fairly small library for 'parallel' optimisation via genetic algorithms using the MPI message passing protocol. PGAPack 1.0 was written by David Levine while doing graduate work during the mid-1990s at Argonne Labs / University of Chicago.

PGAPack has also been in or around Debian for a rather long time, but it suffered from benign neglect in the last few years. Some of this came to the fore in this bugreport which led to my offer to the then-maintainer Andreas to help on the relicensing request. After all, Argonne Labs is just a few miles from where I live, and I had already spent a little bit of time polishing and upgrading the package for my own exploratory use.

So I called Rusty Lusk, head of the Mathematics and Computer Science section at Argonne, to try to sort this out. He was sympathetic and put me in email contact with David Levine. As we are all somewhat busy, this dragged on for a little longer than we thought --- but as of today, about one and a half years later, and around twelve years after the initial 1.0 version came out, we have a new and shiny PGAPack 1.1 release.

I have done a fair amount of polishing: there are now two library packages for serial use (i.e. for debugging) as well as parallel use via MPI. We use Open MPI where available and LAM where not. All open Debian bugs have been addressed. One minor issue in the postscript documentation remains as David can no longer locate his LaTeX sources; I may just have to extract the text and re-latex this from scratch to update it. One day.

Anyway, for full reference, the changelog entry is below. The package is currently in the NEW queue (as the new sub-packages require manual inspection and approval) but should hit mirrors in a couple of days.

My thanks to the two previous Debian maintainers; to Rusty Lusk for helping from the MCS department's end at Argonne Labs and for suggesting the rather liberal and easy MPICH2 license (and he happens to be one of the MPICH2 authors); and of course to David Levine for writing PGAPack in the first place, for agreeing to relicense it and for giving valuable feedback on my repackaging of what is now version 1.1 on the MCS ftp server at Argonne --- this library has held up really well over the years; let's hope it will find more good use going forward.

pgapack (1.1-1) unstable; urgency=low

  * Really good news:  The MCS divsion of Argonne National Laboratories has
    agreed to relicense pgapack using the MPICH2 license. So pgapack
    is now Free Software and can move into Debian's main archive!
  
    Our thanks go to David Levine and Rusty Lusk to make this possible.
  
  * New maintainer, following Andreas' offer dated 2006-10-04 in #379388

  * debian/control: Change section to math		(Closes: #379388)
  
  * Added new brinary packages libpgapack-mpi1 and libpgapack-serial1
  * The MPI package is configured using Open MPI where available and LAM
    where not. 
  * debian/control: Changed Build-Depends: to use OpenMPI where available, 
    and LAM otherwise.
 
  * Finally acknowledges old NMUs 		(Closes: #379168,#359549)

  * source/integer.c: Apply patch for one-off error 	(Closes: #333381)

  * source/report.c:  Do not unconditionally print at generation 1
  
  * debian/rules: Remove a bashism 			(Closes: #379168)
  * debian/rules: Install examples directly 		(Closes: #134331)
  * debian/control: libpgapack-lam1 Depends on lam4 	(Closes: #60376)
  * debian/rules: Rewritten using debhelper
  * debian/control: Added Build-Depends: section for debhelper

  * No longer install mpi.h in /usr/include		(Closes: #404027)
    
  * debian/control: Updated Standards-Version: to current version

  * man/man1/PGAGetCharacterAllele.1: fix whatis entry 		(lintian)
  
 -- Dirk Eddelbuettel   Mon, 10 Mar 2008 18:03:34 -0500

/computers/linux/debian/packages | permanent link

Thu, 21 Feb 2008

Time flies ...
... when you're having fun. This blog turned five a few days ago.

/computers/www/blogging | permanent link

Sat, 16 Feb 2008

CRANberries updated
The good folks at CRAN, the R package network which by now contains over thirteen hundred packages, have reorganised the website slightly. As far as I can tell, the changes are all for the better: nicer URLs, lots more per-package information, and other changes such as an updated 'task view' format (which is the part I knew via my maintenance of the Finance task view).

But these changes also affected my CRANberries (see the html or better yet rss view) summaries of new packages as some of the source information moved. So I just updated the (surprisingly short at 189 lines including plenty of whitespace and comments) script, and things should work now come the next update.

While updating the 'more info' link for new and updated posts to point to the new-style entry at CRAN, I also took the opportunity to update the format of the 'blog' entry for updates where we now show title and description along with the diffstat output.

I also manually copied in two of the recent entries: the new package emu where CRANberries had fallen over as we could not find the package description (in the new spot), and the existing package GEOmap where diffstat failed as we somehow didn't have a proper tarball of the previous sources.

/computers/R | permanent link

Fri, 15 Feb 2008

Nice one
A post entitled Saved by SPAM just appeared on the Journal's website. Yes, it's only barely ironic, but heck, my blog is geeky enough for a lame joke about spam and it is Friday afternoon before a long weekend ...

/otherblogs | permanent link

Mon, 14 Jan 2008

littler 0.1.0 released
I've just rolled up release 0.1.0 of r (pronounced littler). A few changes made it into this release:

  • A new option -l (with long form --packages) to load packages into the R session. This is useful when you have data-processing one-liners as it does away with the explicit library(foo) inside your actual R expression. As an added bonus, library() is wrapped in a suppressMessages() call.
  • The --no-restore option is now passed to the embedded R session.
  • Argument handling was improved / corrected for a corner case.
  • Output shown by --help was improved.
  • The manual page and README were updated accordingly.
  • The datasets package will now be autoloaded.
  • Two minor fixes went into bootstrap, the wrapper around autotools and friends.
  • The cache-clearing part of the R package updater example update.r was improved.
  • We added some comparisons to Rscript in the timing tests.

As usual, our code is in our svn archive, on my r page, and in the local directory here. A fresh package is in Debian's incoming queue, and Jeff's littler page at Vanderbilt should reflect the new release soon too.

/computers/linux/debian/packages | permanent link

Sun, 06 Jan 2008

NAS'ed
I had been eyeing an inexpensive network-storage device -- for the non-geeks: think of a hard disk with an ethernet port and some controlling software -- for some time. I was aware of some of the hacking efforts around a few of these but somehow nothing really appealed. I was sort-of looking for nslu2-with-a-disk, and preferably not too expensive.

Lo and behold, that's what I saw today in my techbargains feed: a Buffalo LinkStation Live which contains a 500gb SATA disk for $199 after rebates. Some quick googling led to these wiki pages which looked promising: anything from enhancing the stock Linux setup by enabling a few more services to a custom Linux distro (similar to my wrt54 router running linux) to reportedly some work-in-progress for a native Debian installation. Nice!

So off I went, ordered the thingie for local pickup for an additional 5% off and picked it up a little later at the local Circuit City (where visits are seemingly a recurring event these days). The documentation is very brief and insists that you install something on Windows -- just to find that the little box autoconfigures itself just fine. Presumably some network discovery is going to find the assigned dhcp address which is needed for the web interface. A few minutes later, a new (fixed) IP address was assigned, ntp was enabled and that was about that.

After dinner, I quickly followed this tutorial to get the box a bit more Unixy without going too far (yet) from the default: start up telnet via a simple Java command-line tool, login, then enable ssh, set it up in /etc/init.d, add some extra binaries. All very quick and simple [ with the caveat that the addons.tar didn't want to get there via the Java tool, so manual scp once 'inside' did the trick ].

NFS, which I like for shuffling files around, appears to be a little trickier for this ARM-based LinkStation Live. So at least for now I am content with simple rsync'ing of my backup directories on the few machines here. Much better than the current setup with mutual backups between workstations and semi-permanently being out of space.

All in all a rather pleasant gadget and recommended at the price. The extra $100 in rebates are valid from today to the 12th.

/computers/hardware | permanent link

Sat, 05 Jan 2008

RQuantLib 0.2.8
Version 0.9.0 of QuantLib was released a couple of days ago on Christmas Eve; as usual, Debian packages for QuantLib were uploaded right away following a few earlier pre-releases.

As QuantLib is approaching its 1.0 release, a few API changes required updates to basically all of RQuantLib's C++ source files. Luckily most changes were minor. At the same time, we also generalised the Binary (aka Digital) option pricer to allow for a 'binType' argument (with values 'cash', 'asset' or 'gap' for CashOrNothing, AssetOrNothing or Gap digitals) as well as an 'excType' argument to switch between European and American exercise. Dominick made a small change to the DiscountCurve object to seamlessly pass a switch variable indicating whether we have 'flat' curves or not.
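To show how the three digital payoff types differ, here is a minimal Python sketch of the textbook Black-Scholes closed forms for European digital calls. It is not the QuantLib or RQuantLib implementation: the function digital_call, its argument names, and the sample inputs are assumptions made for illustration, and American exercise is not covered.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def digital_call(bin_type, spot, strike, rate, div_yield, vol, maturity,
                 cash_payoff=1.0):
    """Black-Scholes value of a European digital call.

    bin_type is 'cash', 'asset' or 'gap'. For 'gap', strike is the trigger
    level and cash_payoff is the amount subtracted from the asset at expiry.
    """
    d1 = ((log(spot / strike) + (rate - div_yield + 0.5 * vol ** 2) * maturity)
          / (vol * sqrt(maturity)))
    d2 = d1 - vol * sqrt(maturity)
    # Value of receiving the asset, or the cash amount, if the option ends
    # in the money.
    asset_leg = spot * exp(-div_yield * maturity) * norm_cdf(d1)
    cash_leg = cash_payoff * exp(-rate * maturity) * norm_cdf(d2)
    if bin_type == "cash":
        return cash_leg
    if bin_type == "asset":
        return asset_leg
    if bin_type == "gap":
        return asset_leg - cash_leg
    raise ValueError(f"unknown bin_type: {bin_type}")


# Illustrative inputs only, not taken from any market data.
args = dict(spot=100.0, strike=100.0, rate=0.03, div_yield=0.01,
            vol=0.2, maturity=0.5, cash_payoff=100.0)
for kind in ("cash", "asset", "gap"):
    print(kind, round(digital_call(kind, **args), 4))
```

With the cash payoff set equal to the trigger strike, the gap value reduces to the plain vanilla call, which makes for a handy sanity check.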

Another change was the addition of formal unit tests using the RUnit package from CRAN (which we happened to have added to Debian recently in the wave of new RMetrics packages). We use the scheme initially proposed by Gregor Gorjanc and extended by Martin Maechler for RMetrics that allows the unit tests to be run from both the source and the installed package which is nice. QuantLib itself has a massive amount of unit tests in its code; I am hoping to add more and more of those into RQuantLib itself as we add more functionality.

On that front, more exciting news: RQuantLib is now hosted on R-Forge. Potential contributors are encouraged to register at R-Forge and to get in touch -- this is a great way to learn about combining C++ and R.

To wrap up, the new version 0.2.8 is currently in the queue at R's master CRAN host and should hit the CRAN mirrors shortly; likewise the Debian package has been uploaded and should also propagate to Debian mirrors in due course. As usual, sources are also available locally on my site.

/computers/linux/debian/packages | permanent link

Thu, 27 Dec 2007

Internet NON-Service Provider: Yet another Comcast saga
About a week ago, our internet connection appears to have dropped around midnight. Not a biggie, one thinks. So on Friday morning between shower, breakfast, and leaving for work, I power-cycled the cable modem and router a few times which 'usually' helps.

Not this time. Still no signal by the afternoon, and when Lisa called the help line, they confirmed that they could not see our cable modem. That could have given it away, but I didn't click.

This being this time of year, we were actually out overnight on Friday so that I couldn't get to inspect matters at home. Also, friends and neighbours were out the next day so I couldn't get my hands on another cable modem to see if it was the line (my suspicion at the time) or the modem. All I could do was call, go once more over all possibilities with the tier-1 help person -- and schedule a technician to swing by on Monday afternoon, i.e. on Christmas Eve, or about 48 hours later (!!). So I made do over the weekend with two trips to the local library to consume some of their wireless signal to catch up on things.

The big surprise came on Monday. The technician was on time, rather friendly and knowledgeable, and checked the signal strength at the box outside and on two cable outlets in the house. All fairly well. So during the second call to Comcast, we turned our attention to the cable modem. A few years ago I returned the 'leased' modem and bought an inexpensive 3com cable modem. Only after checking that it was supported, of course.

Well now it seems that Comcast decided that this (old) modem can only talk DOCSIS 1.0. And instead of telling me in advance, they just fscking dropped it cold. Unbefriggable. I must be getting two fliers a month informing me how great Comcast's so-called (and IMHO rather overpriced) 'Triple Play' is. You'd think that they would use that mail-out infrastructure to let me know about the service change. Or use email, after all they are my ISP. Naaah. Rather just drop the service cold right before the holidays. That's the spirit.

To clarify and repeat, I do not mind service updates. I do not mind improving standards and improved throughput. And as I was quite happy to buy a new modem on the spot on Monday afternoon -- yes, Christmas Eve, because I then have nothing better to do than to troll in the mall to buy a new Motorola cable modem at full retail cost rather than somewhat more cheaply at Amazon or other places -- I could easily have done better if only they had told me in advance. I could go and use some choice terms, but as we're still in the holiday season I better stop... Maybe I should just go back to DSL and save a few bucks.

/computers/broadband | permanent link

Tue, 20 Nov 2007

Several new Rmetrics packages
Debian has provided Rmetrics packages for financial engineering and computational finance since the Rmetrics release for R 1.9.0 in the summer of 2004. Over the years, Rmetrics has gotten more granular and changed from a handful of packages to two handfuls --- and the most recent release extended this trend even further to almost two dozen packages as shown in this chart.

Dependency chart for Rmetrics packages

Rmetrics now comprises over twenty individual packages. Eleven new packages were added in the 260.72 release for R 2.6.0, and they required eight other new packages from CRAN. While I would have preferred a more spread-out approach than the shotgun approach of having to introduce all these new packages at once (which took the last four weeks), I am in support of the reorganisation which should make maintenance easier going forward.

So to get all of these packages onto a Debian box, a quick sudo apt-get install r-cran-rmetrics is all it takes. Currently supported only in the always-fresh unstable flavour, but hopefully soon in testing too.

A big Thank You goes to the Debian FTPmasters. Of the 20-some packages that I added to Debian during this Rmetrics expansion, many were added within a day or two.

Lastly, thanks also to Florian Hahne, Robert Gentleman and Elijah Wright for much appreciated help with R's Rgraphviz and graph packages to create the chart above. It only takes a handful of lines to create the basic graph, and another few lines for the colours and titles. The code is available on request, of course, but you need the current development versions of the BioConductor packages Rgraphviz and graph (which are not in Debian yet).

/computers/linux/debian/packages | permanent link

Tue, 06 Nov 2007

New York Marathon 2007
I was in New York last weekend to run the 37th New York Marathon which reportedly currently stands as the largest marathon ever run in terms of starters and finishers. And it couldn't have happened on a nicer day: sunny at the start with temperatures in the 50s, no wind, and some clouds in the second half prevented it from getting too hot.

Large crowds at most parts of the course, a decent number of bands, and a generally very excited atmosphere. And of course a nice course across the five boroughs finishing in Central Park.

For once, I managed to run the race steadily and yet fairly fast, ending up with a time of 3:18:47 (and thus a 7:35 pace). This is about a minute slower than my PR from Sunburst 2006, and just two seconds faster than my best Chicago Marathon result from 2006, yet much better than this year's times from Boston 2007 where it was too cold, and Chicago 2007 where it was way too hot. Given that the NY course is somewhat hillier, and that it was definitely busier and more crowded than the other races, I'm quite happy with the time, and the way I ran, getting through without any walking breaks. Not quite negative splits at around 1:38 and 1:40 for the two halves. With enough energy left at the end, I finished the remaining 2+ km after the 40 km mark with a sub-7:00 min/mile pace which felt great. And it is a nice feeling to have completed the Boston, Chicago and now New York marathons in the same year.
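As a quick check on the arithmetic, the quoted pace follows from the finishing time and the standard marathon distance of 42.195 km. A minimal sketch in R, with variable names chosen purely for illustration:

```r
## finishing time of 3:18:47 converted to seconds
secs  <- 3 * 3600 + 18 * 60 + 47
## standard marathon distance of 42.195 km expressed in miles
miles <- 42.195 / 1.609344
## seconds per mile, formatted as minutes:seconds
## (naive rounding of the seconds part, which is fine here as it is well below 60)
pace    <- secs / miles
paceStr <- sprintf("%d:%02d", as.integer(pace %/% 60),
                   as.integer(round(pace %% 60)))
print(paceStr)
```

The same three lines work for any other race by swapping in its time and distance.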

Last but not least, the weekend was a general blast as I was staying with a friend I hadn't seen in a decade which made for lots of stories, and even more beers...

/sports/running | permanent link

Sun, 21 Oct 2007

Frank Lloyd Wright Race 2007
Today was the 31st annual Frank Lloyd Wright Race in Oak Park, next door to where I live. This is a really nice small race I've run in 2006, 2004 and 2003.

Conditions today were almost ideal -- we're having unseasonably warm weather, and it was sunny and in the high 60s without any noticeable winds around the start time making it feel like a nice late summer morning, rather than the middle of fall. I wasn't aiming for something really fast as I am still in training for a longer race and had done some reasonably fast one-mile laps yesterday morning leaving me not exactly well rested. But for once, I actually managed to run a steady pace, and then ended up within seconds of last year's PR for the distance at 41:15 for a pace of around 6:39, so I'm quite pleased.

/sports/running | permanent link

Mon, 15 Oct 2007

Some marathon data analysis
A few days after the Chicago Marathon in record-heat, I was wondering just how much more I had been slowing down, and exactly when during the course of the race this occurred.

Bigger races like Chicago provide the so-called 'splits' for every five kilometer segment. Helpfully, they also keep these with the archived results which are all accessible via the web. So it was easy enough to collect a sequence of times such as '0:22:34', '0:45:30', ... and so on. And thanks to R, my data slicing and dicing and visualizing tool of choice, it was just a handful of lines to produce the chart below.

The chart covers my four Chicago marathons (2004, 2005, 2006 and 2007) as well as Boston (2007) --- the infamous '27.2 mile marathon' seems to have dropped off the net, and the smaller Sunburst does not have 5k split times.

(pace comparison chart)
The chart shows a few things quite nicely:
  • The first marathon (dark blue squares) shows a marked slowdown after km 25.
  • The following year (medium blue circles) a similar yet more moderate slowdown appeared 5 km later.
  • The next year (blue triangles) the pace was remarkably stable without noticeable slowdowns -- yielding the 'BQ' time.
  • The Boston race (smaller green circles) was more prudent due to weather conditions, and a lingering cold, and I managed to finish much stronger than I ran the middle part.
  • Chicago this year (orange circles) was a different story: too fast at the beginning, a cautious middle similar to the Boston race but two fairly significant slowdowns after 25 and 35 km, followed by some acceleration right before the finish.

While it was a tough race, I clearly ran a lot slower than previously, in particular between 25 km and 40 km. So there.
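For the curious, the step from cumulative split times to a pace chart can be sketched roughly as follows. This is not the code behind the chart above: the first two split values are the examples quoted in the text, the remaining ones are made up for illustration, and the helper name toSeconds is hypothetical. The pace is shown in minutes per kilometer.

```r
## Cumulative 5 km splits as "H:MM:SS" strings. The first two values are
## the examples quoted in the post; the last two are invented placeholders.
splits <- c("0:22:34", "0:45:30", "1:08:40", "1:32:10")

## hypothetical helper: convert "H:MM:SS" strings into seconds
toSeconds <- function(s) {
    sapply(strsplit(s, ":"),
           function(p) sum(as.numeric(p) * c(3600, 60, 1)))
}

cumsec <- toSeconds(splits)       # cumulative seconds at 5, 10, 15, 20 km
segsec <- diff(c(0, cumsec))      # seconds spent in each 5 km segment
pace   <- segsec / 5 / 60         # minutes per km within each segment
km     <- 5 * seq_along(splits)   # distance at the end of each segment

plot(km, pace, type = "b", xlab = "km", ylab = "pace (min/km)",
     main = "Pace per 5 km segment")
```

Repeating this per race and adding each series with lines() or points() gives a comparison across years.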

/sports/running | permanent link

Sun, 14 Oct 2007

Tunes
Following Adam's suggestion about Pandora, I spent the rest of the afternoon, evening and this morning listening to music old and new. Pandora's suggestions around Madeleine Peyroux were particularly nice as they led for example to Sam Phillips who was new to me. I can already see this leading to increased spending on new cds... Recommended.

/otherblogs | permanent link

Sat, 13 Oct 2007

Makin' Tracks 2007
This morning was the annual Makin' Tracks 5k. As this starts three blocks from my house, supports the track at Concordia, a small college up the street from me, and runs right through my immediate neighbourhood, it's hard not to participate. This year was of course not even a week after the somewhat eventful Chicago Marathon in record-heat so I wasn't sure how well I'd do. Once again, I started way too fast and ran two miles at a 6:00 min/mile pace that I couldn't sustain just a week after an (arguably slow but still full distance) marathon, so I slowed down and finished at 19:41.7 or so for a pace of 6:23, or about eight seconds slower than my best time from last year (when we ran before the marathon). On to the Frank Lloyd Wright 5k/10k next week, then!

/sports/running | permanent link

Tue, 09 Oct 2007

Chicago Marathon 2007
Last Sunday, the 30th Chicago Marathon took place. This was meant to be the year to make this marathon even bigger: the field had increased substantially to 45,000 registrations yet it still sold out way earlier than in previous years.

As it turns out, the weather did have its own surprises in store. Earlier in the preceding week, the forecast changed from overcast and rainy to sunny. And sunny it was. While we had a bit of cloud cover at the start at 8:00am, temperatures were already in the 70s and kept increasing. This year's marathon is now on the books as the hottest ever: the clouds dissipated and it was a scorcher.

Needless to say, it was a rather challenging race. I finished in 3:41:39, which is my slowest time by some margin for the by now seven marathons I've run. It was not the day for running fast. Of course, I didn't quite grok that at the start and ran ten miles reasonably hard at a quick pace, but then paid for it, and then paid some more. That said, apparently around 10,000 registered entrants didn't even start, and another 10,000 did not make it to the finish. With the heat, several hundred were treated by the medical teams. Worse still, one 35-year-old runner from Michigan died (though the autopsy claims he had a heart condition; other reports say that alone cannot have been lethal). The race itself was aborted, and those who had not reached the half-way point by 12:00pm were diverted to the finish and urged to walk rather than run.

According to the (by now fairly extensive) news coverage, this whole experience left quite a few people mad and bewildered. As of today, a good 48 hours after the race, the City seems to be in some sort of crisis management mode to prevent damage for the oh-so-important bid for the 2016 Olympics.

For some flavour of the news coverage, see an early report, some stunning pictures and some suggestions to prevent another one like this, all from the Chicagoist blog.

/sports/running | permanent link

Wed, 03 Oct 2007

Beancounter minor bug fix release 0.8.8
For some odd reason, the trusted Perl module Date::Manip now wants the "approx" argument to format date differences. Adding this in the five calls across beancounter and BeanCounter.pm is the main change in version 0.8.8 which is now in Debian's incoming area, uploaded to CPAN and onto the beancounter page here.

/computers/linux/debian/packages | permanent link

Wed, 26 Sep 2007

Dear Samsung,
Your ML-3051N is a lovely little 'network printer'. Hopefully, it will serve us well in the next few years as a replacement for the thirteen year old HP Laserjet ML we bought (for about five times the price: a native postscript printer was rather expensive in 1994) a long time ago. And you even mention that it can talk to Linux! But still, I have a few things to talk to you about:

  1. as you mention Linux, it's a tad odd that you enumerate various flavours (RH 8 to 9; FC 1 to 3; Mandrake 9 to 10.2; SuSE 8.2 to 9.2) but happen to exclude the two variants that run around here: Debian and its cousin Ubuntu. Besides, the versions of those other flavours are a tad old, no?
  2. as you ship Linux software on a cdrom, it is even odder that the install shell script fails (note: spaces between #! and /bin/sh may not be a good idea). Fails not once, but twice: you say /bin/sh, but you meant /bin/bash as you use features of the latter. Oh well, I guess we all made that mistake in our youth. Regardless, the software never installed. Which is a pity as this would be about the first 'household appliance' I bought with native Linux support. So close, and yet so far.
  3. the control panel is a cumbersome way to set a static IP and to turn the odd protocol off.
  4. did I really have to install all this windoze stuff on my wife's laptop just to learn that the printer does in fact have a nice web interface? You could have mentioned that with Linux info, no? It is in fact a rather nice and complete web interface.
  5. as you supply networking support, do you really have to enable every possible protocol under the sun? I mean slp, snmp, multicast dns, dynamic dns, raw tcp, ipp, ethertalk, netware ...? I still see my cable modem go down 'seemingly randomly' right after I print even though I now disabled just about everything but raw tcp to port 9100 as well as ipp to 631. Still, dropping the cable connection just because the printer wakes up is odd, isn't it?
  6. Cups, foomatic and all the other printing goodies do not yet know the ML-3050 family, so we are making do with the ML-2151NPS settings for postscript. Works fine, so I won't bother copying the ppd file from the cdrom to the handful of computers around here.
Kind regards, Dirk

Updated to fix a markup error. And the shebang space is standard, I am told.

Postscriptum: Turns out it was 'just' the power surge. Putting the cable modem and the router onto a different wall outlet, and a surge-protector and battery backup ups to boot, fixed the issue.

/computers/hardware | permanent link

Mon, 10 Sep 2007

Overdue smtm bug fix release 1.6.9
Some time ago, Yahoo! Finance made some changes at their backend for the provisioning of png images which are also displayed by smtm. Version 1.6.9, which just went to Debian's master site and will go to CPAN in a moment, finally updates the code for that. My apologies for taking so long, and thanks to Michael Kurlin, Clinton R Lambe, Ian Seow and possibly others whose mail I haven't kept who sent me patches and heads-ups.

/computers/linux/debian/packages | permanent link

Another option for MythTV schedules
Yesterday I mentioned how straightforward it was to setup an account with SchedulesDirect to obtain tv schedule data for the MythTV 'software pvr'.

For completeness, a 'free as in beer' screen-scraping alternative is provided by the zap2xml Perl script together with a (free) registration at the tvlistings.zap2it.com site. Instructions are on the zap2xml page, and all that is left to do is to combine the calls to zap2xml.pl and mythfilldatabase in a single shell script -- which is easy enough to do too.

/computers/linux/mythtv | permanent link

Sun, 09 Sep 2007

MythTV data service switch
One of several things I never got around to blog about is my experience with MythTV (which, in case you don't know, turns your pc into a tivo, given a suitable tv decoder card and the mythtv software). To make up for the lack of a more detailed post, let's just recap: I had set it up a little over two years ago, on what was then a 'possibly temporary' Kubuntu box. Well the box is still there and still running Kubuntu, four releases later, and MythTV only got easier as Ubuntu now includes the decoder drivers. So far no worries.

TV schedule information had always been provided via the 'free as in beer if you answer a question or two' zap2it subsidiary of the for-profit Tribune publishing company (i.e. the folks who own the Chicago Tribune, Cubs baseball team, WGN TV station around here and what have you, and are in the process of being sold to Sam Zell). As had been noted in other places, that 'free' service is no more, and a seemingly cooperatively-run 'small fee' service has been set up by a few volunteers at schedulesdirect.

So I just switched, and for the benefit of anybody sitting on the fence about this, the switch is trivial:

  1. If need be, add 'feisty-proposed' to apt's sources.list file to get the new MythTV 0.20.2 release directly from Ubuntu.
  2. Install MythTV 0.20.2 which takes as little as wajig update; wajig dist-upgrade
  3. Follow the instructions in this email, in particular point 4) about re-creating lineup info
  4. sit back and enjoy
Including the signup at schedulesdirect, the whole thing took maybe ten minutes. Not bad. Now I only have to find some spare time so that the backlog of unwatched episodes of the Daily Show does not grow much beyond six months... Oh, and last but not least, I'll have to pony up the fifteen bucks for the subscription once my trial is up next Sunday.

/computers/linux/mythtv | permanent link

Chicago Half Marathon 2007
This morning was this year's Chicago Half Marathon, the race I often refer to as 'my favourite' after having participated in 2006, 2005, 2004 and 2003. But this year I somehow managed to convince myself I had already registered, when I had not. Doh. But due to a friend's unlucky injury, I could re-use his registration and still race. Weather was once again gorgeous, but maybe a little on the hot side. I tried not to push myself too hard given the two big races coming up in the next seven weeks and ran a decent 7:12 min/mile pace for a chip time of 1:34:20.

/sports/running | permanent link

Sun, 02 Sep 2007

Another Herbie Hancock concert
Herbie Hancock was in town to open this year's Chicago Jazz Festival with a concert at the CSO last Thursday. This time he came with a mostly-electric setup featuring Nathan East on bass, Vinnie Colaiuta on drums and Lionel Loueke on guitar. I found it at times a little heavy on the synthesizers, but it was evident that Hancock and the band enjoyed themselves during the two hour set. A nice concert, all in all, and again very different from his last two concerts.

/music/jazz/live | permanent link

Sat, 11 Aug 2007

The amazing Prof. Ripley (cont'ed)
A little mini-meme got started on August 1 when Ben Bolker posted the following code to the r-devel list (and here I substituted the more standard '<-' assignment operator for the less standard though now-permitted '='):

x <- readLines("http://developer.r-project.org/R.svnlog.2007")
rx <- x[grep("^r",x)]
who <- gsub(" ","",sapply(strsplit(rx,"\\|"),"[",2))
twho <- table(who)
twho["ripley"]/sum(twho)
In five lines (that could be shortened to three at the expense of some readability), the SVN log for R is downloaded directly from the website, the revision authors are extracted and then tabulated by submitter. The relative percentage of Brian Ripley is found to be a staggering 74.8% -- or about three times as much as the other fifteen committers combined. Smokes.

[ Oh, and for those who don't know him, he's also got a day job which presumably entails looking after his graduate students at Oxford. Who knows, he may even teach. Kidding aside, he's actually one of the nicest persons you'll ever meet in real life. ]

Now yesterday, Simon Jackman, who had at first simply repeated Ben's analysis on his own blog, followed up with a nice analysis (albeit typeset in a way that rendered the code inoperable, which has now been fixed) that creates both a histogram and a dotplot of commits per hour of the day. Omitting Ben's code which Simon reuses, we have the following for histogram and dotchart:

tod <- unlist(sapply(rx,function(x)strsplit(x,split=" ")[[1]][6]))
tod <- tod[who=="ripley"]

tz <- sub(pattern=".*(-[0-9]{4}).*",replacement="\\1",x=rx)
tz <- tz[who=="ripley"]
tz <- as.numeric(tz)/100
offset <- 3600*tz

z <- strptime(tod,format="%H:%M:%S")
hist(z,"hours",main="Ripley Commit Times in SVN TZ")

h <- z - offset
h <- format(h,format="%H")
h <- factor(as.numeric(h), levels=0:23)
dotchart(table(h), main="Ripley Commit Times, By Hour in GMT",
         labels=paste(0:23,1:24,sep=":"))
This extracts the commit times, subsets to the ones by Prof. Ripley, extracts the timezone component (as strptime seemingly doesn't do that, which is a pain), and parses the tz-less time via strptime into a variable 'z' for which the histogram is drawn. He then corrects the times by the tz offset expressed in seconds, formats the result as hour of the day, turns it into a 'factor' (an R data type for qualitative variables which may be ordered as is the case here) and draws a dotplot. This results in the following chart:
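To make the timezone correction concrete, here is a small self-contained sketch on two invented log lines in the format the code above assumes. The revision numbers and timestamps are made up, and unlike the regular expression above this variant picks the offset by field position, so it also copes with positive offsets. It assumes whole-hour offsets only.

```r
## Two made-up SVN log header lines, for illustration only
rx <- c("r41000 | ripley | 2007-03-01 08:15:30 -0500 (Thu, 01 Mar 2007) | 2 lines",
        "r42000 | ripley | 2007-07-10 23:45:00 +0100 (Tue, 10 Jul 2007) | 1 line")

fields <- strsplit(rx, " ")
## fields 5 and 6 hold date and time, field 7 the offset such as "-0500"
stamp <- sapply(fields, function(f) paste(f[5], f[6]))
tzh   <- as.numeric(sapply(fields, "[", 7)) / 100   # "-0500" becomes -5

## parse the tz-less timestamp as if it were UTC, then subtract the offset
## in seconds to obtain the actual UTC time
local <- as.POSIXct(strptime(stamp, "%Y-%m-%d %H:%M:%S", tz = "UTC"))
utc   <- local - 3600 * tzh

## hour of the day in UTC, as a factor covering all 24 hours
hourUTC <- format(utc, "%H", tz = "UTC")
h <- factor(as.numeric(hourUTC), levels = 0:23)
print(table(h)[c("13", "22")])
```

A commit stamped 08:15 at five hours behind UTC thus lands in the 13:00 bin, and one stamped 23:45 at one hour ahead lands in the 22:00 bin.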

Simon Jackman's per-hour charts of Brian Ripley's commit patterns

Now, nobody has looked at the time series. So we correct this and add the following:

## zoo provides zoo(), index() and rollmedian() used below
library(zoo)

## rather extract both date and time
dat <- unlist(sapply(rx, function(x) {
  txt <- strsplit(x,split=" ")[[1]]
  paste(txt[5], txt[6])
}))
## subset on Prof Ripley
dat <- dat[who == "ripley"]
## and convert to POSIXct, correcting by tz as well
datpt <- as.POSIXct(strptime(dat,format="%Y-%m-%d %H:%M:%S")) - offset

## turn into zoo -- we use a constant series of ones as each
## commit is taken as a timestamped event
datzoo <- zoo(1, order.by=datpt)
## and use zoo to aggregate into commits per date
daily <- aggregate(datzoo, as.Date(index(datzoo)), sum)

## now plot as grey bars
plot(daily, col='darkgrey', type='h', lwd=2,
     ylab="Nb of SVN commits, three-week median",
     xlab="R release dates 2.5.0 and 2.5.1 shown in orange",
     main="The amazing Prof. Ripley")
## mark the two R releases of 2007
abline(v=c(as.Date("2007-04-24"),as.Date("2007-06-28")),col='orange',lwd=1.5)
## and do a quick centered rolling median
lines(rollmedian(daily, 21, align="center"), lwd=3)
This extracts both date and time, creates a proper R time object (a so-called POSIXct type) from it, fills a zoo ('the' magic class for time series) object with it, uses zoo to aggregate commits per day and plots those in a barchart-alike (I know, I know, ...) plot to which we add the two releases as well as a rolling and centered three-week median (as a real quick hack rather than a proper smooth).
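One caveat with aggregating timestamped events this way: a day without any commit does not appear in the aggregated series at all, so a rolling median runs over commit days rather than calendar days. That hardly matters for someone who commits every day, but it would for a less active committer. The sketch below shows the idea in base R only, swapping in table() and runmed() for the zoo functions used above; the timestamps are invented for illustration.

```r
## Made-up commit timestamps (UTC), for illustration only
stamps <- as.POSIXct(c("2007-04-23 09:00:00", "2007-04-23 15:30:00",
                       "2007-04-24 08:10:00", "2007-04-26 11:45:00",
                       "2007-04-26 12:05:00", "2007-04-26 22:50:00"),
                     tz = "UTC")
days <- as.Date(stamps)

## every calendar day in the range, including days without commits
allDays <- seq(min(days), max(days), by = "day")

## count commits per day; using a factor with all days as levels makes
## the empty day show up with an explicit zero
counts <- table(factor(as.character(days), levels = as.character(allDays)))
daily  <- as.vector(counts)
names(daily) <- as.character(allDays)
print(daily)

## running median over three days (a window of 21 would match the post,
## but needs a longer series than this toy example)
smooth <- runmed(daily, k = 3)
```

With the zoo approach, the equivalent step would be to merge the daily series with a zero-filled series over all dates before smoothing.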

Timeseries of Brian Ripley's commit patterns

This shows that Prof Ripley averaged about ten commits a day before and after the release of R 2.5.0, and that he has slowed down ever so slightly since then to end up at around a mere seven commits a day. Every day. For the seven-plus months we looked at.

So, anyone for analysing his r-help posting frequencies ?

/computers/R | permanent link

UseR! 2007: Two talks and a new R package 'RDieHarder'
The first UseR! conference in North America ended yesterday. I gave two talks and updated my presentations page accordingly.

One talk was joint work with Steffen Moeller (who had also presented our work in Italy in June, and I added that presentation too), David Vernazobres and Albrecht Gebhard and concerns automated building of around two thousand (!!) new Debian source packages for all CRAN and BioConductor packages for GNU R. I plan to send something to debian-devel on that in a day or two as well because the time is right for some feedback on this.

The other talk was about RDieHarder. This is joint work with Robert G. Brown and uses his DieHarder library for random number testing (that I've added to Debian a few months back). It allows R to both run these tests, and to further analyse and visualize the test results. I finally uploaded RDieHarder to CRAN a few days ago -- in fact, my CRANberries rss feed of new CRAN packages had it show up the morning of the presentation. And now that I've added a webpage about RDieHarder I can finally say it's been released.

/misc | permanent link

Sat, 21 Jul 2007

Dead disks, and lvm woes
As posted earlier on the rather recently announced CRANberries feed (that is generated and hosted on my box) covering CRAN package updates, my main server bit the dust. The hard disk no longer wants to talk to any of its lvm partitions, leaving me without /usr, /var/, /var/local/, /home, /srv. Fortunately, most of this was backed-up but now I have to go through reconfiguring the replacement machine I already set up. Not fun.

If anybody has tips on recovering the lvm partitions, I'm all ears. /etc/lvm/ seems fine if that helps as a starting point.

/computers/misc | permanent link

Mon, 09 Jul 2007

Announcing CRANberries
Earlier today I sent an announcement to the r-packages list. It describes CRANberries, two simple RSS feeds that summarize both 'new' and 'updated' packages at CRAN, the archive network for R. I cooked this up rather quickly using a few lines of R, a small SQLite db backend and the old Blosxom blog engine. A tip of the hat to Barry Rowlingson who almost immediately suggested to use the lol format instead.

The hope is that this proves helpful for keeping tabs on the amazing growth of CRAN (which is now at over one thousand packages) as well as the number of updates to existing packages. The feed(s) can be consumed standalone, or via the brand new Planet R aggregator that Elijah announced today too.

/computers/R | permanent link

Mon, 02 Jul 2007

More on 'nicer charts'
Via the Planet Debian aggregator and his blog, Sven followed up on my post regarding Lucas' plot of the package age distribution.

As some of my points didn't seem to make it across, I will reiterate them more plainly:

  • GNUplot, while easy to use, creates charts that aren't terribly pretty;
  • Lucas' original chart had, to paraphrase an expression by Tufte, a poor 'ink to paper ratio': the data is too concentrated in the last quartile;
  • for that very reason, taking logs is a good thing here

Sven also addresses the fact that what we really want is to see the quantiles of the data set. Quite right, and taking logs makes that easier. Consider the two charts below which plot the 'package age in days' as an empirical cumulative distribution function using built-in R functions ecdf and plot.stepfun (rather than redoing it ad-hoc as I had done), and also explicitly add quantiles. The two charts use the exact same instructions; however the second chart transforms the x-axis to a logarithmic scale.

Debian Package Age-since-recompile Distributions charted two ways

While it is close to impossible to find the 25th or 50th percentile on the first chart, it becomes a lot easier on the second chart because the x-axis is 'stretched' using the log transform. About one quarter of the packages appear to have been rebuilt within the last 1.5 months, and about half are younger than four months (as a quick call to summary(pkgAge) confirms). Reading these proportions off the original chart, or the non-log chart, is much more difficult.
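The comparison can be reproduced in spirit with a few lines. The sketch below uses simulated, long-tailed 'ages' rather than the actual Debian data, so the numbers it produces say nothing about Debian; the distribution parameters are arbitrary assumptions. For the log-scale panel it plots the sorted values against the ecdf directly, which sidesteps axis-limit issues when log-scaling a step function plot.

```r
## Simulated long-tailed ages in days -- NOT the Debian package data
set.seed(42)
age <- rlnorm(1000, meanlog = log(120), sdlog = 1.2)

Fn <- ecdf(age)                            # empirical cumulative distribution
q  <- quantile(age, c(0.25, 0.5, 0.75))    # quartiles to mark on the charts

oldpar <- par(mfrow = c(1, 2))
## linear x-axis: quartiles are squeezed into the far left
plot(Fn, main = "Linear x-axis", xlab = "age in days")
abline(v = q, lty = 2)
## logarithmic x-axis: same data, quartiles are easy to read off
plot(sort(age), Fn(sort(age)), type = "s", log = "x",
     main = "Log x-axis", xlab = "age in days", ylab = "Fn(x)")
abline(v = q, lty = 2)
par(oldpar)

print(q)
```

The dashed vertical lines mark the quartiles in both panels, which makes the effect of the log transform easy to see.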

/computers/R | permanent link

Fri, 29 Jun 2007

Improving simple charts
Earlier today and via Planet Debian, Lucas blogged about the 'age distribution' of Debian packages, defined as the time since the last (re-)compilation. He illustrated his findings with an, umm, rather ugly chart. Having climbed onto the soap box once before, I would like to point out how easy it can be to create simple, informative, and, at least to me, prettier charts using R.

Lucas included a URL to the data. The first nice thing to note is that we can read the data directly from the URL -- no need to copy the file:

pkgAge <- read.table(file="http://people.debian.org/~lucas/arch-age/arch-age.log", col.names=c("pkg","yyyymmdd"))
This reads the data into a data.frame to which we have given two column names.
pkgAge[,"date"] <- as.Date(as.character(pkgAge[,"yyyymmdd"]), "%Y%m%d")
pkgAge[,"age"] <- as.numeric(difftime(Sys.Date(), pkgAge[,"date"], units="day"))
pkgAge[,"prop"] <- (1:nrow(pkgAge)) / nrow(pkgAge) * 100
We then create three new columns. First is a date, obtained by parsing the (integer) dates (after first casting them into characters) and supplying the format in standard C notation: "%Y%m%d" for year, month and day without any separators or formatters. Now, having the date as an actual date object inside a real data analysis language we can do things such as computing date differences. The difftime function does just that, using the current date as the other point. We ask for the return to be in days, and cast this down to a purely numeric vector (instead of a difftime object). Lastly, we quickly compute the cumulative proportion in percentages.

We can then view the data. Before we plot,
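As a small worked check of the date handling, the following sketch applies the same conversions to three made-up integer dates and a fixed reference date (instead of Sys.Date(), so that the result is reproducible):

```r
## Three made-up dates in the integer yyyymmdd layout of the data file
yyyymmdd <- c(20050115L, 20070101L, 20070615L)

## integer -> character -> Date, using the C-style format string
d <- as.Date(as.character(yyyymmdd), "%Y%m%d")

## fixed reference date chosen for illustration
ref <- as.Date("2007-06-29")

## difference in days, reduced from a difftime object to plain numbers
age <- as.numeric(difftime(ref, d, units = "days"))
print(data.frame(date = d, age = age))
```

The last entry is 14 days old and the middle one 179 days, which is easy to verify by hand.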

png("packageAges.png", width=640, height=480, pointsize=10)
oldpar <- par(mfrow=c(2,2), mar=c(2.5,2.5,3,1))
we direct the charts to a png file of given dimensions, and ask for all plots in one figure (using mfrow with two rows by two) with somewhat smaller figure margins using the mar argument to par.

The first chart shows again proportion over date:

with(pkgAge, plot(date, prop, type='l', main="Standard Plot"))
(The with() function simply allows us to refer to the columns by their names without explicit subsetting. plot(pkgAge[,"date"], pkgAge[,"prop"]) is equivalent, but more cumbersome.)

As it is clear that the data has a fairly long tail in the older dates, we can also try to plot the proportion over logarithmic time differences. This doesn't work for dates, but it works for our (positive-valued) age variable:

with(pkgAge, plot(age, prop, type='l', log="x", main="More linear as log(age in days)"))

The very far left tail below 0.5 percent is interesting as the one very old package is clearly an outlier within an outlier region. We use the subset function to take just one portion of the data, use logs, and explicit plotting symbols '+' in a points-and-lines plot:

with(subset(pkgAge, prop<0.5), plot(date, prop, type='b', log="y", pch="+", main="Detail in left tail, up to 0.5%"))

Lastly, the upper quartile is fairly linear.

with(subset(pkgAge, prop>75), plot(date, prop, type='l', pch=".", main="Yet fairly linear in top 25%"))

At the end

par(oldpar)
dev.off()
we restore the graphics parameters and close the device (here the file). All this then yields the following chart:

Debian Package Age-since-recompile Distributions

Updated to correctly display the assignment operator <-

/computers/R | permanent link

Sat, 23 Jun 2007

New OpenMPI packages
Debian has had an OpenMPI package since early last year when Florian Ragwitz made some initial stabs at packaging. The package has seen a number of NMUs and patches since then, but was generally getting cobwebs ... which was too bad because OpenMPI seems to have some wind behind its sails upstream. Unfortunately, little of that got packaged for Debian.

After some discussions on and around the debian-science list, a new maintainer group was formed on Alioth under the pkg-openmpi name. Tilman Koschnick (who had already helped Florian with patches), Manuel Prinz, Sylvestre Ledru and myself have gotten things in good enough shape in reasonably short time. And I have just uploaded a lintian-clean package set openmpi_1.2.3-0 to Debian, where it is expected to sit in the NEW queue for a little bit before moving on to the archive proper. The changelog entry (which will appear here eventually) shows twelve bugs closed.

Our plan is to provide a stable and well maintained MPI implementation for Debian. OpenMPI is the designated successor to LAM, and apart from MPICH2, everybody seems to have thrown their weight behind OpenMPI. So we will try to work with the other MPI maintainers to come up with sensible setups, alternatives priorities and the like. If you are interested in MPI and would like to help, come join us at the Alioth project pkg-openmpi.

Last but not least, thanks to Florian for the initial packaging, and to Clint Adams, Mark Hymers, Andreas Barth, and Steve Langasek (twice even) for NMUs.

/computers/linux/debian/packages | permanent link

Thu, 14 Jun 2007

New York bound
On the second try I got lucky with the lottery for the New York Marathon on November 4 this year. So that may make it Boston, Chicago (on October 7) and now New York for 2007.

/sports/running | permanent link

Madeleine Peyroux at Ravinia
Returning from Toronto, we went straight to Ravinia to see one of my favourite vocalists: Madeleine Peyroux who didn't disappoint. Lovely afternoon / evening as Ravinia is such a treat: summer, lawn, picnic and excellent live music. Let's see if I get out there another time this year.

/music/jazz/live | permanent link

Toronto the good
Went to Toronto last weekend for a wedding. Couldn't have been nicer -- on Toronto Island with its nice view of the city. The ceremony got slightly rescheduled because of the massive storm coming through last Friday afternoon which forced everything inside. Only drawback was that the party ended too soon due to ferry schedule.

Caught up with some friends in town and glanced at a few of those fancy new digs that have come up since we left in 2000: the Opera house, lots of construction at the AGO and the neat building next door, the intriguing crystal at the ROM. Always nice to come back, especially with the very nicest weather as it was last weekend.

/misc | permanent link

Wed, 30 May 2007

Bike The Drive 2007
Got the whole family out Sunday morning for the annual Bike The Drive in Chicago. The weather looked unfriendly, but behaved as rain stayed away under gray skies (before the day turned into a screamer of a sunny day, very fitting for a holiday). All in all a nice outing, the weather notwithstanding, and we ended the afternoon at a Chicago Fire soccer game.

/sports/cycling | permanent link

Tue, 29 May 2007

JPM Chase Corporate Challenge 2007
Last Thursday was the 2007 edition of the JP Morgan Chase Corp. Challenge. Record crowd of almost 23000 in Chicago's Grant Park. Just like last year, I didn't try hard enough to be closer to the starting line. Which means that one ends up running more or less zig-zag for some time. Oh well, my hand-stopped 22:17 is still pretty close to the PR from two years ago. Overall once again a really nice event, and we managed to have more folks from work show up, both to run, and to have a cold one or two afterwards.

/sports/running | permanent link

Thu, 26 Apr 2007

random 0.1.2
Following yesterday's minor maintenance release of random, a small brown bag fix release just went out to CRAN. Kurt Hornik, diligent as usual, spotted that the vignette would not build, and that has been corrected. The code itself is unchanged.

/computers/linux/debian/packages | permanent link

Wed, 25 Apr 2007

digest 0.3.0
A new version of digest has just been sent to CRAN. Thanks to excellent contributions by Simon Urbanek and Henrik Bengtsson, some internals of the code have been improved. The output produced by digest should now be invariant to R version number changes.

/computers/linux/debian/packages | permanent link

random 0.1.1
A minor maintenance release of random was just uploaded to CRAN. The only change is a correction to a regular expression string: R 2.5.0 now warns that percentage signs should not be escaped. No other code changes were made.

/computers/linux/debian/packages | permanent link

littler 0.0.11
I rolled up 'little r' (pronounced littler) version 0.0.11 earlier today. This release only includes a robustification of the main Makefile (so that e.g. BSDers can build with non-GNU make) and a small fix by Jeff to 'update.r', a handy example script.

As usual, littler can be found in the GoogleCode svn archive, on my r page and in the local directory, and soon on Jeff's littler page at Vanderbilt. The Debian package has been uploaded as well (and has been built against the new R version 2.5.0 that was released yesterday).

/computers/linux/debian/packages | permanent link

Fri, 20 Apr 2007

Boston Marathon 2007
So this Monday was the 111th running, as they call it, of the venerable Boston Marathon, and my first time there. After the fairly serious storm warnings -- and the corresponding health warnings, delivered through two emails from the race organizers and a handout at registration -- we got very, very lucky as the rain more or less stopped shortly after the start. And there was relatively little wind, and what there was came mostly from the side, rather than the 'in your face' gusts at the projected 30 to 50 mph.

There were actually six of us running from our little informal group in River Forest / Oak Park, and fellow rookie Russ and I decided to 'take it easy' and just 'go for a long run'. I ended up with a fairly even pace between 7:45 and 8:10, averaging 7:57 for a total of 3:28:24, or a few seconds faster than my very first marathon in Chicago. Which is quite pleasant, given the conditions, and the hillier course. The race went pretty well for all six of us, which is nice too. Given how we trained together, it is neat how we all ended up within seven minutes of each other.

The race itself is quite stunning. With the required qualifying time comes a 'seeding' system for the start, so one tends to run most of the course with runners of fairly similar speed. That made for nice camaraderie on the course -- and for beautiful sights of nothing but runners rolling through the hills of Massachusetts. I think I'll be back next year.

/sports/running | permanent link

Sun, 18 Mar 2007

2007 March Madness Half Marathon in Cary
Ran the annual March Madness Half Marathon in Cary, IL, today. Really nice morning with temperatures just above freezing, but gloriously sunny and no winds which made for excellent running conditions.

As I somehow managed to leave my Garmin GPS at home, I had to run with little information about pacing and relative speed -- all I got were times announced at about two thirds of the mile markers. So I ended up going out somewhat fast, and then working hard not to completely crumble. The end result was rather nice: 1:31:30.4 -- or a pace of 6:59.11 per mile -- and over a minute faster than my previous PR at this distance from last summer, and over four minutes faster than last year's March Madness. Needless to say, I am tired now :)

But I guess it shows that the ongoing training for Boston in four weeks is doing some good. Kudos to Greg for a nice training schedule.

/sports/running | permanent link

Sun, 25 Feb 2007

RQuantLib 0.2.6
Version 0.4.0 of QuantLib was released a few days ago (and Debian packages for QuantLib are waiting to be added by the ftpmaster).

This required some minor changes by Dominick in the Bermudan pricer, and we made some small updates in other places. All in all just a regular maintenance release. The new version 0.2.6 has now been uploaded to both R's master CRAN host and Debian, and is also available locally here.

/computers/linux/debian/packages | permanent link

Thu, 22 Feb 2007

Bug fix release of Finance::YahooQuote
Following up on the patch mentioned earlier, a new bug-fix release 0.22 of Finance::YahooQuote has been uploaded to Debian, CPAN and my yahooquote pages here. I also updated the Freshmeat record.

/computers/linux/debian/packages | permanent link