Wed, 12 Oct 2022

GitHub Streak: Round Nine

Eight years ago I referenced the Seinfeld Streak used in an earlier post of regular updates to the Rcpp Gallery:

This is sometimes called Jerry Seinfeld’s secret to productivity: Just keep at it. Don’t break the streak.

and then showed the first chart of GitHub streaking 366 days:

github activity october 2013 to october 2014

And seven years ago a first follow-up appeared in this post about 731 days:

github activity october 2014 to october 2015

And six years ago we had a followup at 1096 days

github activity october 2015 to october 2016

And five years ago we had another one marking 1461 days

github activity october 2016 to october 2017

And four years ago another one for 1826 days

github activity october 2017 to october 2018

And three years ago another one bringing it to 2191 days

github activity october 2018 to october 2019

And two years ago another one bringing it to 2557 days

github activity october 2019 to october 2020

And last year another one bringing it to 2922 days

github activity october 2020 to october 2021

And as today is October 12, here is the newest one from 2021 to 2022, bringing it to 3287 days:

github activity october 2021 to october 2022
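
The running totals can be reproduced with plain date arithmetic. The sketch below is in Python and assumes (this is not stated in the post) that the streak is counted from 12 October 2013, the left edge of the first chart. Under that assumption the plain day difference matches the totals quoted from 1096 days onward; the first two totals (366 and 731) are one more than the plain difference, presumably because they count both end dates.

```python
from datetime import date

# Assumption: the streak is counted from 12 October 2013.
STREAK_START = date(2013, 10, 12)


def days_elapsed(on: date, start: date = STREAK_START) -> int:
    """Calendar days from start to on, not counting the start day itself."""
    return (on - start).days


# One line per anniversary post, 2014 through 2022.
for year in range(2014, 2023):
    print(year, days_elapsed(date(year, 10, 12)))
```

The jumps of 366 rather than 365 days come from the leap days in February 2016 and February 2020.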

As always, special thanks go to Alessandro Pezzè for the Chrome add-on GithubOriginalStreak.

If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.


Tue, 12 Oct 2021

GitHub Streak: Round Eight

Seven years ago I referenced the Seinfeld Streak used in an earlier post of regular updates to the Rcpp Gallery:

This is sometimes called Jerry Seinfeld’s secret to productivity: Just keep at it. Don’t break the streak.

and then showed the first chart of GitHub streaking 366 days:

github activity october 2013 to october 2014

And six years ago a first follow-up appeared in this post about 731 days:

github activity october 2014 to october 2015

And five years ago we had a followup at 1096 days

github activity october 2015 to october 2016

And four years ago we had another one marking 1461 days

github activity october 2016 to october 2017

And three years ago another one for 1826 days

github activity october 2017 to october 2018

And two years ago another one bringing it to 2191 days

github activity october 2018 to october 2019

And last year another one bringing it to 2557 days

github activity october 2019 to october 2020

And as today is October 12, here is the newest one from 2020 to 2021 with a new total of 2922 days:

github activity october 2020 to october 2021

Again, special thanks go to Alessandro Pezzè for the Chrome add-on GithubOriginalStreak.


Thu, 15 Apr 2021

Announcing ‘Introductions to Emacs Speaks Statistics’

A new website containing introductory videos and slide decks is now available for your perusal at ess-intro.github.io. It provides a series of introductions to the excellent Emacs Speaks Statistics (ESS) mode for the Emacs editor.

This effort started following my little tips, tricks, tools and toys series of short videos and slide decks “for the command-line and R, broadly-speaking”, which I had mentioned to friends curious about Emacs, and on the ess-help mailing list. And lo and behold, over the fall and winter sixteen of us came together in one GitHub org and are now proud to present the initial batch of videos about first steps, installing, using with spacemacs, customizing, and org-mode with ESS. More may hopefully follow; the group is open and you too can join: see the main repo and its wiki.

This is in fact the initial announcement post, so it is flattering that we have already received over 350 views, four comments and twenty-one likes.

We hope it proves to be a useful starting point for some of you. The Emacs editor is quite uniquely powerful, and coupled with ESS makes for a rather nice environment for programming with data, or analysing, visualising, exploring, … data. But we are not zealots: there are many editors and environments under the sun, and most people are perfectly happy with their choice, which is wonderful. We also like ours, and sometimes someone asks ‘tell me more’ or ‘how do I start’. We hope this series satisfies this initial curiosity and takes it from here.

With that, my thanks to Frédéric, Alex, Tyler and Greg for the initial batch, and to everybody else in the org who chipped in with comments and suggestions. We hope it grows from here, so happy Emacsing with R from us!


Mon, 12 Oct 2020

GitHub Streak: Round Seven

Six years ago I referenced the Seinfeld Streak used in an earlier post of regular updates to the Rcpp Gallery:

This is sometimes called Jerry Seinfeld’s secret to productivity: Just keep at it. Don’t break the streak.

and then showed the first chart of GitHub streaking 366 days:

github activity october 2013 to october 2014

And five years ago a first follow-up appeared in this post about 731 days:

github activity october 2014 to october 2015

And four years ago we had a followup at 1096 days

github activity october 2015 to october 2016

And three years ago we had another one marking 1461 days

github activity october 2016 to october 2017

And two years ago another one for 1826 days

github activity october 2017 to october 2018

And last year another one bringing it to 2191 days

github activity october 2018 to october 2019

And as today is October 12, here is the newest one from 2019 to 2020 with a new total of 2557 days:

github activity october 2019 to october 2020

Again, special thanks go to Alessandro Pezzè for the Chrome add-on GithubOriginalStreak.


Tue, 28 Jul 2020

Installing and Running Ubuntu on a 2015-ish MacBook Air

So a few months ago kiddo one dropped an apparently fairly large cup of coffee onto her one and only trusted computer. With a few months (then) to graduation (which by now happened), and with the apparent “genius bar” verdict of “it’s a goner”, a new one was ordered. As it turns out this supposedly dead one coped well enough with the coffee so that after a few weeks of drying it booted again. But given the newer one, its apparent age and whatnot, it was deemed surplus. So I poked around a little on the interwebs and concluded that yes, this could work.

Fast forward a few months and I finally got hold of it, and had some time to play with it. First, a bootable USB stick was prepared, and once the machine’s content was really (really, and check again: really) no longer needed, I got hold of it for good.

tl;dr It works just fine. It is a little heavier than I thought (and isn’t “air” supposed to be weightless?) The ergonomics seem quite nice. The keyboard is decent. Screen-resolution on this pre-retina simple Air is so-so at 1440 pixels. But battery life seems ok and e.g. the camera is way better than what I have in my trusted Lenovo X1 or at my desktop. So just as a zoom client it may make a lot of sense; otherwise just walking around with it as a quick portable machine seems perfect (especially as my Lenovo X1 still (ahem) suffers from one broken key I really need to fix…).

Below are some lightly edited notes from the installation. Initial steps were quick: maybe an hour or less? Customizing a machine takes longer than I remembered, this took a few minutes here and there quite a few times, but always incremental.

Initial Steps

  • Download of Ubuntu 20.04 LTS image: took a few moments, even on broadband, feels slower than normal (fast!) Ubuntu package updates, maybe lesser CDN or bad luck

  • Startup Disk Creator using a so-far unused 8gb usb drive

  • Plug into USB, recycle power, press “Option” on macOS keyboard: voila

  • After a quick hunch… no to ‘live/test only’ and yes to install, whole disk

  • install easy, very few questions, somehow skips wifi

  • so activate wifi manually, and everything pretty much works

Customization

  • First deal with ‘fn’ and ‘ctrl’ key swap. Installed git and followed this github repo which worked just fine. Yay. First (manual) Linux kernel module build needed in … half a decade? Longer?

  • Fire up firefox, go to ‘download chrome’, install chrome. Sign in. Turn on syncing. Sign into Pushbullet and Momentum.

  • syncthing which is excellent. Initially via apt, later from their PPA. Spend some time remembering how to set up the mutual handshakes between devices. Now syncing desktop/server, lenovo x1 laptop, android phone and this new laptop

  • keepassx via apt and set up using Sync/ folder. Now all (encrypted) passwords synced.

  • Discovered synergy is no longer really free, so after a quick search found and installed barrier (via apt) to have one keyboard/mouse from desktop reach laptop.

  • Added emacs via apt, so far ‘empty’, so no config files yet

  • Added ssh via apt, need to propagate keys to github and gitlab

  • Added R via add-apt-repository --yes "ppa:marutter/rrutter4.0" and add-apt-repository --yes "ppa:c2d4u.team/c2d4u4.0+". Added littler and then RStudio

  • Added wajig (apt frontend) and byobu, both via apt

  • Created ssh key, shipped it to server and github + gitlab

  • Cloned (not-public) ‘dotfiles’ repo and linked some dotfiles in

  • Cloned git repo for nord-theme for gnome terminal and installed it; also added it to RStudio via this repo

  • Emacs installed, activated dotfiles, then incrementally install a few elpa-* packages and a few M-x package-install including nord-theme, of course

  • Installed JetBrains Mono font from my own local package; activated for Gnome Terminal and Emacs

  • Install gnome-tweak-tool via apt, adjusted a few settings

  • Ran gsettings set org.gnome.desktop.wm.preferences focus-mode 'sloppy'

  • Set up camera following this useful GH repo

  • At some point also added slack and zoom, because, well, it is 2020

  • STILL TODO:

    • docker
    • bother with email setup?
    • maybe atom/code/…?


Sat, 12 Oct 2019

GitHub Streak: Round Six

Five years ago I referenced the Seinfeld Streak used in an earlier post of regular updates to the Rcpp Gallery:

This is sometimes called Jerry Seinfeld’s secret to productivity: Just keep at it. Don’t break the streak.

and then showed the first chart of GitHub streaking

github activity october 2013 to october 2014

And four years ago a first follow-up appeared in this post:

github activity october 2014 to october 2015

And three years ago we had a followup

github activity october 2015 to october 2016

And two years ago we had another one

github activity october 2016 to october 2017

And last year another one

github activity october 2017 to october 2018

As today is October 12, here is the newest one from 2018 to 2019:

github activity october 2018 to october 2019

Again, special thanks go to Alessandro Pezzè for the Chrome add-on GithubOriginalStreak.


Fri, 12 Oct 2018

GitHub Streak: Round Five

Four years ago I referenced the Seinfeld Streak used in an earlier post of regular updates to the Rcpp Gallery:

This is sometimes called Jerry Seinfeld’s secret to productivity: Just keep at it. Don’t break the streak.

and then showed the first chart of GitHub streaking

github activity october 2013 to october 2014

And three years ago a first follow-up appeared in this post:

github activity october 2014 to october 2015

And two years ago we had a followup

github activity october 2015 to october 2016

And last year we had another one

github activity october 2016 to october 2017

As today is October 12, here is the newest one from 2017 to 2018:

github activity october 2017 to october 2018

Again, special thanks go to Alessandro Pezzè for the Chrome add-on GithubOriginalStreak.


Thu, 12 Oct 2017

GitHub Streak: Round Four

Three years ago I referenced the Seinfeld Streak used in an earlier post of regular updates to the Rcpp Gallery:

This is sometimes called Jerry Seinfeld's secret to productivity: Just keep at it. Don't break the streak.

and showed the first chart of GitHub streaking

github activity october 2013 to october 2014

And two years ago a first follow-up appeared in this post:

github activity october 2014 to october 2015

And a year ago we had a followup

github activity october 2015 to october 2016

And as it is October 12 again, here is the new one:

github activity october 2016 to october 2017

Again, special thanks go to Alessandro Pezzè for the Chrome add-on GithubOriginalStreak.


Wed, 12 Oct 2016

Seinfeld streak at GitHub: Round Three

Two years ago in this post I referenced the Seinfeld Streak used in an even earlier post of regular updates to the Rcpp Gallery:

This is sometimes called Jerry Seinfeld's secret to productivity: Just keep at it. Don't break the streak.

and showed this first chart of GitHub streaking

github activity october 2013 to october 2014

Last year a follow-up appeared in this post:

github activity october 2014 to october 2015

And as it is that time again, here is this year's version:

github activity october 2015 to october 2016

Special thanks go to Alessandro Pezzè for the Chrome add-on GithubOriginalStreak.


Mon, 12 Oct 2015

Seinfeld streak at GitHub: Round Two

So one year ago, I posted a reference to the Seinfeld Streak as used in an even earlier post of regular updates to the Rcpp Gallery:

This is sometimes called Jerry Seinfeld's secret to productivity: Just keep at it. Don't break the streak.

Now it is one year later and it seems I just doubled up with a second year of GitHub streaking

github activity october 2014 to october 2015

Maybe someone should send help.


Tue, 08 Sep 2015

It was twenty years ago today...

sgt pepper

Hm, wasn't there a catchy tune that started that way? Just kidding.

But about twenty years ago today I sent this email about a new Debian package upload -- and that makes it (as best as I can tell) the earliest trace of me doing Debian work. It so happened that I did upload two packages in July of 1995 as well, but it was so early in the project that we did not have a mailing list archive for such things yet (!!). And I have a vague recollection that the two in July were an adoption---whereas the post referenced above announced efax as my first new package added to the distribution. And there were more to come ...

Anyway, thanks for all the fish! Twenty years with Debian has been a great, great ride. I learned a lot from a lot of fantastic people, and I hope I helped a few people along the way with a package or two I still maintain.

Debian remains a truly fabulous project which I hope will go strongly for another 20 (or even 22).


Sun, 21 Dec 2014

If only there was a Romeo somewhere ...

Attention: rant coming. You have been warned, and may want to tune out now.

So the top of my Twitter timeline just had a re-tweet posting to this marvel on the state of Julia. I should have known better than to glance at it as it comes from someone providing (as per the side-bar) Thought leadership in Big Data, systems architecture and more. Reading something like this violates the first rule of airport book stores: never touch anything from the business school section, especially on (wait for it) leadership or, worse yet, thought leadership.

But it is Sunday, my first cup of coffee still warm (after finalising two R package updates on GitHub, and one upload to CRAN) and so I read on. Only to be mildly appalled by the usual comparison to R based on the same old Fibonacci sequence.

Look, I am as guilty as anyone of using it (for example all over chapter one of my Rcpp book), but at least I try to stress each and every time that this is kicking R where it is down as its (fairly) poor performance on function calls (that is well-known and documented) obviously gets aggravated by recursive calls. But hey, for the record, let me restate those results. So Julia beats R by a factor of 385. But let's take a closer look.

For n=25, I get R to take 241 milliseconds---as opposed to his 6905 milliseconds---simply by using the same function I use in every workshop, eg last used at Penn in November, which does not use the dreaded ifelse operator:

fibR <- function(n) {
  if (n < 2) return(n)
  return(fibR(n-1) + fibR(n-2))
}

Switching that to the standard C++ three-liner using Rcpp

library(Rcpp)
cppFunction('int fibCpp(int n) { 
  if (n < 2) return(n);  
  return(fibCpp(n-1) + fibCpp(n-2));  
  }')

and running a standard benchmark suite gets us the usual result of

R> library(rbenchmark)
R> benchmark(fibR(25),fibCpp(25),order="relative")[,1:4]
        test replications elapsed relative
2 fibCpp(25)          100   0.048    1.000
1   fibR(25)          100  24.674  514.042
R> 

So for the record as we need this later: that is 48 milliseconds for 100 replications, or about 0.48 milliseconds per run.
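
As a quick check of that arithmetic, here is a minimal sketch in Python (not part of the original benchmark). It only re-derives the per-run time and the `relative` column from the `elapsed` figures printed above; the helper names `per_run_ms` and `relative` are made up for this illustration.

```python
# Figures copied from the rbenchmark output above:
# elapsed seconds for 100 replications of each call.
REPLICATIONS = 100
ELAPSED_SECONDS = {"fibCpp(25)": 0.048, "fibR(25)": 24.674}


def per_run_ms(elapsed_seconds: float, replications: int) -> float:
    """Average time of a single run, in milliseconds."""
    return elapsed_seconds / replications * 1000.0


def relative(elapsed: dict) -> dict:
    """Each elapsed time divided by the fastest one, like the 'relative' column."""
    fastest = min(elapsed.values())
    return {name: secs / fastest for name, secs in elapsed.items()}


for name, secs in ELAPSED_SECONDS.items():
    print(f"{name}: {per_run_ms(secs, REPLICATIONS):.2f} ms per run")
print({name: round(r, 3) for name, r in relative(ELAPSED_SECONDS).items()})
```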

Now Julia. And on my standard Ubuntu server running the current release 14.10:

edd@max:~$ julia
ERROR: could not open file /home/edd//home/edd//etc/julia/juliarc.jl
 in include at boot.jl:238

edd@max:~$ 

So wait, what? You guys can't even ensure a working release on what is probably the most popular and common Linux installation? And I get to that after reading a post on the importance of "Community, Community, Community" and you can't even make sure this works on Ubuntu? Really?

So a little bit of googling later, I see that julia -f is my friend for this flawed release, and I can try to replicate the original timing

edd@max:~$ julia -f
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" to list help topics
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.2.1 (2014-02-11 06:30 UTC)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |  x86_64-linux-gnu

julia> fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)
fib (generic function with 1 method)

julia> @elapsed fib(25)
0.002299559

julia> 

Interestingly the post's author claims 18 milliseconds. I see 2.3 milliseconds here. Maybe someone is having a hard time comparing things to the right of the decimal point. Or maybe his computer is an order of magnitude slower than mine. The more important thing is that Julia is of course faster than R (no surprise: LLVM at work) but also still a lot slower than a (trivial to write and deploy) C++ function. Nothing new here.

So let's recap. Comparison to R was based on a flawed version of a function we only use when we deliberately want to put R down, can be improved significantly when using a better implementation, results are still off by an order of magnitude from what was reported ("math is hard"), and the standard C / C++ way of doing things is still several times faster than our new saviour language---which I can't even launch on the current version of one of the more common free operating systems. Ok then. Someone please wake me up in a few years and I will try again.
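
To put rough numbers on that recap, the following Python sketch simply divides the per-run timings quoted in this post (n = 25). It is an illustration only: the figures are single measurements from one machine, and the helper name `times_slower` is hypothetical.

```python
# Per-run timings in milliseconds, as quoted in the text for n = 25.
MEASURED_MS = {
    "R (fibR)": 241.0,        # improved R function, single run
    "Julia 0.2.1": 2.299559,  # @elapsed fib(25), converted from seconds
    "C++ via Rcpp": 0.48,     # 48 ms for 100 replications
}
CLAIMED_JULIA_MS = 18.0       # figure claimed in the post being discussed


def times_slower(slow_ms: float, fast_ms: float) -> float:
    """How many times longer the slower timing is than the faster one."""
    return slow_ms / fast_ms


julia = MEASURED_MS["Julia 0.2.1"]
print(f"R vs Julia:     {times_slower(MEASURED_MS['R (fibR)'], julia):.1f}x")
print(f"Julia vs C++:   {times_slower(julia, MEASURED_MS['C++ via Rcpp']):.1f}x")
print(f"claimed vs measured Julia: {times_slower(CLAIMED_JULIA_MS, julia):.1f}x")
```

On these figures the improved R function is roughly a hundred times slower than Julia rather than 385 times, Julia is several times slower than the C++ function, and the claimed Julia timing is close to an order of magnitude above the one measured here.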

Now, coming to the end of the rant I should really stress that of course I too hope that Julia succeeds. Every user pulled away from Matlab is a win for all of us. We're in this together and the endless navel gazing between ourselves is so tiresome and irrelevant. And as I argue here, even more so when we among ourselves stick to unfair comparisons as well as badly chosen implementation details.

What matters are wins against the likes of Matlab, Excel, SAS and so on. Let's build on our joint strength. I am sure I will use Julia one day, and I am grateful for everyone helping with it---as a lot of help seems to be needed. In the meantime, and with CRAN at 6130 packages that just work, I'll continue to make use of this amazing community and try to do my bit to help it grow and prosper. As part of our joint community.


Wed, 05 Nov 2014

A Software Carpentry workshop at Northwestern

On Friday October 31, 2014, and Saturday November 1, 2014, around thirty-five graduate students and faculty members attended a Software Carpentry workshop. Attendees came primarily from the Economics department and the Kellogg School of Management, which was also the host and sponsor providing an excellent venue with the Allen Center on the (main) Evanston campus of Northwestern University.

The focus of the two-day workshop was an introduction, and practical initiation, to working effectively at the shell, getting introduced and familiar with the git revision control system, as well as a thorough introduction to working and programming in R---from the basics all the way to advanced graphing as well as creating reproducible research documents.

The idea of the workshop had come out of discussion during our R/Finance 2014 conference. Bob McDonald of Northwestern, one of this year's keynote speakers, and I were discussing various topics related to open source and good programming practice --- as well as the lack of a thorough introduction to either for graduate students and researchers. And once I mentioned and explained Software Carpentry, Bob was rather intrigued. And just a few months later we were hosting a workshop (along with outstanding support from Jackie Milhans from Research Computing at Northwestern).

We were extremely fortunate in that Karthik Ram and Ramnath Vaidyanathan were able to come to Chicago and act as lead instructors, giving me an opportunity to get my feet wet. The workshop began with a session on shell and automation, which was followed by three sessions focusing on R: a core introduction, a session focused on functions, and to end the day, a session on the split-apply-combine approach to data transformation and analysis.

The second day started with two concentrated sessions on git and the git workflow. In the afternoon, one session on visualization with R as well as a capstone-alike session on reproducible research rounded out the second day.

Things that worked

The experience of the instructors showed, as the material was presented in an effective manner. The selection of topics, as well as the pace, were seen by most students to be appropriate and just right. Karthik and Ramnath both did an outstanding job.

No students experienced any real difficulties installing software, or using the supplied information. Participants were roughly split between Windows and OS X laptops, and had generally no problem with bash, git, or R via RStudio.

The overall Software Carpentry setup, the lesson layout, the focus on hands-on exercises along with instruction, the use of the electronic noteboard provided by etherpad and, of course, the tried-and-tested material worked very well.

Things that could have worked better

Even more breaks for exercises could have been added. Students had difficulty staying on pace in some of the exercises: once someone fell behind even for a small error (maybe a typo) it was sometimes hard to catch up. That is a general problem for hands-on classes. I feel I could have done better with the scope of my two sessions.

Even more cohesion among the topics could have been achieved via a single continually used example dataset and analysis.

Acknowledgments

Robert McDonald from Kellogg, and Jackie Milhans from Research Computing IT, were superb hosts and organizers. Their help in preparing for the workshop was tremendous, and the pick of venue was excellent, and allowed for a stress-free two days of classes. We could not have done this without Karthik and Ramnath, so a very big Thank You to both of them. Last but not least the Software Carpentry 'head office' was always ready to help Bob, Jackie and myself during the earlier planning stage, so another big Thank You! to Greg and Arliss.


Sun, 12 Oct 2014

Seinfeld streak at GitHub

Early last year, I referred to a Seinfeld Streak in a blog post referring to almost two months of updates to the Rcpp Gallery. This is sometimes called Jerry Seinfeld's secret to productivity: Just keep at it. Don't break the streak.

I now have a different streak:

github activity october 2013 to october 2014

Now we'll see how far this one will go.


Thu, 23 Sep 2010

R Project and Google Summer of Code: Wrapping up

As this year's admin, I wrote up the following summary which has now been posted at the R site in the appropriate slot. My thanks to this year's students, fellow mentors and everybody else who helped to make it happen.

GSoC 2010 logo

Projects 2010

As in 2008 and 2009, the R Project has again participated in the Google Summer of Code during 2010.

Based on ideas collected and discussed on the R Wiki, the projects and students listed below (and sorted alphabetically by student) were selected for participation and have been sponsored by Google during the summer 2010.

The finished projects are available via the R / GSoC 2010 repository at Google Code, and in several cases also via their individual repos (see below). Informal updates and final summaries on the work were also provided via the GSoC 2010 R group blog.


rdx - Automatic Differentiation in R

Chidambaram Annamalai, mentored by John Nash.

Proposal: radx is a package to compute derivatives (of any order) of native R code for multivariate functions with vector outputs, f:R^m -> R^n, through Automatic Differentiation (AD). Numerical evaluation of derivatives has widespread uses in many fields. rdx will implement two modes for the computation of derivatives, the Forward and Reverse modes of AD, combining which we can efficiently compute Jacobians and Hessians. Higher order derivatives will be evaluated through Univariate Taylor Propagation.

Delivered: Two packages radx: forward automatic differentiation in R and tada: templated automatic differentiation in C++ were created; see this blog post for details.
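
For readers unfamiliar with the forward mode mentioned in the proposal, here is a minimal sketch of the idea using dual numbers. It is written in Python purely for illustration and is not taken from radx or tada; the names `Dual`, `_lift`, `derivative` and `f` are hypothetical. Each value carries its derivative along, and every arithmetic operation updates both using the usual sum and product rules.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Dual:
    """A value together with its derivative with respect to the input."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants, i.e. with derivative zero."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sin(x: Dual) -> Dual:
    # Chain rule: sin(u)' = cos(u) * u'
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def derivative(f, x0: float) -> float:
    """Evaluate f'(x0) by seeding the input with derivative 1."""
    return f(Dual(x0, 1.0)).deriv


def f(x):
    return x * x * x + 2 * x  # analytically, f'(x) = 3x^2 + 2


print(derivative(f, 2.0))
print(derivative(sin, 0.0))
```

Reverse mode, the other mode named in the proposal, records the computation and propagates derivatives backwards from the output; it is not shown here.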


A GUI for Graphics using ggplot and Deducer

by Ian Fellows, mentored by Hadley Wickham.

Proposal: R puts the latest statistical techniques at one's fingertips through thousands of add-on packages available on the CRAN download servers. The price for all of this power is complexity. Deducer is a cross-platform cross-console graphical user interface built on top of R designed to reduce this complexity. This project proposes to extend the scope of Deducer by creating an innovative yet intuitive system for generating statistical graphics based on the ggplot2 package.

Delivered: All of the major features have been implemented, and are outlined in the video links in this blog post.


rgeos - an R wrapper for GEOS

by Colin Rundel, mentored by Roger Bivand.

Proposal: At present there does not exist a robust geometry engine available to R, the tools that are available tend to be limited in scope and do not easily integrate with existing spatial analysis tools. GEOS is a powerful open source geometry engine written in C++ that implements spatial functions and operators from the OpenGIS Simple Features for SQL specification. rgeos will make these tools available within R and will integrate with existing spatial data tools through the sp package.

Delivered: The rgeos project on R-Forge; see the final update blog post.


Social Relations Analyses in R

by Felix Schoenbrodt, mentored by Stefan Schmukle.

Proposal: Social Relations Analyses (SRAs; Kenny, 1994) are a hot topic both in personality and in social psychology. While more and more research groups adopt the methodology, software solutions are lagging far behind - the main software for calculating SRAs are two DOS programs from 1995, which have a lot of restrictions. My GSOC project will extend the functionality of these existing programs and bring the power of SRAs into the R Environment for Statistical Computing as a state-of-the-art package.

Delivered: The TripleR package is now on CRAN and hosted on RForge.Net; see this blog post for updates.


NoSQL Interface for R

by Yasuhisa Yoshida, mentored by Dirk Eddelbuettel.

Proposal: So-called NoSQL databases are becoming increasingly popular. They generally provide very efficient lookup of key/value pairs. I'll provide several implementations of a NoSQL interface for R. Beyond a sample interface package, I'll try to support a generic interface similar to what the DBI package does for SQL backends.

Status: An initial prototype is available via RTokyoCabinet on Github. No updates were made since June; no communication occurred with anybody related to the GSoC project since June and the project earned a fail.


Last modified: Wed Sep 22 19:39:43 CDT 2010


Mon, 26 Apr 2010

R Project and Google Summer of Code: Welcome to our students!

A few hours ago, I sent the following to both the R development list and the informal R / GSoC list:
Date: Mon, 26 Apr 2010 15:27:29 -0500
To: R Development List 
CC: gsoc-r 
Subject: R and the Google Summer of Code 2010 -- Please welcome our new students!
From: Dirk Eddelbuettel 

Earlier today Google finalised student / mentor pairings and allocations for
the Google Summer of Code 2010 (GSoC 2010).  The R Project is happy to
announce that the following students have been accepted:

   Colin Rundel, "rgeos - an R wrapper for GEOS", mentored by Roger Bivand of
      the Norges Handelshoyskole, Norway

   Ian Fellows, "A GUI for Graphics using ggplot2 and Deducer", mentored by
      Hadley Wickham of Rice University, USA

   Chidambaram Annamalai, "rdx - Automatic Differentiation in R", mentored by
      John Nash of University of Ottawa, Canada

   Yasuhisa Yoshida, "NoSQL interface for R", mentored by Dirk Eddelbuettel,
      Chicago, USA

   Felix Schoenbrodt, "Social Relations Analyses in R", mentored by Stefan
      Schmukle, Universitaet Muenster, Germany

   Details about all proposals are on the R Wiki page for the GSoC 2010 at
   http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2010

The R Project is honoured to have received its highest number of student
allocations yet, and looks forward to an exciting Summer of Code.  Please
join me in welcoming our new students.

At this time, I would also like to thank all the other students who have
applied for working with R in this Summer of Code. With a limited number of
available slots, not all proposals can be accepted -- but I hope that those
not lucky enough to have been granted a slot will continue to work with R and
towards making contributions within the R world. 

I would also like to express my thanks to all other mentors who provided for
a record number of proposals.  Without mentors and their project ideas we
would not have a Summer of Code -- so hopefully we will see you again next
year. 

Regards,  

Dirk (acting as R/GSoC 2010 admin)

/computers/misc | permanent link

Thu, 18 Mar 2010

R Project selected for the Google Summer of Code 2010

Earlier today, Google announced the list of accepted mentor organizations for the Google Summer of Code 2010 (GSoC 2010). And we are happy to report that the R Project is once again a participating organization (and now for the third straight year) joining a rather august group of open source projects from around the globe.

An R Wiki page had been created and serves as the central point of reference for the R Project and the GSoC 2010. It contains a list of project ideas, currently counting eleven and spanning everything from research-oriented topics (such as spatial statistics or automatic differentiation) to R community-support (regarding CRAN statistics and the CRANtastic site) to extensions (NoSQL, RPy2 data interfaces, Rserve browser integration) and more. I also just created a mailing list gsoc-r@googlegroups.com where prospective students and mentors can exchange ideas and discuss. As for other details, the Google Summer of Code 2010 site has most of the answers, and we will try to keep R-related information on the aforementioned R Wiki page.

/computers/misc | permanent link

Thu, 22 Oct 2009

From ORD Sessions to R-Forge in 12 hours with RProtoBuf

Yesterday, via an invitation from fellow Chicago-area Google Summer of Code mentor Borja Sotomayor, I attended the Second ORD Sessions. These are happening at the HQ of Inventable where a couple of technologists and Open Source geeks from the Chicagoland area get together and riff on code for a few hours after work over some pizza and beer.

Sounded good, and I needed an excuse to try to mix the awesome Protocol Buffers with my favourite data tool, R. What are Protocol Buffers? To quote from the Google overview page referenced above:

Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format.
and later on that page:
Protocol buffers are now Google's lingua franca for data – at time of writing, there are 48,162 different message types defined in the Google code tree across 12,183 .proto files. They're used both in RPC systems and for persistent storage of data in a variety of storage systems.

So three hours later, I had an implementation of the 'addressbook reader' C++ example wrapped in a tiny yet complete R package that passed R CMD check. And one lingua franca for data has met another.

So before going to bed, I quickly registered a new project at R-Forge, everybody's favourite R hosting site, and thanks to the tireless Stefan Theussl (and some favourable timezone differences) the project was approved and the stanza available by the time I got up. So I quickly filled the SVN repo and, presto, we had the RProtoBuf project at R-Forge within 12 hours of the ORD Sessions hackfest. I will try to follow up on RProtoBuf in a couple of days; this may lead to some changes in my Rcpp R / C++ interface package as well.

/computers/misc | permanent link

Sun, 31 May 2009

Ubuntu Developer Summit in Barcelona

Due to some things falling into place, I had an opportunity to attend the first two days of last week's Ubuntu Developer Summit in beautiful Barcelona. Somehow, I had never managed to attend a Debian conference either, so it was good to meet a few of the old Debian hands now moving Ubuntu along, as well as a few of the Ubuntu folks. I also gave a short presentation on R in Debian / Ubuntu and the plans for the upcoming Ubuntu release. More on that another time.

All told, a well-organised conference in a nice setting -- two stone throws from the legendary Camp Nou. Unfortunately, I had to leave by Wednesday so I missed what was undoubtedly quite a scene in Barcelona following Barca's dismantling of Man U in this year's Champions League final.

/computers/misc | permanent link

Sat, 23 May 2009

Temporary Debian mail outage

It would appear that debian.org rejected mail for maybe up to twelve hours from late yesterday afternoon (Central timezone) to some time shortly after I got up this morning. Things appear to be back to normal, so a big Thanks to the mail admins.

If you happened to have sent me mail to my debian.org address during that time period, you may have gotten a hard reject ('550 Administrative prohibition') as did a test mail of mine. In this case mail may not be respooled, so please do send it again.

My alternate address, formed by my first name followed by the family name and the commercial top-level domain, remained functional as a fallback.

/computers/misc | permanent link

Thu, 30 Apr 2009

GSoC 2009 Chicago area meeting

Thanks to the effort of the tireless Borja Sotomayor for the local ACM chapter as well as the good folks at Google in Chicago, we had a local kickoff meeting for this year's Google Summer of Code at Google's Chicago office. A few accepted GSoC students, mentors, and a few Google engineers gave short presentations to a bunch of ACM-affiliated students from U of C, Northwestern, DePaul, IIT, UIC, ... I gave my first ever 'lightning talk' --- on R and the GSoC --- and after chatting a bit more rushed home to catch the end of the amazing triple-overtime win of the Bulls over the Celtics. Go Bulls for the improbable game seven in Boston!

/computers/misc | permanent link

Tue, 28 Apr 2009

Google Summer of Code 2009: R / Quantlib

With everything that has been going on of late, I have yet to mention that Khanh Nguyen, a Ph.D. student in Computer Science at U Mass / Boston, will be working with me on RQuantLib as part of the Google Summer of Code program this year.

We had twenty-two applications to review for the R project, including three for the RQuantLib topic I had proposed. Khanh's application was clearly among the best, and I look forward to helping him do cool stuff over the summer. He already posted two short emails on the r-sig-finance and the quantlib-user lists soliciting suggestions and comments. So if you have comments regarding R and QuantLib, please get in touch with him or me!

/computers/misc | permanent link

Wed, 07 Jan 2009

Google Summer of Code 2009

Word is out that there will be a 2009 edition of the Google Summer of Code. I have some follow-up ideas based on last year's mentoring for both Debian and R, but there will be a better time and place to discuss possible project ideas.

/computers/misc | permanent link

Tue, 28 Oct 2008

Google Summer of Code 2008 Mentors Summit

Spent last weekend in Mountain View where Google had invited a number of mentors for the Summer of Code project that Google once again graciously sponsored. A rather impressive list of projects sent up to two people each, giving a probably unparalleled sample of major Open Source projects.

I had a blast. Chris, Leslie and the rest of Google's Open Source Programs Office facilitated a really nice unconference that spawned a few really nice sessions, and they took very good care of us. And just about everybody met a number of folks in person that were previously known only via email or irc. As the saying goes: nothing like the bandwidth of a face-to-face meeting...

Last but not least I should issue a health warning. Sharing a room with the fearless Debian DPL is not for the faint of heart: His snoring is truly world-class.

/computers/misc | permanent link

Tue, 14 Oct 2008

RPostgreSQL 0.1.0

As part of the Google Summer of Code program for 2008 (which I mentioned here and here), Sameer and I are happy to announce that RPostgreSQL is now on the CRAN mirror network for R. RPostgreSQL provides a (DBI-compliant) interface between R and the PostgreSQL database system. I also just sent this short announcement to the r-packages list.

/computers/misc | permanent link

Fri, 25 Apr 2008

Google Summer of Code 2008 projects assigned

As mentioned a few weeks ago, I had submitted two entries as possible topics for this year's version of the Google Summer of Code. And lo and behold, the response was very, very good and both applications were slotted!

For the topic 'create a PostgreSQL package for R that uses the standard DBI interface', a number of interested students contacted me, and a total of three applications were submitted. And while the R Foundation was only able to allocate four topics among a number of really good applications, Sameer Prayaga was our pick for this topic. It would be nice to fill this gap among the existing database connection methods for R, and I feel that Sameer can pull this off.

For the second topic of 'create a cran2deb tool for converting CRAN sources into Debian packages' which I had submitted within Debian, Charles Blundell wrote an excellent application. In a way, this topic is a '2.0' version of our previous attempts of a 'top-down' set of tools in the pkg-bioc project on Alioth. This time, we will try something smaller, maybe more modular and lighter and see how far we get there if we try it 'bottom-up'.

And as we are currently in the community bonding phase, say Hi to Sameer or Charles when you come across them these days.

Lastly, I'd like to thank everybody who submitted an entry at Debian or R, or who contacted me about one of the topics I posted. The response was very humbling, many of you were eminently qualified and seemingly very motivated -- but even Google's pockets can only pay for a finite number of projects. Sorry if yours did not get picked.

/computers/misc | permanent link

Mon, 17 Mar 2008

Google Summer of Code 2008 projects are up

For the past few years, Google has been running their Summer of Code events where Google offers $5000 for college students who are asked to code for a summer for the benefit of various Open Source efforts.

And just like in 2006 and 2007, I put up proposals and offered to act as a mentor. The first one is up at both Debian and R: an opportunity to help with the ongoing efforts of 'turning more CRAN packages into Debian packages'. The second one is only at the R page: a proposal to fill the missing link of DBI database interface modules with a matching one for PostgreSQL. More details for either idea are at the respective pages. Anybody interested should ping me by email.

/computers/misc | permanent link

Sat, 21 Jul 2007

Dead disks, and lvm woes

As posted earlier on the rather recently announced CRANberries feed (that is generated and hosted on my box) covering CRAN package updates, my main server bit the dust. The hard disk no longer wants to talk to any of its lvm partitions, leaving me without /usr, /var/, /var/local/, /home, /srv. Fortunately, most of this was backed-up but now I have to go through reconfiguring the replacement machine I already set up. Not fun.

If anybody has tips on recovering the lvm partitions, I'm all ears. /etc/lvm/ seems fine if that helps as a starting point.

/computers/misc | permanent link

Sat, 29 Jul 2006

Patch to build palm-db-tools_0.3.6 with recent g++ versions

A freshly downloaded version of palm-db-tools (using the most recent version 0.3.6 from June 2003) will not compile with g++ 4.0.3 and its pickier interpretation of C++.

A simple patch which removes the stray trailing semicolons after the closing braces of namespace blocks, and adds virtual destructors for a few classes, is below. The mailing list archives show one other patch for a different (yet probably relevant :) problem, and it's from November 2003.

By the way, would be nice if someone added the package to Debian. The Makefile's make clean target isn't quite right, and make install isn't exactly correct either. Plus, I seem to be building this by hand every few years, and it's so much easier to just apt-get. And no, my plate is already too full.

diff -ru palm-db-tools-0.3.6.orig/libpalm/Block.h palm-db-tools-0.3.6/libpalm/Block.h
--- palm-db-tools-0.3.6.orig/libpalm/Block.h	2003-06-19 18:37:47.000000000 -0500
+++ palm-db-tools-0.3.6/libpalm/Block.h	2006-07-29 15:04:19.000000000 -0500
@@ -176,7 +176,7 @@
 	size_type m_size;
     };
 
-};
+}
 
 bool operator == (const PalmLib::Block& lhs, const PalmLib::Block& rhs);
 
diff -ru palm-db-tools-0.3.6.orig/libpalm/File.h palm-db-tools-0.3.6/libpalm/File.h
--- palm-db-tools-0.3.6.orig/libpalm/File.h	2003-06-19 18:37:47.000000000 -0500
+++ palm-db-tools-0.3.6/libpalm/File.h	2006-07-29 15:04:42.000000000 -0500
@@ -89,6 +89,6 @@
 	uid_map_t m_uid_map;
     };
 
-};
+}
 
 #endif
diff -ru palm-db-tools-0.3.6.orig/libsupport/infofile.h palm-db-tools-0.3.6/libsupport/infofile.h
--- palm-db-tools-0.3.6.orig/libsupport/infofile.h	2003-06-19 18:37:47.000000000 -0500
+++ palm-db-tools-0.3.6/libsupport/infofile.h	2006-07-29 15:07:08.000000000 -0500
@@ -33,6 +33,7 @@
         {
         public:
             virtual void parse(int linenum, std::vector< std::string> array) = 0;
+	  virtual ~Parser() {};
         };
         class ConfigParser: public Parser
         {
@@ -40,6 +41,7 @@
                 ConfigParser(DataFile::CSVConfig& state)
                     : m_Config(state)
                     {}
+	        virtual ~ConfigParser() {};
                 virtual void parse(int linenum, std::vector< std::string> array);
             private:
                 DataFile::CSVConfig& m_Config;
@@ -50,6 +52,7 @@
                 DatabaseParser(PalmLib::FlatFile::Database& db)
                     : m_DB(db)
                     {}
+	        virtual ~DatabaseParser() {};
                 virtual void parse(int linenum, std::vector< std::string> array);
             private:
                 PalmLib::FlatFile::Database& m_DB;
@@ -60,6 +63,7 @@
                 TypeParser(std::string& p_Type)
                     : m_Type(p_Type)
                     {}
+	        virtual ~TypeParser() {};
                 virtual void parse(int linenum, std::vector< std::string> array);
             private:
                 std::string& m_Type;

/computers/misc | permanent link

Thu, 20 Jul 2006

Undeleting from ext3

Kudos to Paul Wise who suggested late last night to try magicrescue to recover an OpenOffice file with meeting notes I had mistakenly deleted from the laptop before I even got to editing these very notes, or backing them up.

And it worked -- despite the dire warnings from the ext3 FAQ about the near impossibility of undeleting from ext3 partitions. But a few things worked in my favour here: OpenOffice files are zip files 'under the hood', and magicrescue knows how to deal with certain file formats via its collection of 'recipes' (which one can extend), and zip is among the shipped recipes. Also, I still had the 'ls -l' session in one konsole tab so I knew the exact file size I was looking for. And I wasn't in a hurry. So this morning, after magicrescue had faithfully restored 4479 files in over four hours, its companion magicsort then filtered these into a handful of directories, one of which was 'OpenDocument Text' -- which contained a few files, including two versions of the desired file as well as copies of the version from the previous meeting. Nice. And OpenOffice gladly opened the file. Lastly, it certainly helped that I happened to have a lurking debian-devel irc session open --- where Paul not only caught my remark, but almost immediately suggested to try magicrescue. So here goes a well-earned Thank You to Paul!
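The 'zip under the hood' point is easy to check for yourself. Below is a minimal Python sketch; it is not how magicrescue itself works. It builds a tiny stand-in archive in memory (a real document has more entries) and then inspects it the way one might sanity-check a recovered file. The helper name looks_like_odt is made up for illustration; the 'mimetype' entry and its value are the ones the OpenDocument format uses for text documents.

```python
import io
import zipfile

# MIME type that OpenDocument text files store in their 'mimetype' entry.
ODT_MIMETYPE = b"application/vnd.oasis.opendocument.text"


def looks_like_odt(data: bytes) -> bool:
    """Return True if data is a zip archive whose 'mimetype' entry says ODT."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    with zipfile.ZipFile(buffer) as archive:
        try:
            return archive.read("mimetype").strip() == ODT_MIMETYPE
        except KeyError:
            # A valid zip file, but without a 'mimetype' entry.
            return False


# Build a minimal stand-in for a recovered document, entirely in memory.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("mimetype", ODT_MIMETYPE.decode("ascii"))
    archive.writestr("content.xml", "<placeholder/>")

print(looks_like_odt(sample.getvalue()))
print(looks_like_odt(b"plain text, not an archive at all"))
```

The first call should report a match and the second should not, which is the kind of check that makes a file format recoverable by signature in the first place.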

/computers/misc | permanent link

Wed, 30 Nov 2005

Mail broken

Comcast, in their infinite wisdom, decided to RBL block (at least one of) Debian's mailhost(s), master.debian.org. As essentially all my mail is forwarded from Debian to Comcast, mail traffic was affected.

At first, I noticed dropped posts from mailing list subscriptions relative to the list web archives. I posted the suggested notice to the Comcast RBL admins, so far to no avail other than to yield a more complete stop. The mail forwarding has been disabled for now, and I am dealing with this more manually.

If you tried to contact me in the last few days and have not gotten a response, please resend your mail. I should now be in a position to receive it. Sorry for any inconveniences.

Offers of better mail hosting would be gladly accepted, of course.

/computers/misc | permanent link

Fri, 04 Nov 2005

Smokes, that ain't pretty

From the headers of a mail I just got:
Received: from master.debian.org ([146.82.138.7])
   by sccrmxc11.comcast.net (sccrmxc11) with ESMTP
   id <20051105044942s1100j9crge>; Sat, 5 Nov 2005 04:49:42 +0000
X-Originating-IP: [146.82.138.7]
Received: from wproxy.gmail.com [64.233.184.204]
   by master.debian.org with esmtp (Exim 3.35 1 (Debian))
   id 1E7EfR-0004zq-00; Mon, 22 Aug 2005 10:56:58 -0500
Received: by wproxy.gmail.com with SMTP id i34so340646wra
   for <edd©debian.org>; Mon, 22 Aug 2005 08:56:47 -0700 (PDT)
Translation for non-propellerheads: Someone with a Google Gmail account emailed me on August 22. It took eleven seconds to get from Gmail to Debian (once you consider the different timezones), but Debian's mail system took until today, November 5, to deliver it to Comcast, my ISP. Yikes.
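For anyone who wants to verify the timezone arithmetic, here is a short Python sketch. The three timestamps are copied from the Received headers above (with the trailing "(PDT)" comment dropped); the variable names are mine and purely illustrative.

```python
from email.utils import parsedate_to_datetime

# Timestamps copied from the Received headers; variable names are illustrative.
left_gmail = parsedate_to_datetime("Mon, 22 Aug 2005 08:56:47 -0700")
reached_debian = parsedate_to_datetime("Mon, 22 Aug 2005 10:56:58 -0500")
reached_comcast = parsedate_to_datetime("Sat, 5 Nov 2005 04:49:42 +0000")

# Subtracting timezone-aware datetimes accounts for the differing UTC offsets.
gmail_to_debian = reached_debian - left_gmail
debian_to_comcast = reached_comcast - reached_debian

print(gmail_to_debian.total_seconds())  # seconds for the first hop
print(debian_to_comcast.days)           # whole days for the second hop
```

The first hop works out to 11 seconds, and the second to 74 full days (plus roughly another thirteen hours).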

So if you, anonymous reader, happened to have emailed me lately and are still awaiting a reply ... please consider the possibility that I may not have received your mail yet.

I should add that the underlying problem has been rectified at Debian's end as one of our hard-working admins stated yesterday.

/computers/misc | permanent link

Thu, 16 Sep 2004

Quite right

The widely-read and rather influential Walt Mossberg was reviewing Windows security in today's column and concluded
Bottom line: If you use Windows, you're asking for trouble.

As they say, truer words have never been spoken... Other than recommending OS X in no uncertain terms, he listed a host of required add-ons to make that one dominant OS cope with its own lack of security.

/computers/misc | permanent link

Wed, 15 Sep 2004

Compiling afio under Cygwin

Here is a simple patch against the current afio-2.5 sources, derived mostly from an older patch from Japan against 2.4.7. This allows compilation of afio under Cygwin by simply calling make, no further config needed.

I sent it upstream to Koen, maybe it'll show up in a future version.

diff -ru afio-2.5.orig/afio.c afio-2.5/afio.c
--- afio-2.5.orig/afio.c	2003-12-20 16:16:13.000000000 -0600
+++ afio-2.5/afio.c	2004-09-13 17:12:50.548515800 -0500
@@ -184,7 +184,11 @@
 #include 
 #include 
 #include 
+#ifdef __CYGWIN32__
+#include 
+#else
 #include 
+#endif
 #include 
 #include 
 #include 
Only in afio-2.5: afio.exe
diff -ru afio-2.5.orig/afio.h afio-2.5/afio.h
--- afio-2.5.orig/afio.h	2003-12-20 07:59:42.000000000 -0600
+++ afio-2.5/afio.h	2004-09-13 15:17:51.700744200 -0500
@@ -477,7 +477,7 @@
 #ifndef MKDIR
 int rmdir (char *);
 #endif
-#if !defined (linux) && !defined(__FreeBSD__) && !defined(sun)
+#if !defined (linux) && !defined(__FreeBSD__) && !defined(sun) && !defined(__CYGWIN32__)
 VOIDFN (*signal ())();
 #endif
      int fswrite (int, char*, uint);
diff -ru afio-2.5.orig/compfile.c afio-2.5/compfile.c
--- afio-2.5.orig/compfile.c	2003-06-24 16:32:20.000000000 -0500
+++ afio-2.5/compfile.c	2004-09-13 15:03:44.532576200 -0500
@@ -210,7 +210,7 @@
  * version;
  */
 
-#if ( defined(sun) && defined(__svr4__) )
+#if ( defined(sun) && defined(__svr4__) ) || defined(__CYGWIN32__)
 #include 
 #else
 #include 



/computers/misc | permanent link

Mon, 12 Jan 2004

That was fast

Ordered a new box to replace our dual-amd, with which we continue to have random locks every couple of days, on Wednesday evening just before one of these Dell special deals expired -- it's their entry level server which has a nice Unofficial Dell PowerEdge 400SC FAQ.

Shipment was supposed to be three days from now ... yet the box arrived today! It boots Knoppix as well as my Quantian just fine. A pIV 2.8 GHz and a mobo with sata support, gigabit lan, graphics, sound, whathaveyou all integrated along with a puny little disk and a laughable amount of RAM (an order for a GB from Crucial is on its way); it all came to just over $500 of which $100 should come back via a mail-in rebate.

/computers/misc | permanent link

Sat, 03 May 2003

ATX power supplies suck

Grr. Power went out on Thursday, and one of our computers didn't come back on. Took two days to figure out that the stupid connector from the power switch to the motherboard was loose so that the power supply was unable to operate. Is it just me, or does anybody else think that AT power supplies were simpler? Power on when pressed, and off otherwise. Clean. Simple.

/computers/misc | permanent link

Fri, 11 Apr 2003

Compiling a2ps under Cygwin and adding R / S support

For those of us with fingers and brains hardwired to Unix, Cygwin is a godsend. One package I was always missing from Cygwin is a2ps -- which is really nice for pretty-printing just about anything, which would include source code. Thanks to a rather useful support 'style sheet' for the S language (i.e. GNU R and its commercial sibling S-Plus), you can typeset S code as well. The patch is once again part of the Debian package and can also be found on CRAN.

I simply grabbed the most recent Debian tarball and diff.gz, unpacked the tarball and applied the patch. You then need to run the appropriate debian/patches patch from the a2ps sources directory as per

sh debian/patches/10_s_support.dpatch -patch

After that, it's just a matter of adding a two-line patch in /usr/include/string.h to comment out two lines

#ifndef __CYGWIN__
_PTR	 _EXFUN(memccpy,(_PTR, const _PTR, int, size_t));
_PTR	 _EXFUN(mempcpy,(_PTR, const _PTR, size_t));
#endif
and
configure --medium=letter; make; make install
builds and installs a shiny new a2ps for Cygwin.

/computers/misc | permanent link