Sun, 21 Jan 2018

#15: Tidyverse and data.table, sitting side by side ... (Part 1)

Welcome to the fifteenth post in the rarely rational R rambling series, or R4 for short. There are two posts I have been meaning to get out for a bit, and hope to get to shortly---but in the meantime we are going to start something else.

Another longer-running idea I had was to present some simple application cases with (one or more) side-by-side code comparisons. Why? Well at times it feels like R, and the R community, are being split. You're either with one (increasingly "religious" in their defense of their deemed-superior approach) side, or the other. And that is of course utter nonsense. It's all R after all.

Programming, just like other fields using engineering methods and thinking, is about making choices, and trading off between certain aspects. A simple example is the fairly well-known trade-off between memory use and speed: think e.g. of a hash map allowing for faster lookup at the cost of some more memory. Generally speaking, solutions are rarely limited to just one way, or just one approach. So it pays off to know your tools, and choose wisely among all available options. Having choices is having options, and those tend to have non-negative premiums to take advantage of. Locking yourself into one and only one paradigm can never be better.

In that spirit, I want to (eventually) show a few simple comparisons of code being done two distinct ways.

One obvious first candidate for this is the gunsales repository with some R code which backs an earlier NY Times article. I got involved for a similar reason, and updated the code from its initial form. Then again, this project also helped motivate what we did later with the x13binary package which permits automated installation of the X13-ARIMA-SEATS binary to support Christoph's excellent seasonal CRAN package (and website) for which we now have a forthcoming JSS paper. But the actual code example is not that interesting, and a bit further off the mainstream, because of the more specialised seasonal ARIMA modeling.

But then this week I found a much simpler and shorter example, and quickly converted its code. The code comes from the inaugural datascience 1 lesson at the Crosstab, a fabulous site by G. Elliot Morris (who may be the highest-energy undergrad I have come across lately) focussed on political polling, forecasts, and election outcomes. Lesson 1 is a simple introduction, and averages some polls of the 2016 US Presidential Election.

Complete Code using Approach "TV"

Elliot does a fine job walking the reader through his code so I will be brief and simply quote it in one piece:


## Getting the polls

library(readr)
polls_2016 <- read_tsv(url("http://elections.huffingtonpost.com/pollster/api/v2/questions/16-US-Pres-GE%20TrumpvClinton/poll-responses-clean.tsv"))

## Wrangling the polls

library(dplyr)
polls_2016 <- polls_2016 %>%
    filter(sample_subpopulation %in% c("Adults","Likely Voters","Registered Voters"))
library(lubridate)
polls_2016 <- polls_2016 %>%
    mutate(end_date = ymd(end_date))
polls_2016 <- polls_2016 %>%
    right_join(data.frame(end_date = seq.Date(min(polls_2016$end_date),
                                              max(polls_2016$end_date), by="days")))

## Average the polls

polls_2016 <- polls_2016 %>%
    group_by(end_date) %>%
    summarise(Clinton = mean(Clinton),
              Trump = mean(Trump))

library(zoo)
rolling_average <- polls_2016 %>%
    mutate(Clinton.Margin = Clinton-Trump,
           Clinton.Avg =  rollapply(Clinton.Margin,width=14,
                                    FUN=function(x){mean(x, na.rm=TRUE)},
                                    by=1, partial=TRUE, fill=NA, align="right"))

library(ggplot2)
ggplot(rolling_average)+
  geom_line(aes(x=end_date,y=Clinton.Avg),col="blue") +
  geom_point(aes(x=end_date,y=Clinton.Margin))

It uses five packages to i) read some data off them interwebs, ii) filter / subset / modify it, iii) perform a right (outer) join with itself, iv) average the per-day polls and then create rolling averages over 14 days, and v) plot. Several standard verbs are used: filter(), mutate(), right_join(), group_by(), and summarise(). One non-(tidy)verse function is rollapply() which comes from zoo, a popular package for time-series data.

Complete Code using Approach "DT"

As I will show below, we can do the same with fewer packages as data.table covers the reading, slicing/dicing and time conversion. We still need zoo for its rollapply() and of course the same plotting code:


## Getting the polls

library(data.table)
pollsDT <- fread("http://elections.huffingtonpost.com/pollster/api/v2/questions/16-US-Pres-GE%20TrumpvClinton/poll-responses-clean.tsv")

## Wrangling the polls

pollsDT <- pollsDT[sample_subpopulation %in% c("Adults","Likely Voters","Registered Voters"), ]
pollsDT[, end_date := as.IDate(end_date)]
pollsDT <- pollsDT[ data.table(end_date = seq(min(pollsDT[,end_date]),
                                              max(pollsDT[,end_date]), by="days")), on="end_date"]

## Average the polls

library(zoo)
pollsDT <- pollsDT[, .(Clinton=mean(Clinton), Trump=mean(Trump)), by=end_date]
pollsDT[, Clinton.Margin := Clinton-Trump]
pollsDT[, Clinton.Avg := rollapply(Clinton.Margin, width=14,
                                   FUN=function(x){mean(x, na.rm=TRUE)},
                                   by=1, partial=TRUE, fill=NA, align="right")]

library(ggplot2)
ggplot(pollsDT) +
    geom_line(aes(x=end_date,y=Clinton.Avg),col="blue") +
    geom_point(aes(x=end_date,y=Clinton.Margin))

This uses several of the components of data.table which are often called [i, j, by=...]. Rows are selected (i), columns are either modified (via := assignment) or summarised (via =), and grouping is undertaken via by=.... The outer join is done by having one data.table object indexed by another, and is pretty standard too. That allows us to do all transformations in three lines. We then create per-day averages by grouping by day, compute the margin, and construct its rolling average as before. The resulting chart is, unsurprisingly, the same.
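As a toy illustration of the [i, j, by=...] pattern, with made-up numbers rather than the polls data:

```r
library(data.table)

## small made-up table: two polls per day
dt <- data.table(day     = c("d1", "d1", "d2", "d2"),
                 clinton = c(44, 46, 45, 47),
                 trump   = c(40, 42, 43, 41))

res <- dt[clinton > 43,                      ## i:  select rows
          .(margin = mean(clinton - trump)), ## j:  compute a summary
          by = day]                          ## by: one result row per day
res
```

Here all four rows pass the i filter, and the per-day mean margin happens to be 4 for both days.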

Benchmark Reading

We can look at how the two approaches do on getting the data read into our session. For simplicity, we will read a local file to keep the (fixed) download aspect out of it:

R> library(microbenchmark)
R> url <- "http://elections.huffingtonpost.com/pollster/api/v2/questions/16-US-Pres-GE%20TrumpvClinton/poll-responses-clean.tsv"
R> file <- "/tmp/poll-responses-clean.tsv"
R> download.file(url, destfile=file, quiet=TRUE)
R> res <- microbenchmark(tidy=suppressMessages(readr::read_tsv(file)),
+                       dt=data.table::fread(file, showProgress=FALSE))
R> res
Unit: milliseconds
 expr     min      lq    mean  median      uq      max neval
 tidy 6.67777 6.83458 7.13434 6.98484 7.25831  9.27452   100
   dt 1.98890 2.04457 2.37916 2.08261 2.14040 28.86885   100
R> 

That is a clear relative difference, though the absolute amount of time is not that relevant for such a small (demo) dataset.

Benchmark Processing

We can also look at the processing part:

R> rdin <- suppressMessages(readr::read_tsv(file))
R> dtin <- data.table::fread(file, showProgress=FALSE)
R> 
R> library(dplyr)
R> library(lubridate)
R> library(zoo)
R> 
R> transformTV <- function(polls_2016=rdin) {
+     polls_2016 <- polls_2016 %>%
+         filter(sample_subpopulation %in% c("Adults","Likely Voters","Registered Voters"))
+     polls_2016 <- polls_2016 %>%
+         mutate(end_date = ymd(end_date))
+     polls_2016 <- polls_2016 %>%
+         right_join(data.frame(end_date = seq.Date(min(polls_2016$end_date), 
+                                                   max(polls_2016$end_date), by="days")))
+     polls_2016 <- polls_2016 %>%
+         group_by(end_date) %>%
+         summarise(Clinton = mean(Clinton),
+                   Trump = mean(Trump))
+ 
+     rolling_average <- polls_2016 %>%
+         mutate(Clinton.Margin = Clinton-Trump,
+                Clinton.Avg =  rollapply(Clinton.Margin,width=14,
+                                         FUN=function(x){mean(x, na.rm=TRUE)}, 
+                                         by=1, partial=TRUE, fill=NA, align="right"))
+ }
R> 
R> transformDT <- function(dtin) {
+     pollsDT <- copy(dtin) ## extra work to protect from reference semantics for benchmark
+     pollsDT <- pollsDT[sample_subpopulation %in% c("Adults","Likely Voters","Registered Voters"), ]
+     pollsDT[, end_date := as.IDate(end_date)]
+     pollsDT <- pollsDT[ data.table(end_date = seq(min(pollsDT[,end_date]), 
+                                                   max(pollsDT[,end_date]), by="days")), on="end_date"]
+     pollsDT <- pollsDT[, .(Clinton=mean(Clinton), Trump=mean(Trump)), 
+                        by=end_date][, Clinton.Margin := Clinton-Trump]
+     pollsDT[, Clinton.Avg := rollapply(Clinton.Margin, width=14,
+                                        FUN=function(x){mean(x, na.rm=TRUE)}, 
+                                        by=1, partial=TRUE, fill=NA, align="right")]
+ }
R> 
R> res <- microbenchmark(tidy=suppressMessages(transformTV(rdin)),
+                       dt=transformDT(dtin))
R> res
Unit: milliseconds
 expr      min       lq     mean   median       uq      max neval
 tidy 12.54723 13.18643 15.29676 13.73418 14.71008 104.5754   100
   dt  7.66842  8.02404  8.60915  8.29984  8.72071  17.7818   100
R> 

Not quite a factor of two on the small data set, but again a clear advantage. data.table has a reputation for doing really well for large datasets; here we see that it is also faster for small datasets.

Side-by-side

Stripping out the reading as well as the plotting, both of which are about the same in either approach, we can compare the essential data operations.
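To make that comparison concrete without repeating everything, here is the central grouping-and-averaging step in both dialects, on a small made-up data frame (column names as in the polls data):

```r
library(dplyr)
library(data.table)

## made-up mini version of the polls data
polls <- data.frame(end_date = as.Date("2016-10-01") + c(0, 0, 1),
                    Clinton  = c(44, 46, 45),
                    Trump    = c(40, 42, 43))

## approach "TV": group_by() then summarise()
tv <- polls %>%
    group_by(end_date) %>%
    summarise(Clinton = mean(Clinton), Trump = mean(Trump))

## approach "DT": j and by= in a single bracket expression
dt <- as.data.table(polls)[, .(Clinton = mean(Clinton), Trump = mean(Trump)),
                           by = end_date]

## same per-day averages either way
all.equal(as.data.frame(tv), as.data.frame(dt))
```

Two verbs chained by a pipe on one side, one bracket expression on the other; the results agree.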

Summary

We found a simple task solved using code and packages from an increasingly popular sub-culture within R, and contrasted it with a second approach. We find the second approach to i) have fewer dependencies, ii) use less code, and iii) run faster.

Now, undoubtedly the former approach will have its staunch defenders (and that is all good and well, after all choice is good and even thirty years later some still debate vi versus emacs endlessly) but I thought it instructive to at least be able to make an informed comparison.

Acknowledgements

My thanks to G. Elliot Morris for a fine example, and of course a fine blog and (if somewhat hyperactive) Twitter account.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Sat, 20 Jan 2018

Rcpp 0.12.15: Numerous tweaks and enhancements

The fifteenth release in the 0.12.* series of Rcpp landed on CRAN today after just a few days of gestation in incoming/.

This release follows the 0.12.0 release from July 2015, the 0.12.1 release in September 2015, the 0.12.2 release in November 2015, the 0.12.3 release in January 2016, the 0.12.4 release in March 2016, the 0.12.5 release in May 2016, the 0.12.6 release in July 2016, the 0.12.7 release in September 2016, the 0.12.8 release in November 2016, the 0.12.9 release in January 2017, the 0.12.10 release in March 2017, the 0.12.11 release in May 2017, the 0.12.12 release in July 2017, the 0.12.13 release in late September 2017, and the 0.12.14 release in November 2017, making it the nineteenth release at the steady and predictable bi-monthly release frequency.

Rcpp has become the most popular way of enhancing GNU R with C or C++ code. As of today, 1288 packages on CRAN depend on Rcpp for making analytical code go faster and further, along with another 91 in BioConductor.

This release contains a pretty large number of pull requests by a wide variety of authors. Most of these pull requests are very focused on a particular issue at hand. One was larger and ambitious with some forward-looking code for R 3.5.0; however this backfired a little on Windows and is currently "parked" behind a #define. Full details are below.

Changes in Rcpp version 0.12.15 (2018-01-16)

  • Changes in Rcpp API:

    • Calls from exception handling to Rf_warning() now correctly set an initial format string (Dirk in #777 fixing #776).

    • The 'new' Date and Datetime vectors now have is_na methods too. (Dirk in #783 fixing #781).

    • Protect more temporary SEXP objects produced by wrap (Kevin in #784).

    • Use public R APIs for new_env (Kevin in #785).

    • Evaluation of R code is now safer when compiled against R 3.5 (you also need to explicitly define RCPP_PROTECTED_EVAL before including Rcpp.h). Longjumps of all kinds (condition catching, returns, restarts, debugger exit) are appropriately detected and handled, e.g. the C++ stack unwinds correctly (Lionel in #789). [ Committed but subsequently disabled in release 0.12.15 ]

    • The new function Rcpp_fast_eval() can be used for performance-sensitive evaluation of R code. Unlike Rcpp_eval(), it does not try to catch errors with tryEval in order to avoid the catching overhead. While this is safe thanks to the stack unwinding protection, this also means that R errors are not transformed to an Rcpp::exception. If you are relying on error rethrowing, you have to use the slower Rcpp_eval(). On old R versions Rcpp_fast_eval() falls back to Rcpp_eval() so it is safe to use against any versions of R (Lionel in #789). [ Committed but subsequently disabled in release 0.12.15 ]

    • Overly-clever checks for NA have been removed (Kevin in #790).

    • The included tinyformat has been updated to the current version, Rcpp-specific changes are now more isolated (Kirill in #791).

    • Overly picky fall-through warnings by gcc-7 regarding switch statements are now pre-empted (Kirill in #792).

    • Permit compilation on ANDROID (Kenny Bell in #796).

    • Improve support for NVCC, the CUDA compiler (Iñaki Ucar in #798 addressing #797).

    • Speed up tests for NA and NaN (Kirill and Dirk in #799 and #800).

    • Rearrange stack unwind test code, keep test disabled for now (Lionel in #801).

    • Further condition away protect unwind behind #define (Dirk in #802).

  • Changes in Rcpp Attributes:

    • Addressed a missing Rcpp namespace prefix when generating a C++ interface (James Balamuta in #779).

  • Changes in Rcpp Documentation:

    • The Rcpp FAQ now shows Rcpp::Rcpp.plugin.maker() and not the outdated ::: use applicable to non-exported functions.

Thanks to CRANberries, you can also look at a diff to the previous release. As always, details are on the Rcpp Changelog page and the Rcpp page which also leads to the downloads page, the browseable doxygen docs and zip files of doxygen output for the standard formats. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Tue, 16 Jan 2018

RcppMsgPack 0.2.1

An update of RcppMsgPack got onto CRAN today. It contains a number of enhancements Travers had been working on, as well as one thing CRAN asked us to do in making a suggested package optional.

MessagePack itself is an efficient binary serialization format. It lets you exchange data among multiple languages like JSON. But it is faster and smaller. Small integers are encoded into a single byte, and typical short strings require only one extra byte in addition to the strings themselves. RcppMsgPack brings both the C++ headers of MessagePack as well as clever code (in both R and C++) Travers wrote to access MsgPack-encoded objects directly from R.

Changes in version 0.2.1 (2018-01-15)

  • Some corrections and update to DESCRIPTION, README.md, msgpack.org.md and vignette (#6).

  • Update to c_pack.cpp and tests (#7).

  • More efficient packing of vectors (#8).

  • Support for timestamps and NAs (#9).

  • Conditional use of microbenchmark in tests/ as required for Suggests: package [CRAN request] (#10).

  • Minor polish to tests relaxing comparison of timestamp, and avoiding a few g++ warnings (#12 addressing #11).

Courtesy of CRANberries, there is also a diffstat report for this release.

More information is on the RcppMsgPack page. Issues and bug reports should go to the GitHub issue tracker.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Sun, 14 Jan 2018

digest 0.6.14

Another small maintenance release, version 0.6.14, of the digest package arrived on CRAN and in Debian today.

digest creates hash digests of arbitrary R objects (using the 'md5', 'sha-1', 'sha-256', 'crc32', 'xxhash' and 'murmurhash' algorithms) permitting easy comparison of R language objects.

Just like release 0.6.13 a few weeks ago, this release accommodates another request by Luke and Tomas and changes two uses of NAMED to MAYBE_REFERENCED which helps in the transition to the new reference counting model in R-devel. Thierry also spotted a minor wart in how sha1() tested type for matrices and corrected that, and I converted a few references to https URLs and corrected one now-dead URL.

CRANberries provides the usual summary of changes to the previous version.

For questions or comments use the issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/digest | permanent link

Sun, 07 Jan 2018

prrd 0.0.1: Parallel Running [of] Reverse Depends

A new package is now on the ghrr drat. It was uploaded four days ago to CRAN but still lingers in the inspect/ state, along with a growing number of other packages. As some new packages have come through, I am sure it will get processed eventually---but in the meantime I figured I may as well make it available this way.

The idea of prrd is simple, and described in some more detail on its webpage and its GitHub repo. Reverse dependency checks are an important part of package development (provided you care about not breaking other packages, as CRAN asks you to), and are easily done in a (serial) loop. But these checks are also generally embarrassingly parallel as there is little or no interdependency between them (besides maybe shared build dependencies).

So this package uses the liteq package by Gabor Csardi to set up all tests to run as tasks in a queue. This permits multiple independent queue runners to each work on one task at a time. Results are written back and summarized.
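A minimal sketch of that queue pattern using the liteq API directly (queue name and task strings are made up here, and prrd itself does considerably more bookkeeping):

```r
library(liteq)

## a queue backed by a throwaway SQLite file
db <- tempfile(fileext = ".db")
q  <- ensure_queue("revdeps", db = db)

## enqueue a couple of (made-up) reverse-dependency checks as tasks
publish(q, title = "pkgA", message = "run revdep check for pkgA")
publish(q, title = "pkgB", message = "run revdep check for pkgB")

## a worker --- possibly one of several running in parallel ---
## grabs one task, works on it, and acknowledges it as done
msg <- try_consume(q)
cat("checking", msg$title, "\n")
ack(msg)
```

Several such workers can consume from the same queue file concurrently, which is exactly what makes the parallel runners independent.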

This already works pretty well as evidenced by the following screenshot (running six parallel workers, arranged in a split byobu session).

See the aforementioned webpage and its repo for more details, and by all means give it a whirl.

For more questions or comments use the issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/prrd | permanent link

Sat, 06 Jan 2018

tint 0.0.5

A maintenance release of the tint package arrived on CRAN earlier this week. Its name expands from tint is not tufte as the package offers a fresher take on the Tufte-style for html and pdf presentations.

A screenshot of the pdf variant is below.

Similar to the RcppCNPy release this week, this is pure maintenance related to dependencies. CRAN noticed that processing these vignettes requires the mgcv package---as we use geom_smooth() in some example graphs. So that was altered to avoid this dependency just for the vignette tests. We also had one pending older change related to jurassic pandoc versions on some CRAN architectures.

Changes in tint version 0.0.5 (2018-01-05)

  • Only run html rendering regression test on Linux or Windows as the pandoc versions on CRAN are too old elsewhere.

  • Vignette figures reworked so that the mgcv package is not required avoiding a spurious dependency [CRAN request]

Courtesy of CRANberries, there is a comparison to the previous release. More information is on the tint page.

For questions or comments use the issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/tint | permanent link

RcppCNPy 0.2.8

A minor maintenance release of the RcppCNPy package arrived on CRAN this week.

RcppCNPy provides R with read and write access to NumPy files thanks to the cnpy library by Carl Rogers.

There is no code change here. But to process the vignette we rely on knitr which sees Python here and (as of its most recent release) wants the (excellent !!) reticulate package. Which is of course overkill just to process a short pdf document, so we turned this off.

Changes in version 0.2.8 (2018-01-04)

  • Vignette sets knitr option python.reticulate=FALSE to avoid another dependency just for the vignette [CRAN request]

CRANberries also provides a diffstat report for the latest release. As always, feedback is welcome and the best place to start a discussion may be the GitHub issue tickets page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Fri, 22 Dec 2017

#14: Finding Binary .deb Files for CRAN Packages

Welcome to the fourteenth post in the rationally rambling R rants series, or R4 for short. The last two posts were concerned with faster installation. First, we showed how ccache can speed up (re-)installation. This was followed by a second post on faster installation via binaries.

This last post immediately sparked some follow-up. Replying to my tweet about it, David Smith wondered how to combine binary and source installation (tl;dr: it is hard as you need to combine two package managers). Just this week, Max Ogden wondered how to install CRAN packages as binaries on Linux, and Daniel Nuest poked me on GitHub as part of his excellent containerit project as installation of binaries would of course also make Docker container builds much faster. (tl;dr: Oh yes, see below!)

So can one? Sure. We have a tool. But first the basics.

The Basics

Packages for a particular distribution are indexed by a packages file for that distribution. This is not unlike CRAN using top-level PACKAGES* files. So in principle you could just fetch those packages files, parse and index them, and then search them. In practice that is a lot of work as Debian and Ubuntu now have several tens of thousands of packages.

So it is better to use the distro tool. In my use case on .deb-based distros, this is apt-cache. Here is a quick example for the (Ubuntu 17.04) server on which I type this:

$ sudo apt-get update -qq            ## suppress stdout display
$ apt-cache search r-cran- | wc -l
419
$

So a very vanilla Ubuntu installation has "merely" 400+ binary CRAN packages. Nothing to write home about (yet) -- but read on.

cran2deb4ubuntu, or c2d4u for short

A decade ago, I was involved in two projects to turn all of CRAN into .deb binaries. We had a first ad-hoc predecessor project, and then (much better) a 'version 2' thanks to the excellent Google Summer of Code work by Charles Blundell (and mentored by me). I ran with that for a while and carried at the peak about 2500 binaries or so. And then my controlling db died, just as I visited CRAN to show it off. Very sad. Don Armstrong ran with the code and rebuilt it on better foundations and had for quite some time all of CRAN and BioC built (peaking at maybe 7k packages). Then his RAID died. The surviving effort is the one by Michael Rutter who always leaned on the Launchpad PPA system to build his packages. And those still exist and provide a core of over 10k packages (but across different Ubuntu flavours, see below).

Using cran2deb4ubuntu

In order to access c2d4u you need an Ubuntu system. For example my Travis runner script does

# Add marutter's c2d4u repository, (and rrutter for CRAN builds too)
sudo add-apt-repository -y "ppa:marutter/rrutter"
sudo add-apt-repository -y "ppa:marutter/c2d4u"

After that one can query apt-cache as above, but take advantage of a much larger pool with over 3500 packages (see below). The add-apt-repository command does the Right Thing (TM) in terms of both getting the archive key, and adding the apt source entry to the config directory.

How about from R? Sure, via RcppAPT

Now, all this command-line business is nice. But can we do all this programmatically from R? Sort of.

The RcppAPT package interfaces the libapt library, and provides access to a few functions. I used this feature when I argued (unsuccessfully, as it turned out) for a particular issue concerning Debian and R upgrades. But that is water under the bridge now, and the main point is that "yes we can".
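As a quick sketch (this of course only runs on a .deb-based system with RcppAPT installed, so consider it illustrative):

```r
library(RcppAPT)

## does the local apt cache know about these binary R packages?
hasPackages(c("r-base-core", "r-cran-rcpp"))

## regular-expression search over the package cache
head(getPackages("^r-cran-"))
```

The first call returns a named logical vector, the second a data.frame of matching packages from the apt cache.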

In Docker: r-apt

Building on RcppAPT, within the Rocker Project we provide a particular class of containers for different Ubuntu releases which all contain i) RcppAPT and ii) the required apt source entry for Michael's repos.

So now we can do this

$ docker run --rm -ti rocker/r-apt:xenial /bin/bash -c 'apt-get update -qq; apt-cache search r-cran- | wc -l'
3525
$

This fires up the corresponding Docker container for the xenial (ie 16.04 LTS) release, updates the apt indices and then searches for r-cran-* packages. And it seems we have a little over 3500 packages. Not bad at all (especially once you realize that this skews strongly towards the more popular packages).

Example: An rstan container

A little while ago a seemingly very frustrated user came to Carl and myself and claimed that our Rocker Project sucketh because building rstan was all but impossible. I don't have the time, space or inclination to go into details, but he was just plain wrong. You do need to know a little about C++, package building, and more to do this from scratch. Plus, there was a long-standing issue with rstan and newer Boost (which also included several workarounds).

Be that as it may, it serves as nice example here. So the first question: is rstan packaged?

$ docker run --rm -ti rocker/r-apt:xenial /bin/bash -c 'apt-get update -qq; apt-cache show r-cran-rstan'
Package: r-cran-rstan
Source: rstan
Priority: optional
Section: gnu-r
Installed-Size: 5110
Maintainer: cran2deb4ubuntu <cran2deb4ubuntu@gmail.com>
Architecture: amd64
Version: 2.16.2-1cran1ppa0
Depends: pandoc, r-base-core, r-cran-ggplot2, r-cran-stanheaders, r-cran-inline, r-cran-gridextra, r-cran-rcpp,\
   r-cran-rcppeigen, r-cran-bh, libc6 (>= 2.14), libgcc1 (>= 1:4.0), libstdc++6 (>= 5.2)
Filename: pool/main/r/rstan/r-cran-rstan_2.16.2-1cran1ppa0_amd64.deb
Size: 1481562
MD5sum: 60fe7cfc3e8813a822e477df24b37ccf
SHA1: 75bbab1a4193a5731ed105842725768587b4ec22
SHA256: 08816ea0e62b93511a43850c315880628419f2b817a83f92d8a28f5beb871fe2
Description: GNU R package "R Interface to Stan"
Description-md5: c9fc74a96bfde57f97f9d7c16a218fe5

$

It would seem so. With that, the following very minimal Dockerfile is all we need:

## Emacs, make this -*- mode: sh; -*-

## Start from xenial
FROM rocker/r-apt:xenial

## This handle reaches Carl and Dirk
MAINTAINER "Carl Boettiger and Dirk Eddelbuettel" rocker-maintainers@eddelbuettel.com

## Update and install rstan
RUN apt-get update && apt-get install -y --no-install-recommends r-cran-rstan

## Make R the default
CMD ["R"]

In essence, it executes one command: install rstan but from binary taking care of all dependencies. And lo and behold, it works as advertised:

$ docker run --rm -ti rocker/rstan:local Rscript -e 'library(rstan)'
Loading required package: ggplot2
Loading required package: StanHeaders
rstan (Version 2.16.2, packaged: 2017-07-03 09:24:58 UTC, GitRev: 2e1f913d3ca3)
For execution on a local, multicore CPU with excess RAM we recommend calling
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
$

So there: installing from binary works, takes care of dependencies, is easy and as an added bonus even faster. What's not to like?

(And yes, a few of us are working on a system to have more packages available as binaries, but it may take another moment...)

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Sun, 17 Dec 2017

littler 0.3.3

max-heap image

The fourth release of littler as a CRAN package is now available, following in the now more than ten-year history as a package started by Jeff in 2006, and joined by me a few weeks later.

littler is the first command-line interface for R and predates Rscript. In my very biased eyes it is better as it allows for piping as well as shebang scripting via #!, uses command-line arguments more consistently, and still starts faster. Last but not least it is also less silly than Rscript and always loads the methods package, avoiding those bizarro bugs between code running in R itself and a scripting front-end.

littler prefers to live on Linux and Unix, has its difficulties on OS X due to yet-another-braindeadedness there (who ever thought case-insensitive filesystems were a good idea?) and simply does not exist on Windows (yet -- the build system could be extended -- see RInside for an existence proof, and volunteers welcome!).
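To illustrate the shebang-style scripting, here is a minimal made-up script (not one of the shipped examples); littler supplies argv itself, and the fallback line keeps the snippet runnable under Rscript too:

```r
#!/usr/bin/env r
## sum the command-line arguments handed to the script
if (!exists("argv")) argv <- commandArgs(trailingOnly = TRUE)
cat(sum(as.numeric(argv)), "\n")
```

Saved as, say, sum.r and made executable, `./sum.r 1 2 3` prints 6.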

A few examples as highlighted at the Github repo:

This release brings a few new examples scripts, extends a few existing ones and also includes two fixes thanks to Carl. Again, no internals were changed. The NEWS file entry is below.

Changes in littler version 0.3.3 (2017-12-17)

  • Changes in examples

    • The script installGithub.r now correctly uses the upgrade argument (Carl Boettiger in #49).

    • New script pnrrs.r to call the package-native registration helper function added in R 3.4.0

    • The script install2.r now has more robust error handling (Carl Boettiger in #50).

    • New script cow.r to use R Hub's check_on_windows

    • Scripts cow.r and c4c.r use #!/usr/bin/env r

    • New option --fast (or -f) for scripts build.r and rcc.r for faster package build and check

    • The build.r script now defaults to using the current directory if no argument is provided.

    • The RStudio getters now use the rvest package to parse the webpage with available versions.

  • Changes in package

    • Travis CI now uses https to fetch script, and sets the group

Courtesy of CRANberries, there is a comparison to the previous release. Full details for the littler release are provided as usual at the ChangeLog page. The code is available via the GitHub repo, from tarballs off my littler page and the local directory here -- and now of course all from its CRAN page and via install.packages("littler"). Binary packages are available directly in Debian as well as soon via Ubuntu binaries at CRAN thanks to the tireless Michael Rutter.

Comments and suggestions are welcome at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/littler | permanent link

Sat, 16 Dec 2017

drat 0.1.4

drat user

A new version of drat just arrived on CRAN as another no-human-can-delay-this automatic upgrade directly from the CRAN prechecks (though I did need a manual reminder from Uwe to remove a now stale drat repo URL -- bad @hrbrmstr -- from the README in a first attempt).

This release is mostly the work of Neal Fultz who kindly sent me two squeaky-clean pull requests addressing two open issue tickets. As drat is reasonably small and simple, that was enough to motivate a quick release. I also ensured that PACKAGES.rds will always be committed along (if we're in commit mode), which is a follow-up to an initial change from 0.1.3 in September.

drat stands for drat R Archive Template, and helps with easy-to-create and easy-to-use repositories for R packages. Since its inception in early 2015 it has found reasonably widespread adoption among R users because repositories with marked releases are a better way to distribute code.

The NEWS file summarises the release as follows:

Changes in drat version 0.1.4 (2017-12-16)

  • Changes in drat functionality

    • Binaries for macOS are now split by R version into two different directories (Neal Fultz in #67 addressing #64).

    • The target branch can now be set via a global option (Neal Fultz in #68 addressing #61).

    • In commit mode, add file PACKAGES.rds unconditionally.

  • Changes in drat documentation

    • Updated 'README.md' removing another stale example URL

Courtesy of CRANberries, there is a comparison to the previous release. More detailed information is on the drat page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/drat | permanent link

digest 0.6.13

A small maintenance release, version 0.6.13, of the digest package arrived on CRAN and in Debian yesterday.

digest creates hash digests of arbitrary R objects (using the 'md5', 'sha-1', 'sha-256', 'crc32', 'xxhash' and 'murmurhash' algorithms) permitting easy comparison of R language objects.

This release accommodates a request by Luke and Tomas to make the version argument of serialize() an argument to digest() too, which was easy enough to accommodate. The value 2L is the current default (and for now only permitted value). The ALTREP changes in R 3.5 will bring us a new, and more powerful, format with value 3L. Changes can be set in each call, or globally via options(). Other than that, we just clarified one aspect of raw vector usage in the manual page.

CRANberries provides the usual summary of changes to the previous version.

For questions or comments use the issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/digest | permanent link

Wed, 13 Dec 2017

#13: (Much) Faster Package (Re-)Installation via Binaries

Welcome to the thirteenth post in the ridiculously rapid R recommendation series, or R4 for short. A few days ago we riffed on faster installation thanks to ccache. Today we show another way to get equally drastic gains for some (if not most) packages.

In a nutshell, there are two ways to get your R packages off CRAN. Either you install as a binary, or you use source. Most people do not think too much about this as on Windows, binary is the default. So why wouldn't one? Precisely. (Unless you are on Windows, and you develop, or debug, or test, or ... and need source. Another story.) On other operating systems, however, source is the rule, and binary is often unavailable.

Or is it? Exactly how to find out what is available will be left for another post as we do have a tool just for that. But today, just hear me out when I say that binary is often an option even when source is the default. And it matters. See below.

As a (mostly-to-always) Linux user, I sometimes whistle between my teeth that we "lost all those battles" (i.e. for the desktop(s) or laptop(s)) but "won the war". That topic merits a longer post I hope to write one day, and I won't do it justice today, but my main gist is that everybody (and here I mean mostly developers/power users) now at least also runs on Linux. And by that I mean that we all test our code in Linux environments such as e.g. Travis CI, and that many of us run deployments on cloud instances (AWS, GCE, Azure, ...) which are predominantly based on Linux. Or on local clusters. Or, if one may dream, the top500. And on and on. And frequently these are Ubuntu machines.

So here is an Ubuntu trick: Install from binary, and save loads of time. As an illustration, consider the chart below. It carries over the logic from the 'cached vs non-cached' compilation post and contrasts two ways of installing: from source, or as a binary. I use pristine and empty Docker containers as the base, and rely of course on the official r-base image which is supplied by Carl Boettiger and yours truly as part of our Rocker Project (and for which we have a forthcoming R Journal piece I might mention). So for example the timings for the ggplot2 installation were obtained via

time docker run --rm -ti r-base  /bin/bash -c 'install.r ggplot2'

and

time docker run --rm -ti r-base  /bin/bash -c 'apt-get update && apt-get install -y r-cran-ggplot2'

Here docker run --rm -ti just means to launch Docker, in 'remove leftovers at end' mode, use terminal and interactive mode and invoke a shell. The shell command then is, respectively, to install a CRAN package using install.r from my littler package, or to install the binary via apt-get after updating the apt indices (as the Docker container may have been built a few days or more ago).

Let's not focus on Docker here---it is just a convenient means to an end of efficiently measuring via a simple (wall-clock counting) time invocation. The key really is that install.r is just a wrapper to install.packages() meaning source installation on Linux (as used inside the Docker container). And apt-get install ... is how one gets a binary. Again, I will try post another piece to determine how one finds if a suitable binary for a CRAN package exists. For now, just allow me to proceed.

So what do we see then? Well have a look:

A few things stick out. RQuantLib really is a monster. And dplyr is also fairly heavy---both rely on Rcpp, BH and lots of templating. At the other end, data.table is still a marvel. No external dependencies, and just plain C code make the source installation essentially the same speed as the binary installation. Amazing. But I digress.

We should add that one of the source installations also required installing additional libraries: QuantLib is needed along with Boost for RQuantLib. Similarly for another package (not shown) which needed curl and libcurl.

So what is the upshot? If you can, consider binaries. I will try to write another post on how I do that e.g. for Travis CI where all my tests use binaries. (Yes, I know. This mattered more in the past when they did not cache. It still matters today as you a) do not need to fill the cache in the first place and b) do not need to worry about details concerning compilation from source which still throws enough people off. But yes, you can of course survive as is.)

The same approach is equally valid on AWS and related instances: I answered many StackOverflow questions where folks were failing to compile "large-enough" pieces from source on minimal installations with minimal RAM, running out of resources and failing with bizarre errors. In short: Don't. Consider binaries. It saves time and trouble.
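As a small hedged aside on naming: on Debian and Ubuntu the convention for these binaries is simply the lowercased CRAN name with an r-cran- prefix. The little helper below is mine (purely illustrative, not part of any tool), but the naming convention it encodes is the real one:

```shell
# hypothetical helper (mine, not from any package): map a CRAN package
# name to the conventional Debian/Ubuntu binary package name, i.e. the
# lowercased name with an r-cran- prefix
cran_to_deb() {
    printf 'r-cran-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

cran_to_deb ggplot2       # r-cran-ggplot2
cran_to_deb RQuantLib     # r-cran-rquantlib
```

One can then ask apt about availability, e.g. via apt-cache policy $(cran_to_deb dplyr) -- though whether a given binary exists, and at which version, of course depends on the distribution and release.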

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

RVowpalWabbit 0.0.10

A boring little RVowpalWabbit package update to version 0.0.10 came in response to another CRAN request: We were switching directories to run tests (or examples) which is now discouraged, so we no longer do this as it turns out that we can of course refer to the files directly as well. Much cleaner.

No new code or features were added.

We should mention once more that there is parallel work ongoing in a higher-level package interfacing the vw binary -- rvw -- as well as a plan to redo this package via the external libraries. If that sounds interesting to you, please get in touch.

More information is on the RVowpalWabbit page. Issues and bug reports should go to the GitHub issue tracker.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rvowpalwabbit | permanent link

Sat, 09 Dec 2017

#12: Know and Customize your OS and Work Environment

Welcome to the twelfth post in the randomly relevant R recommendations series, or R4 for short. This post will insert a short diversion into what was planned as a sequence of posts on faster installations that started recently with this post, but we will resume it very shortly (for various definitions of "very" or "shortly").

Earlier today Davis Vaughn posted a tweet about a blog post of his describing a (term) paper he wrote modeling bitcoin volatility using Alexios's excellent rugarch package---and all that typeset with the styling James and I put together in our little pinp package which is indeed very suitable for such tasks of writing (R)Markdown + LaTeX + R code combinations conveniently in a single source file.

Leaving aside the need to celebrate a term paper with a blog post and tweet, pinp is indeed very nice and deserving of some additional exposure and tutorials. Now, Davis sets out to do all this inside RStudio---as folks these days seem to like to limit themselves to a single tool or paradigm. Older and wiser users prefer the flexibility of switching tools and approaches, but alas, we digress. While Davis manages of course to do all this in RStudio which is indeed rather powerful and therefore rightly beloved, he closes on

I wish there was some way to have Live Rendering like with blogdown so that I could just keep a rendered version of the paper up and have it reload every time I save. That would be the dream!

and I can only add a forceful: Fear not, young man, for we can help thou!

Modern operating systems have support for file-change notifications via mechanisms such as inotify, which can be used from the shell. Just as your pdf application refreshes automagically when a pdf file is updated, we can hook into this from the shell to actually create the pdf when the (R)Markdown file is updated. I am going to use a tool readily available on my Linux systems; macOS will surely have something similar. The entr command takes one or more file names supplied on stdin and executes a command when one of them changes. Handy for invoking make whenever one of your header or source files changes, and usable here. E.g. the last markdown file I was working on was named comments.md and contained comments to a referee, and we can auto-process it on each save via

echo comments.md | entr render.r comments.md

which uses render.r from littler (new release soon too...; a simple Rscript -e 'rmarkdown::render("comments.md")' would probably work too but render.r is shorter and a little more powerful so I use it more often myself) on the input file comments.md which also happens to be the (here sole) file being monitored.

And that is really all there is to it. I wanted / needed something like this a few months ago at work too, and may have used an inotify-based tool there but cannot find any notes. Python has something similar via watchdog which is yet again more complicated / general.
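And should neither entr nor an inotify-based tool be at hand, the underlying idea is simple enough to approximate with a crude polling helper. The sketch below is mine (the function name and the checksum-stamp mechanism are illustrative, not anything entr actually does internally): re-run a command whenever a file's checksum differs from the last recorded one.

```shell
# crude polling stand-in for entr (names are mine): run CMD when FILE's
# checksum differs from the one recorded in the STAMP file
run_if_changed() {
    file=$1; stamp=$2; shift 2
    cur=$(md5sum "$file" | cut -d' ' -f1)
    prev=$(cat "$stamp" 2>/dev/null || true)
    if [ "$cur" != "$prev" ]; then
        printf '%s\n' "$cur" > "$stamp"
        "$@"                          # e.g. render.r "$file"
    fi
}

echo "Dear referee ..." > comments.md
run_if_changed comments.md .comments.stamp echo rendering   # prints: rendering
run_if_changed comments.md .comments.stamp echo rendering   # unchanged: silent
```

Wrapped in a while/sleep loop this poorly imitates what entr does properly with kernel notifications, but it is portable to essentially any POSIX shell.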

It turns out that auto-processing is actually not that helpful as we often save before an expression is complete, leading to needless error messages. So at the end of the day, I often do something much simpler. My preferred editor has a standard interface to 'building': pressing C-x c loads a command (it recalls) that defaults to make -k (i.e., make with error skipping). Simply replacing that with render.r comments.md (in this case) means we get an updated pdf file when we want with a simple customizable command / key-combination.

So in sum: it is worth customizing your environments, learning about what your OS may have, and looking beyond a single tool / editor / approach. Even dreams may come true ...

Postscriptum: And Davis takes this in a stride and almost immediately tweeted a follow-up with a nice screen capture mp4 movie showing that entr does indeed work just as well on his macbook.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Wed, 06 Dec 2017

RcppArmadillo 0.8.300.1.0

armadillo image

Another RcppArmadillo release hit CRAN today. Since our last 0.8.100.1.0 release in October, Conrad kept busy and produced Armadillo releases 8.200.0, 8.200.1, 8.300.0 and now 8.300.1. We tend to now package these (with proper reverse-dependency checks and all) first for the RcppCore drat repo from where you can install them "as usual" (see the repo page for details). But this actual release resumes within our normal bi-monthly CRAN release cycle.

These releases improve a few little nags on the recent switch to more extensive use of OpenMP, and round out a number of other corners. See below for a brief summary.

Armadillo is a powerful and expressive C++ template library for linear algebra aiming towards a good balance between speed and ease of use with a syntax deliberately close to Matlab. RcppArmadillo integrates this library with the R environment and language---and is widely used by (currently) 405 other packages on CRAN.

A high-level summary of changes follows.

Changes in RcppArmadillo version 0.8.300.1.0 (2017-12-04)

  • Upgraded to Armadillo release 8.300.1 (Tropical Shenanigans)

    • faster handling of band matrices by solve()

    • faster handling of band matrices by chol()

    • faster randg() when using OpenMP

    • added normpdf()

    • expanded .save() to allow appending new datasets to existing HDF5 files

  • Includes changes made in several earlier GitHub-only releases (versions 0.8.300.0.0, 0.8.200.2.0 and 0.8.200.1.0).

  • Conversion from simple_triplet_matrix is now supported (Serguei Sokol in #192).

  • Updated configure code to check for g++ 5.4 or later to enable OpenMP.

  • Updated the skeleton package to current packaging standards

  • Suppress warnings from Armadillo about missing OpenMP support and -fopenmp flags by setting ARMA_DONT_PRINT_OPENMP_WARNING

Courtesy of CRANberries, there is a diffstat report. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Tue, 28 Nov 2017

Rcpp now used by 1250 CRAN packages

1250 Rcpp packages

Earlier today Rcpp passed 1250 reverse-dependencies on CRAN as another big milestone. The graph on the left depicts the growth of Rcpp usage (as measured by Depends, Imports and LinkingTo, but excluding Suggests) over time.

Rcpp cleared 300 packages in November 2014. It passed 400 packages in June 2015 (when I only tweeted about it), 500 packages in late October 2015, 600 packages last March, 700 packages last July, 800 packages last October, 900 packages early January, and 1000 packages in April. The chart extends to the very beginning via manually compiled data from CRANberries and checked with crandb. The next part uses manually saved entries. The core (and by far largest) part of the data set was generated semi-automatically via a short script appending updates to a small file-based backend. A list of packages using Rcpp is kept on this page.

Also displayed in the graph is the relative proportion of CRAN packages using Rcpp. The four percent hurdle was cleared just before useR! 2014 where I showed a similar graph (as two distinct graphs) in my invited talk. We passed five percent in December of 2014, six percent in July of 2015, seven percent just before Christmas 2015, eight percent last summer, nine percent mid-December 2016 and then cracked ten percent this summer.

1250 user packages is staggering. We can use the progression of CRAN itself compiled by Henrik in a series of posts and emails to the main development mailing list. A decade ago CRAN itself did not have 1250 packages, and here we are approaching 12k with Rcpp at 10% and growing steadily. Amazeballs.

This puts a whole lot of responsibility on us in the Rcpp team as we continue to keep Rcpp as performant and reliable as it has been.

And with that, and as always, a very big Thank You! to all users and contributors of Rcpp for help, suggestions, bug reports, documentation or, of course, code.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Mon, 27 Nov 2017

#11: (Much) Faster Package (Re-)Installation via Caching

Welcome to the eleventh post in the rarely rued R rants series, or R4 for short. Time clearly flies as it has been three months since our last post on significantly reducing library size via stripping. I had been meaning to post on today's topic for quite some time, but somehow something (working on a paper, releasing a package, ...) got in the way.

Just a few days ago Colin (of Efficient R Programming fame) posted about speed(ing up) package installation. His recommendation? Remember that we (usually) have multiple cores and using several of them via options(Ncpus = XX). It is an excellent point, and it bears repeating.

But it turns out I have not one but two salient recommendations too. Today covers the first; we should hopefully get to the second pretty soon. Both have one thing in common: you will be fastest if you avoid doing the work in the first place.

What?

One truly outstanding tool for this in the context of the installation of compiled packages is ccache. It is actually a pretty old tool that has been around for well over a decade, and it comes from the folks that gave us Samba.

What does it do? Well, in a nutshell, it computes a checksum ("hash") of a source file once the preprocessor has operated on it, and stores the resulting object file under that key. In the case of a rebuild with unchanged code you get the object code back pretty much immediately. The idea is very similar to memoisation (as implemented in R for example in the excellent little memoise package by Hadley, Jim, Kirill and Daniel): if you have to do something even moderately expensive a few times, do it once and then recall it the other times.

This happens (at least to me) more often than not in package development. Maybe you change just one of several source files. Maybe you just change the R code, the Rd documentation or a test file---yet still need a full reinstallation. In all these cases, ccache can help tremendously as illustrated below.
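The hashing idea can be illustrated in a few lines of plain shell -- a mere sketch of the principle, not of ccache's actual key computation (which also covers the compiler, its flags, and the preprocessed rather than raw source):

```shell
# sketch of the ccache principle: the cache key is a content hash, so
# unchanged content maps to the same key and the stored object is reused
echo 'int main(void) { return 0; }' > hello.c
key1=$(sha256sum hello.c | cut -d' ' -f1)

touch hello.c     # new timestamp, same content (cf. freshly unpacked tarballs)
key2=$(sha256sum hello.c | cut -d' ' -f1)

[ "$key1" = "$key2" ] && echo "same key: reuse cached object file"
```

Note that real ccache is by default also suspicious of fresh timestamps, which is exactly why the configuration tweaks discussed in this post matter for R's tarball-based installs.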

How?

Because essentially all our access to compilation happens through R, we need to set this in a file read by R. I use ~/.R/Makevars for this and have something like these lines on my machines:

VER=
CCACHE=ccache
CC=$(CCACHE) gcc$(VER)
CXX=$(CCACHE) g++$(VER)
CXX11=$(CCACHE) g++$(VER)
CXX14=$(CCACHE) g++$(VER)
FC=$(CCACHE) gfortran$(VER)
F77=$(CCACHE) gfortran$(VER)

That way, when R calls the compiler(s) it will prefix the call with ccache. And ccache will then speed things up.

There is an additional issue due to how R is used. Often we install from a .tar.gz. These will be freshly unpackaged, and hence have "new" timestamps. This would usually lead ccache to skip the file (for fear of "false positives"), so we have to override this. Similarly, the tarball is usually unpacked in a temporary directory with an ephemeral name, creating a unique path. That too needs to be overridden. So in my ~/.ccache/ccache.conf I have this:

max_size = 5.0G
# important for R CMD INSTALL *.tar.gz as tarballs are expanded freshly -> fresh ctime
sloppiness = include_file_ctime
# also important as the (temp.) directory name will differ
hash_dir = false

Show Me

A quick illustration will round out the post. Some packages are meatier than others. More C++ with more templates usually means longer build times. Below is a quick chart comparing times for a few such packages (i.e., RQuantLib, dplyr, rstan) as well as igraph ("merely" a large C package) and lme4 as well as Rcpp. The worst among these is still my own RQuantLib package wrapping (still just parts of) the ginormous and Boost-heavy QuantLib library.

Pretty dramatic gains. Best of all, we can of course combine these with other methods such as Colin's use of multiple CPUs, or even a simple MAKE="make -j4" to have multiple compilation units being considered in parallel. So maybe we all get to spend less time on social media and other timewasters as we spend less time waiting for our builds. Or maybe that is too much to hope for...

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Sun, 26 Nov 2017

rfoaas 1.1.1: Updated and extended

rfoaas greed example

FOAAS upstream is still at release 1.1.0, but added a few new accessors a couple of months ago. So this new version of rfoaas updates to these: asshole(), cup(), fyyff(), immensity(), programmer(), rtfm(), thinking(). We also added test coverage and in doing so noticed that our actual tests never ran on Travis. Yay. Now fixed.

As usual, CRANberries provides a diff to the previous CRAN release. Questions, comments etc should go to the GitHub issue tracker. More background information is on the project page as well as on the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rfoaas | permanent link

Fri, 24 Nov 2017

Rcpp 0.12.14: Some deprecation and minor updates

The fourteenth release in the 0.12.* series of Rcpp landed on CRAN yesterday after a somewhat longer-than-usual gestation period (and word is it may have been due to some unrelated disturbances from lots of changes within the main r-devel build).

This release follows the 0.12.0 release from July 2015, the 0.12.1 release in September 2015, the 0.12.2 release in November 2015, the 0.12.3 release in January 2016, the 0.12.4 release in March 2016, the 0.12.5 release in May 2016, the 0.12.6 release in July 2016, the 0.12.7 release in September 2016, the 0.12.8 release in November 2016, the 0.12.9 release in January 2017, the 0.12.10 release in March 2017, the 0.12.11 release in May 2017, the 0.12.12 release in July 2017 and the 0.12.13 release in late September 2017, making it the eighteenth release at the steady and predictable bi-monthly release frequency.

Rcpp has become the most popular way of enhancing GNU R with C or C++ code. As of today, 1246 packages (and hence 77 more since the last release) on CRAN depend on Rcpp for making analytical code go faster and further, along with another 91 in BioConductor.

This release is relatively minor compared to other releases, but follows through on the deprecation of the old vectors for Date and Datetime (which were terrible: I was influenced by the vector design in QuantLib at the time and didn't really understand yet how a SEXP vector should work) we announced with Rcpp 0.12.8 a year ago. So now the new vectors are the default, but you can flip back if you need to with a #define.

Otherwise Dan rounded a corner with the improved iterators he contributed, and Kirill improved the output stream implementation suppressing a warning with newer compilers.

Changes in Rcpp version 0.12.14 (2017-11-17)

  • Changes in Rcpp API:

    • New const iterators functions cbegin() and cend() added to MatrixRow as well (Dan Dillon in #750).

    • The Rostream object now contains a Buffer rather than allocating one (Kirill Müller in #763).

    • New DateVector and DatetimeVector classes are now the default fully deprecating the old classes as announced one year ago.

  • Changes in Rcpp Package:

    • DESCRIPTION file now lists DOI information per CRAN suggestion.

  • Changes in Rcpp Documentation:

    • Update CITATION file with doi information and PeerJ preprint.

Thanks to CRANberries, you can also look at a diff to the previous release. As always, details are on the Rcpp Changelog page and the Rcpp page which also leads to the downloads page, the browseable doxygen docs and zip files of doxygen output for the standard formats. A local directory has source and documentation too. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Sun, 19 Nov 2017

RcppEigen 0.3.3.3.1

A maintenance release 0.3.3.3.1 of RcppEigen is now on CRAN (and will get to Debian soon). It brings Eigen 3.3.* to R.

The impetus was a request from CRAN to change the call to Rcpp::Rcpp.plugin.maker() to only use :: as the function has in fact been exported and accessible for a pretty long time. So now the usage pattern catches up. Otherwise, Haiku-OS is now supported and a minor Travis tweak was made.

The complete NEWS file entry follows.

Changes in RcppEigen version 0.3.3.3.1 (2017-11-19)

  • Compilation under Haiku-OS is now supported (Yu Gong in #45).

  • The Rcpp.plugin.maker helper function is called via :: as it is in fact exported (yet we had old code using :::).

  • A spurious argument was removed from an example call.

  • Travis CI now uses https to fetch the test runner script.

Courtesy of CRANberries, there is also a diffstat report for the most recent release.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

RcppClassic 0.9.9

A maintenance release RcppClassic 0.9.9 is now at CRAN. This package provides a maintained version of the otherwise deprecated first Rcpp API; no new projects should use it.

Per a request from CRAN, we changed the call to Rcpp::Rcpp.plugin.maker() to only use :: as the function has in fact been exported and accessible for a pretty long time. So now the usage pattern catches up.

Courtesy of CRANberries, there are changes relative to the previous release.

Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Wed, 08 Nov 2017

R / Finance 2018 Call for Papers

The tenth (!!) annual R/Finance conference will take place in Chicago on the UIC campus on June 1 and 2, 2018. Please see the call for papers below (or at the website) and consider submitting a paper.

We are once again very excited about our conference, thrilled about who we hope may agree to be our anniversary keynotes, and hope that many R / Finance users will not only join us in Chicago in June -- but also submit an exciting proposal.

So read on below, and see you in Chicago in June!

Call for Papers

R/Finance 2018: Applied Finance with R
June 1 and 2, 2018
University of Illinois at Chicago, IL, USA

The tenth annual R/Finance conference for applied finance using R will be held June 1 and 2, 2018 in Chicago, IL, USA at the University of Illinois at Chicago. The conference will cover topics including portfolio management, time series analysis, advanced risk tools, high-performance computing, market microstructure, and econometrics. All will be discussed within the context of using R as a primary tool for financial risk management, portfolio construction, and trading.

Over the past nine years, R/Finance has included attendees from around the world. It has featured presentations from prominent academics and practitioners, and we anticipate another exciting line-up for 2018.

We invite you to submit complete papers in pdf format for consideration. We will also consider one-page abstracts (in txt or pdf format) although more complete papers are preferred. We welcome submissions for both full talks and abbreviated "lightning talks." Both academic and practitioner proposals related to R are encouraged.

All slides will be made publicly available at conference time. Presenters are strongly encouraged to provide working R code to accompany the slides. Data sets should also be made public for the purposes of reproducibility (though we realize this may be limited due to contracts with data vendors). Preference may be given to presenters who have released R packages.

Please submit proposals online at http://go.uic.edu/rfinsubmit. Submissions will be reviewed and accepted on a rolling basis with a final submission deadline of February 2, 2018. Submitters will be notified via email by March 2, 2018 of acceptance, presentation length, and financial assistance (if requested).

Financial assistance for travel and accommodation may be available to presenters. Requests for financial assistance do not affect acceptance decisions. Requests should be made at the time of submission. Requests made after submission are much less likely to be fulfilled. Assistance will be granted at the discretion of the conference committee.

Additional details will be announced via the conference website at http://www.RinFinance.com/ as they become available. Information on previous years' presenters and their presentations is also at the conference website. We will make a separate announcement when registration opens.

For the program committee:

Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson,
Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

/computers/R | permanent link

RQuantLib 0.4.4: Several smaller updates

A shiny new (mostly-but-not-completely maintenance) release of RQuantLib, now at version 0.4.4, arrived on CRAN overnight, and will get to Debian shortly. This is the first release in over a year, and it contains (mostly) a small number of fixes throughout. It also includes the update to the new DateVector and DatetimeVector classes which become the default with the upcoming Rcpp 0.12.14 release (just like this week's RcppQuantuccia release). One piece of new code is due to François Cocquemas who added support for discrete dividends to both European and American options. See below for the complete set of changes reported in the NEWS file.

As with release 0.4.3 a little over a year ago, we will not have new Windows binaries from CRAN as I apparently have insufficient powers of persuasion to get CRAN to update their QuantLib libraries. So we need a volunteer. If someone could please build a binary package for Windows from the 0.4.4 sources, I would be happy to once again host it on the GHRR drat repo. Please contact me directly if you can help.

Changes are listed below:

Changes in RQuantLib version 0.4.4 (2017-11-07)

  • Changes in RQuantLib code:

    • Equity options can now be analyzed via discrete dividends through two vectors of dividend dates and values (Francois Cocquemas in #73 fixing #72)

    • Some package and dependency information was updated in files DESCRIPTION and NAMESPACE.

    • The new Date(time)Vector classes introduced with Rcpp 0.12.8 are now used when available.

    • Minor corrections were applied to BKTree, to vanilla options for the case of intraday time stamps, to the SabrSwaption documentation, and to bond utilities for the most recent QuantLib release.

Courtesy of CRANberries, there is also a diffstat report for this release. As always, more detailed information is on the RQuantLib page. Questions, comments etc should go to the rquantlib-devel mailing list off the R-Forge page. Issue tickets can be filed at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rquantlib | permanent link

Mon, 06 Nov 2017

RcppQuantuccia 0.0.2

A first maintenance release of RcppQuantuccia got to CRAN earlier today.

RcppQuantuccia brings the Quantuccia header-only subset / variant of QuantLib to R. At present it mostly offers calendaring, but Quantuccia just got a decent amount of new functions so hopefully we can offer more here too.

This release was motivated by the upcoming Rcpp release which will deprecate the old Date and Datetime vectors in favour of newer ones. So this release of RcppQuantuccia switches to the newer ones.

Other changes are below:

Changes in version 0.0.2 (2017-11-06)

  • Added calendars for Canada, China, Germany, Japan and United Kingdom.

  • Added bespoke and joint calendars.

  • Using new date(time) vectors (#6).

Courtesy of CRANberries, there is also a diffstat report relative to the previous release. More information is on the RcppQuantuccia page. Issues and bugreports should go to the GitHub issue tracker.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Sun, 05 Nov 2017

pinp 0.0.4: Small tweak

A maintenance release of our pinp package for snazzier one or two column vignettes is now on CRAN as of yesterday.

In version 0.0.3, we disabled the default \pnasbreak command we inherit from the PNAS LaTeX style. That change turns out to have been too drastic. So we reverted it and added a new YAML front-matter option skip_final_break which, if set to TRUE, will skip this break. With a default value of FALSE we maintain prior behaviour.
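For illustration, a minimal YAML front matter using the new option; the title field is just a generic placeholder.

```yaml
title: "My Vignette"          # placeholder
output: pinp::pinp
skip_final_break: TRUE        # new in 0.0.4; default FALSE keeps prior behaviour
```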

A screenshot of the package vignette can be seen below. Additional screenshots are at the pinp page.

The NEWS entry for this release follows.

Changes in pinp version 0.0.4 (2017-11-04)

  • Correct NEWS headers from 'tint' to 'pinp' (#45).

  • New front-matter variable ‘skip_final_break’ skips the \pnasbreak on the final page, which is back as the default (#47).

Courtesy of CRANberries, there is a comparison to the previous release. More information is on the pinp page. For questions or comments use the issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/pinp | permanent link

Fri, 03 Nov 2017

tint 0.0.4: Small enhancements

A maintenance release of the tint package arrived on CRAN earlier today. Its name expands from tint is not tufte as the package offers a fresher take on the Tufte-style for html and pdf presentations.

A screenshot of the pdf variant is below.

This release brings some minor enhancements and polish, mostly learned from having done the related pinp (two-column vignette in the PNAS style) and linl (LaTeX letter) RMarkdown-wrapper packages; see below for details from the NEWS.Rd file.

Changes in tint version 0.0.4 (2017-11-02)

  • Skeleton files are also installed as vignettes (#20).

  • A reference to the Tufte source file now points to tint (Ben Marwick in #19, later extended to other Rmd files).

  • Several spelling and grammar errors were corrected too (#13 and #16 by R. Mark Sharp and Matthew Henderson).

Courtesy of CRANberries, there is a comparison to the previous release. More information is on the tint page.

For questions or comments use the issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/tint | permanent link

Tue, 31 Oct 2017

linl 0.0.2: Couple improvements

Following up on the initial 0.0.1 release of linl, Aaron and I are happy to announce release 0.0.2 which reached the CRAN network on Sunday in a smooth 'CRAN-pretest-publish' auto-admittance. linl provides a simple-yet-powerful Markdown---and RMarkdown---wrapper around the venerable LaTeX letter class; see below for an expanded example also included as the package vignette.

This version sets a few sensible default values for font, font size, margins, signature (non-)indentation and more; it also expands the documentation.
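As a sketch, a letter can then be as simple as the following front matter followed by plain Markdown body text; the field names reflect my understanding of the skeleton and the address and signature values are purely hypothetical.

```yaml
output: linl::linl
author: Jane Doe                 # hypothetical sender
address:
  - 123 Example Street
  - Anytown, ST 12345
signature: jane-signature.png    # optional scanned signature (hypothetical file)
```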

The NEWS entry follows:

Changes in linl version 0.0.2 (2017-10-29)

  • Set a few defaults for a decent-looking skeleton and template: font, fontsize, margins, left-justify closing (#3)

  • Blockquote display is now a default as well (#4).

  • Updated skeleton.Rmd and vignette source accordingly

  • Documented new default options (#5 and #6).

  • Links are now by default printed as footnotes (#9).

Courtesy of CRANberries, there is a comparison to the previous release. More information is on the linl page. For questions or comments use the issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/linl | permanent link

Mon, 30 Oct 2017

pinp 0.0.3: More docs, more features

Our pinp package for snazzier one or two column vignettes received its second update. Now at version 0.0.3, it arrived on CRAN on Saturday with minimal fuss as a 'CRAN-pretest-publish' transition.

We added more frontmatter options, documented more, and streamlined some internals of the LaTeX class borrowed from PNAS. A screenshot of the (updated) vignette can be seen below. Additional screenshots are at the pinp page.

The NEWS entry for this release follows.

Changes in pinp version 0.0.3 (2017-10-28)

  • Section 'Acknowledgements' is now conditional on a frontmatter setting, section 'Matmethods' has been removed, and \pnasbreak is no longer used, which stabilizes LaTeX float formatting. References are now shown in the column just like other content (Dirk in #36).

  • Vignette now uses new numbered sections frontmatter switch which improves the pdf outline.

  • New front-matter options for title/section header colors, and link colors (Dirk in #39).

  • YAML frontmatter options are now documented in the help page for pinp as well (Dirk in #41).

  • Some typos were fixed (Michael in #42 and #43).

Courtesy of CRANberries, there is a comparison to the previous release. More information is on the pinp page. For questions or comments use the issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/pinp | permanent link

Sun, 22 Oct 2017

linl 0.0.1: linl is not Letter

Aaron Wolen and I are pleased to announce the availability of the initial 0.0.1 release of our new linl package on the CRAN network. It provides a simple-yet-powerful Markdown---and RMarkdown---wrapper around the venerable LaTeX letter class. Aaron had done the legwork in the underlying pandoc-letter repository upon which we build via proper rmarkdown integration.

The package also includes a LaTeX trick or two: optional header and signature files, nicer font, better size, saner default geometry and more. See the following screenshot which shows the package vignette---itself a simple letter---along with (most of) its source:

The initial (short) NEWS entry follows:

Changes in linl version 0.0.1 (2017-10-17)

  • Initial CRAN release

The date is a little off; it took a little longer than usual for the good folks at CRAN to process the initial submission. We expect future releases to be more timely.

For questions or comments use the issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/linl | permanent link

Thu, 12 Oct 2017

GitHub Streak: Round Four

Three years ago I referenced the Seinfeld Streak used in an earlier post of regular updates to the Rcpp Gallery:

This is sometimes called Jerry Seinfeld's secret to productivity: Just keep at it. Don't break the streak.

and showed the first chart of GitHub streaking

github activity october 2013 to october 2014

And two year ago a first follow-up appeared in this post:

github activity october 2014 to october 2015

And a year ago a further follow-up appeared:

github activity october 2015 to october 2016

And as it is October 12 again, here is the new one:

github activity october 2016 to october 2017

Again, special thanks go to Alessandro Pezzè for the Chrome add-on GithubOriginalStreak.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/computers/misc | permanent link

Wed, 11 Oct 2017

RcppArmadillo 0.8.100.1.0

armadillo image

We are thrilled to announce a new big RcppArmadillo release! Conrad recently moved Armadillo to the 8.* series, with significant improvements and speed ups for sparse matrix operations, and more. See below for a brief summary.

This also required some changes at our end which Binxiang Ni provided, and Serguei Sokol improved some instantiations. We now include the new vignette Binxiang Ni wrote for his GSoC contribution, and I converted it (and the other main vignette) to using the pinp package for sleeker pdf vignettes.

This release resumes our bi-monthly CRAN release cycle. I may make interim updates available at GitHub "as needed". And this time I managed to mess up the reverse depends testing, and missed one sync() call on the way back to R---but all that is now taken care of.

Armadillo is a powerful and expressive C++ template library for linear algebra aiming towards a good balance between speed and ease of use with a syntax deliberately close to Matlab. RcppArmadillo integrates this library with the R environment and language---and is widely used by (currently) 405 other packages on CRAN.
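A minimal sketch of what that integration looks like in practice: an Armadillo expression compiled on the fly from R via Rcpp attributes (a compiler toolchain is assumed to be installed).

```r
## Sketch: cross-product via Armadillo, compiled inline with Rcpp attributes.
library(Rcpp)
cppFunction(depends = "RcppArmadillo", '
arma::mat crossprodArma(const arma::mat& X) {
    return X.t() * X;    // t(X) %*% X, evaluated by Armadillo in C++
}')
M <- matrix(rnorm(20), 4, 5)
all.equal(crossprodArma(M), crossprod(M))   # same result as base R
```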

A high-level summary of changes follows.

Changes in RcppArmadillo version 0.8.100.1.0 (2017-10-05)

  • Upgraded to Armadillo release 8.100.1 (Feral Pursuits)

    • faster incremental construction of sparse matrices via element access operators

    • faster diagonal views in sparse matrices

    • expanded SpMat to save/load sparse matrices in coord format

    • expanded .save(),.load() to allow specification of datasets within HDF5 files

    • added affmul() to simplify application of affine transformations

    • warnings and errors are now printed by default to the std::cerr stream

    • added set_cerr_stream() and get_cerr_stream() to replace set_stream_err1(), set_stream_err2(), get_stream_err1(), get_stream_err2()

    • new configuration options ARMA_COUT_STREAM and ARMA_CERR_STREAM

  • Constructors for sparse matrices of types dgt, dtt and dst now use Armadillo code for improved performance (Serguei Sokol in #175 addressing #173)

  • Sparse matrices call .sync() before accessing internal arrays (Binxiang Ni in #171)

  • The sparse matrix vignette has been converted to Rmarkdown using the pinp package, and is now correctly indexed (#176).

Courtesy of CRANberries, there is a diffstat report. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Tue, 03 Oct 2017

RProtoBuf 0.4.11

RProtoBuf provides R bindings for the Google Protocol Buffers ("ProtoBuf") data encoding and serialization library used and released by Google, and deployed fairly widely in numerous projects as a language and operating-system agnostic protocol.
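As a quick illustration of those bindings, here is a round-trip using the tutorial.Person message from the addressbook.proto demo that, to my understanding, ships with the package; field values are made up.

```r
## Sketch: serialize and deserialize a demo message with RProtoBuf.
library(RProtoBuf)   # loads the bundled demo proto descriptors
p <- new(tutorial.Person, id = 1, name = "Ada", email = "ada@example.com")
bytes <- p$serialize(NULL)        # compact wire-format raw vector
q <- tutorial.Person$read(bytes)  # parse the bytes back into a message
q$name                            # fields survive the round-trip
```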

A new release, RProtoBuf 0.4.11, appeared on CRAN earlier today. Not unlike the other recent releases, it is mostly a maintenance release which switches two of the vignettes over to using the pinp package and its template for vignettes.

Changes in RProtoBuf version 0.4.11 (2017-10-03)

  • The RProtoBuf-intro and RProtoBuf-quickref vignettes were converted to Rmarkdown using the templates and style file from the pinp package.

  • A few minor internal upgrades

CRANberries also provides a diff to the previous release. The RProtoBuf page has copies of the (older) package vignette, the 'quick' overview vignette, a unit test summary vignette, and the pre-print for the JSS paper. Questions, comments etc should go to the GitHub issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rprotobuf | permanent link

Thu, 28 Sep 2017

Rcpp 0.12.13: Updated vignettes, and more

The thirteenth release in the 0.12.* series of Rcpp landed on CRAN this morning, following a little delay because Uwe Ligges was traveling and whatnot. We had announced its availability to the mailing list late last week. As usual, a rather substantial amount of testing effort went into this release so you should not expect any surprises.

This release follows the 0.12.0 release from July 2015, the 0.12.1 release in September 2015, the 0.12.2 release in November 2015, the 0.12.3 release in January 2016, the 0.12.4 release in March 2016, the 0.12.5 release in May 2016, the 0.12.6 release in July 2016, the 0.12.7 release in September 2016, the 0.12.8 release in November 2016, the 0.12.9 release in January 2017, the 0.12.10 release in March 2017, the 0.12.11 release in May 2017, and the 0.12.12 release in July 2017, making it the seventeenth release at the steady and predictable bi-monthly release frequency.

Rcpp has become the most popular way of enhancing GNU R with C or C++ code. As of today, 1069 packages (and hence 73 more since the last release) on CRAN depend on Rcpp for making analytical code go faster and further, along with another 91 in BioConductor.
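For readers new to the package, the canonical "hello world" of that enhancement is compiling a small C++ function from the R prompt (a compiler toolchain is assumed):

```r
## Sketch: a C++ loop compiled and callable from R via Rcpp.
library(Rcpp)
cppFunction('
double sumSquares(NumericVector x) {
    double total = 0.0;
    for (int i = 0; i < x.size(); i++)
        total += x[i] * x[i];    // plain C++ loop, no R-level overhead
    return total;
}')
sumSquares(1:3)   # 14
```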

This release contains a large-ish update to the documentation as all vignettes (apart from the unit test one, which is a one-off) now use Markdown and the (still pretty new) pinp package by James and myself. There is also a new vignette corresponding to the PeerJ preprint James and I produced as an updated and current Introduction to Rcpp replacing the older JSS piece (which is still included as a vignette too).

A few other things got fixed: Dan is working on the const iterators you would expect with modern C++, Lei Yu spotted an error in Modules, and more. See below for details.

Changes in Rcpp version 0.12.13 (2017-09-24)

  • Changes in Rcpp API:

    • New const iterators functions cbegin() and cend() have been added to several vector and matrix classes (Dan Dillon and James Balamuta in #748 starting to address #741).
  • Changes in Rcpp Modules:

    • Misplacement of one parenthesis in macro LOAD_RCPP_MODULE was corrected (Lei Yu in #737)
  • Changes in Rcpp Documentation:

    • Rewrote the macOS sections to depend on official documentation due to large changes in the macOS toolchain. (James Balamuta in #742 addressing issue #682).

    • Added a new vignette ‘Rcpp-introduction’ based on new PeerJ preprint, renamed existing introduction to ‘Rcpp-jss-2011’.

    • Transitioned all vignettes to the 'pinp' RMarkdown template (James Balamuta and Dirk Eddelbuettel in #755 addressing issue #604).

    • Added an entry on running 'compileAttributes()' twice to the Rcpp-FAQ (#745).

Thanks to CRANberries, you can also look at a diff to the previous release. As always, even fuller details are on the Rcpp Changelog page and the Rcpp page which also leads to the downloads page, the browseable doxygen docs and zip files of doxygen output for the standard formats. A local directory has source and documentation too. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Wed, 27 Sep 2017

RcppZiggurat 0.1.4

ziggurats

A maintenance release of RcppZiggurat is now on the CRAN network for R. It switched the vignette to our new pinp package and its two-column pdf default.

The RcppZiggurat package updates the code for the Ziggurat generator which provides very fast draws from a Normal distribution. The package provides a simple C++ wrapper class for the generator improving on the very basic macros, and permits comparison among several existing Ziggurat implementations. This can be seen in the figure where Ziggurat from this package dominates accessing the implementations from the GSL, QuantLib and Gretl---all of which are still way faster than the default Normal generator in R (which is of course of higher code complexity).
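Using the fast generator from R is a one-liner; the seeding function name here is my recollection of the interface and should be checked against the package index.

```r
## Sketch: fast standard-normal draws via the Ziggurat method.
library(RcppZiggurat)
zsetseed(42)               # seed the Ziggurat generator (name is an assumption)
z <- zrnorm(1e5)           # N(0,1) draws, typically much faster than rnorm()
c(mean(z), sd(z))          # should be near 0 and 1
```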

The NEWS file entry below lists all changes.

Changes in version 0.1.4 (2017-07-27)

  • The vignette now uses the pinp package in two-column mode.

  • Dynamic symbol registration is now enabled.

Courtesy of CRANberries, there is also a diffstat report for the most recent release. More information is on the RcppZiggurat page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Tue, 26 Sep 2017

RcppAnnoy 0.0.10

A few short weeks after the more substantial 0.0.9 release of RcppAnnoy, we have a quick bug-fix update.

RcppAnnoy is our Rcpp-based R integration of the nifty Annoy library by Erik. Annoy is a small and lightweight C++ template header library for very fast approximate nearest neighbours.

Michaël Benesty noticed that our getItemsVector() function didn't, ahem, do much besides crashing. Simple bug, they happen---now fixed, and a unit test added.
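For context, getItemsVector() sits alongside the usual index-building workflow, sketched below with made-up data:

```r
## Sketch: build a small Annoy index and query it.
library(RcppAnnoy)
a <- new(AnnoyEuclidean, 3)                    # index for 3-dimensional vectors
for (i in 1:100) a$addItem(i - 1, rnorm(3))    # Annoy items are 0-indexed
a$build(10)                                    # build with 10 trees
a$getNNsByItem(0, 5)                           # five nearest neighbours of item 0
a$getItemsVector(0)                            # retrieve a stored vector (the fixed call)
```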

Changes in this version are summarized here:

Changes in version 0.0.10 (2017-09-25)

  • The getItemsVector() function no longer crashes (#24)

Courtesy of CRANberries, there is also a diffstat report for this release.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Sun, 24 Sep 2017

RcppGSL 0.3.3

A maintenance update, RcppGSL 0.3.3, is now on CRAN. It switched the vignette to our new pinp package and its two-column pdf default.

The RcppGSL package provides an interface from R to the GNU GSL using the Rcpp package.

No user-facing new code or features were added. The NEWS file entries follow below:

Changes in version 0.3.3 (2017-09-24)

  • We also check for gsl-config at package load.

  • The vignette now uses the pinp package in two-column mode.

  • Minor other fixes to package and testing infrastructure.

Courtesy of CRANberries, a summary of changes to the most recent release is available.

More information is on the RcppGSL page. Questions, comments etc should go to the issue tickets at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Sat, 23 Sep 2017

RcppCNPy 0.2.7

A new version of the RcppCNPy package arrived on CRAN yesterday.

RcppCNPy provides R with read and write access to NumPy files thanks to the cnpy library by Carl Rogers.

This version updates internals for function registration, but otherwise mostly switches the vignette over to the shiny new pinp two-column template and package.
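The read / write interface itself is compact; a quick round-trip through a temporary .npy file:

```r
## Sketch: write a matrix as a NumPy file and read it back.
library(RcppCNPy)
m <- matrix(1.0 * (1:12), 3, 4)      # a small numeric matrix
f <- tempfile(fileext = ".npy")
npySave(f, m)                        # write NumPy .npy format
m2 <- npyLoad(f)                     # read it back into R
all.equal(m, m2)                     # round-trip preserves the values
```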

Changes in version 0.2.7 (2017-09-22)

  • Vignette updated to Rmd and use of pinp package

  • File src/init.c added for dynamic registration

CRANberries also provides a diffstat report for the latest release. As always, feedback is welcome and the best place to start a discussion may be the GitHub issue tickets page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

RcppClassic 0.9.8

RcppClassic 0.9.8 is a bug-fix release for the very recent 0.9.7 release; it fixes a build issue on macOS introduced in 0.9.7. No other changes.

Courtesy of CRANberries, there are changes relative to the previous release.

Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Wed, 20 Sep 2017

pinp 0.0.2: Onwards

A first update 0.0.2 of the pinp package arrived on CRAN just a few days after the initial release.

We added a new vignette for the package (see below), extended a few nice features, and smoothed a few corners.

The NEWS entry for this release follows.

Changes in pinp version 0.0.2 (2017-09-20)

  • The YAML segment can be used to select font size, one-or-two column mode, one-or-two side mode, linenumbering and watermarks (#21 and #26 addressing #25)

  • If pinp.cls or jss.bst are not present, they are copied in (#27 addressing #23)

  • Output is now in shaded framed boxen too (#29 addressing #28)

  • Endmatter material is placed in template.tex (#31 addressing #30)

  • Expanded documentation of YAML options in skeleton.Rmd and clarified available one-column option (#32).

  • Section numbering can now be turned on and off (#34)

  • The default bibliography style was changed to jss.bst.

  • A short explanatory vignette was added.

Courtesy of CRANberries, there is a comparison to the previous release. More information is on the pinp page. For questions or comments use the issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/pinp | permanent link

Sun, 17 Sep 2017

RcppClassic 0.9.7

A rather boring and otherwise uneventful release 0.9.7 of RcppClassic is now at CRAN. This package provides a maintained version of the otherwise deprecated first Rcpp API; no new projects should use it.

Once again no changes in user-facing code. But this makes it the first package to use the very new and shiny pinp package as the backend for its vignette, now converted to Markdown---see here for this new version. We also updated three source files for tabs versus spaces as the current g++ version complained (correctly!) about misleading indents. Otherwise a file src/init.c was added for dynamic registration, the Travis CI runner script was updated to using run.sh from our r-travis fork, and we now strip the shared library after it has been built. Again, no user code changes.

And to reiterate: nobody should use this package. Rcpp is so much better in so many ways---this one is simply available as we (quite strongly) believe that APIs are contracts, and as such we hold up our end of the deal.

Courtesy of CRANberries, there are changes relative to the previous release.

Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Sat, 16 Sep 2017

pinp 0.0.1: pinp is not PNAS

A brandnew and very exciting (to us, at least) package called pinp just arrived on CRAN, following a somewhat unnecessarily long passage out of incoming. It is based on the PNAS LaTeX Style offered by the Proceedings of the National Academy of Sciences of the United States of America, or PNAS for short. And there is already a Markdown version in the wonderful rticles package.

But James Balamuta and I thought we could do one better when we were looking to typeset our recent PeerJ preprint as an attractive looking vignette for use within the Rcpp package.

And so we did by changing a few things (font, color, use of natbib and Chicago.bst for references, removal of a bunch of extra PNAS-specific formalities from the frontpage) and customized a number of other things for easier use by vignettes directly from the YAML header (draft mode watermark, doi or url for packages, easier author naming in footer, bibtex file and more).

We are quite pleased with the result which seems ready for the next Rcpp release---see e.g., these two teasers:

and

and the pinp package page or the GitHub repo have the full (four double-)pages of what turned a rather dull-looking 27-page manuscript into eight crisp two-column pages.

We have a few more things planned (e.g., switching to single-column mode, turning on line numbers at least in one-column mode).

For questions or comments use the issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/pinp | permanent link

drat 0.1.3

A new version of drat arrived earlier today on CRAN as another no-human-can-delay-this automatic upgrade directly from the CRAN prechecks. It is mostly a maintenance release ensuring PACKAGES.rds is also updated, plus some minor other edits.

drat stands for drat R Archive Template, and helps with easy-to-create and easy-to-use repositories for R packages. Since its inception in early 2015 it has found reasonably widespread adoption among R users because repositories with marked releases are the better way to distribute code.
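The basic workflow can be sketched in three calls; the package name and GitHub account below are purely hypothetical, and insertPackage() assumes a checked-out drat repository in its default location.

```r
## Sketch: publish a tarball to your own drat repo, then install from it.
library(drat)
insertPackage("myPkg_1.0.tar.gz")   # hypothetical tarball; updates PACKAGES* too
addRepo("someghaccount")            # hypothetical account hosting a drat repo
install.packages("myPkg")           # now resolves via the drat repository
```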

The NEWS file summarises the release as follows:

Changes in drat version 0.1.3 (2017-09-16)

  • Ensure PACKAGES.rds, if present, is also inserted in repo

  • Updated 'README.md' removing stale example URLs (#63)

  • Use https to fetch Travis CI script from r-travis

Courtesy of CRANberries, there is a comparison to the previous release. More detailed information is on the drat page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/drat | permanent link

Wed, 13 Sep 2017

RcppMsgPack 0.2.0

A new and much enhanced version of RcppMsgPack arrived on CRAN a couple of days ago. It came together following this email to the r-package-devel list which made it apparent that Travers Ching had been working on MessagePack converters for R which required the very headers I had for use from, inter alia, the RcppRedis package.

So we joined our packages. I updated the headers in RcppMsgPack to the current upstream version 2.1.5 of MessagePack, and Travers added his helper functions allowing direct packing / unpacking of MessagePack objects at the R level, as well as tests and a draft vignette. Very exciting, and great to have a coauthor!

So now RcppMsgPack provides R with both MessagePack header files for use via C++ (or C, if you must) packages such as RcppRedis --- and direct conversion routines at the R prompt.

MessagePack itself is an efficient binary serialization format. It lets you exchange data among multiple languages like JSON. But it is faster and smaller. Small integers are encoded into a single byte, and typical short strings require only one extra byte in addition to the strings themselves.
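A quick round-trip at the R prompt, using the function names from the NEWS entry below; the exact calling conventions are as I understand them, so check the package vignette.

```r
## Sketch: serialize R data to MessagePack bytes and back.
library(RcppMsgPack)
bytes <- msgpack_pack(list(1L, "hello", c(2.5, 3.5)))  # raw vector of bytes
obj <- msgpack_unpack(bytes)                           # MessagePack objects
msgpack_simplify(obj)                                  # back to plain R types
```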

Changes in version 0.2.0 (2017-09-07)

  • Added support for building on Windows

  • Upgraded to MsgPack 2.1.5 (#3)

  • New R functions to manipulate MsgPack objects: msgpack_format, msgpack_map, msgpack_pack, msgpack_simplify, msgpack_unpack (#4)

  • New R functions also available as msgpackFormat, msgpackMap, msgpackPack, msgpackSimplify, msgpackUnpack (#4)

  • New vignette (#4)

  • New tests (#4)

Courtesy of CRANberries, there is also a diffstat report for this release.

More information is on the RcppMsgPack page. Issues and bugreports should go to the GitHub issue tracker.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

RcppRedis 0.1.8

A new minor release of RcppRedis arrived on CRAN last week, following the release 0.2.0 of RcppMsgPack which brought the MsgPack headers forward to release 2.1.5. This required a minor and rather trivial change in the code. When the optional RcppMsgPack package is used, we now require this version 0.2.0 or later.
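For orientation, basic use of the package looks roughly like the following sketch; it assumes a Redis server running on localhost with default settings, and the method names reflect my understanding of the Rcpp module interface.

```r
## Sketch only: requires a running local Redis server (localhost:6379 assumed).
library(RcppRedis)
redis <- new(Redis)          # Rcpp module class wrapping a hiredis connection
redis$set("answer", 42)      # serialize an R object into Redis
redis$get("answer")          # fetch and unserialize it again
```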

We made a few internal updates to the package as well.

Changes in version 0.1.8 (2017-09-08)

  • A new file init.c was added with calls to R_registerRoutines() and R_useDynamicSymbols()

  • Symbol registration is enabled in useDynLib

  • Travis CI was updated to using run.sh

  • The (optional MessagePack) code was updated for MsgPack 2.*

Courtesy of CRANberries, there is also a diffstat report for this release. More information is on the RcppRedis page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Thu, 31 Aug 2017

RcppAnnoy 0.0.9

A new version 0.0.9 of RcppAnnoy, our Rcpp-based R integration of the nifty Annoy library by Erik, is now on CRAN. Annoy is a small and lightweight C++ template header library for very fast approximate nearest neighbours.

This release corrects an issue for Windows users discovered by GitHub user 'khoran' who later also suggested the fix of using binary mode. It upgrades to Annoy release 1.9.1 and brings its new Manhattan distance to RcppAnnoy. A number of unit tests were added as well, and we updated some packaging internals such as symbol registration.

And I presume I had a good streak emailing with Uwe's robots as the package made it onto CRAN rather smoothly within ten minutes of submission:

RcppAnnoy to CRAN 

Changes in this version are summarized here:

Changes in version 0.0.9 (2017-08-31)

  • Synchronized with Annoy upstream version 1.9.1

  • Minor updates in calls and tests as required by annoy 1.9.1

  • New Manhattan distance modules along with unit test code

  • Additional unit tests from upstream test code carried over

  • Binary mode is used for save (as suggested by @khoran in #21)

  • A new file init.c was added with calls to R_registerRoutines() and R_useDynamicSymbols()

  • Symbol registration is enabled in useDynLib

Courtesy of CRANberries, there is also a diffstat report for this release.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Tue, 29 Aug 2017

RcppArmadillo 0.7.960.1.2

armadillo image

A second fix-up release is needed following on the recent bi-monthly RcppArmadillo release as well as the initial follow-up as it turns out that OS X / macOS is so darn special that it needs an entire separate treatment for OpenMP. Namely to turn it off entirely...

Armadillo is a powerful and expressive C++ template library for linear algebra aiming towards a good balance between speed and ease of use with a syntax deliberately close to Matlab. RcppArmadillo integrates this library with the R environment and language---and is widely used by (currently) 384 other packages on CRAN---an increase of 54 since the CRAN release in June!

Changes in RcppArmadillo version 0.7.960.1.2 (2017-08-29)

  • On macOS, OpenMP support is now turned off (#170).

  • The package is now compiling under the C++11 standard (#170).

  • The vignette dependency is correctly set (James and Dirk in #168 and #169)

Courtesy of CRANberries, there is a diffstat report. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Mon, 28 Aug 2017

RcppSMC 0.2.0

A new version 0.2.0 of the RcppSMC package arrived on CRAN earlier today (as a very quick pretest-publish within minutes of submission).

RcppSMC provides Rcpp-based bindings to R for the Sequential Monte Carlo Template Classes (SMCTC) by Adam Johansen described in his JSS article.

This release 0.2.0 is chiefly the work of Leah South, a Ph.D. student at Queensland University of Technology, who was during the last few months a Google Summer of Code student mentored by Adam and myself. It was a pleasure to work with Leah on this, and see her progress. Our congratulations to Leah for a job well done!

Changes in RcppSMC version 0.2.0 (2017-08-28)

  • Also use .registration=TRUE in useDynLib in NAMESPACE

  • Multiple Sequential Monte Carlo extensions (Leah South as part of Google Summer of Code 2017)

    • Switching to population level objects (#2 and #3).

    • Using Rcpp attributes (#2).

    • Using automatic RNGscope (#4 and #5).

    • Adding multiple normalising constant estimators (#7).

    • Static Bayesian model example: linear regression (#10 addressing #9).

    • Adding a PMMH example (#13 addressing #11).

    • Framework for additional algorithm parameters and adaptation (#19 addressing #16; also #24 addressing #23).

    • Common adaptation methods for static Bayesian models (#20 addressing #17).

    • Supporting MCMC repeated runs (#21).

    • Adding adaptation to linear regression example (#22 addressing #18).

Courtesy of CRANberries, there is a diffstat report for this release.

More information is on the RcppSMC page. Issues and bug reports should go to the GitHub issue tracker.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Thu, 24 Aug 2017

BH 1.65.0-1

The BH package on CRAN was updated today to version 1.65.0. BH provides a sizeable portion of the Boost C++ libraries as a set of template headers for use by R, possibly with Rcpp as well as other packages.

This release upgrades the version of Boost to the rather new upstream version Boost 1.65.0 released earlier this week, and adds two new libraries: align and sort.

I had started the upgrade process a few days ago under release 1.64.0. Rigorous checking of reverse dependencies showed that mvnfast needed a small change (which was trivial: just seeding the RNG prior to running tests), which Matteo did in no time with a fresh CRAN upload. rstan needs a bit more work but should be ready real soon now; we are awaiting a new version. And once I switched to the just-released Boost 1.65.0 it became apparent that Cyclops no longer needs its embedded copy of Boost iterator---and Marc already made that change with yet another fresh CRAN upload. It is a true pleasure to work in such a responsive and collaborative community.

Changes in version 1.65.0-1 (2017-08-24)

  • Upgraded to Boost 1.64 and then 1.65 installed directly from upstream source with several minor tweaks (as before)

  • Fourth tweak corrects a misplaced curly brace (see the Boost ublas GitHub repo and its issue #40)

  • Added Boost align (as requested in #32)

  • Added Boost sort (as requested in #35)

  • Added Boost multiprecision by fixing a script typo (as requested in #42)

  • Updated Travis CI support via newer run.sh

Via CRANberries, there is a diffstat report relative to the previous release.

Comments and suggestions are welcome via the mailing list or the issue tracker at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/bh | permanent link

Wed, 23 Aug 2017

Rcpp now used by 10 percent of CRAN packages

10 percent of CRAN packages

Over the last few days, Rcpp passed another noteworthy hurdle. It is now used by over 10 percent of packages on CRAN (as measured by Depends, Imports and LinkingTo, but excluding Suggests). As of this morning 1130 packages use Rcpp out of a total of 11275 packages. The graph on the left shows the growth of both outright usage numbers (in darker blue, left axis) and relative usage (in lighter blue, right axis).

Older posts on this blog took note when Rcpp passed round hundreds of packages, most recently in April for 1000 packages. The growth rates for both Rcpp, and of course CRAN, are still staggering. A big thank you to everybody who makes this happen, from R Core and CRAN to all package developers, contributors, and of course all users driving this. We have built ourselves a rather impressive ecosystem.

So with that a heartfelt Thank You! to all users and contributors of R, CRAN, and of course Rcpp, for help, suggestions, bug reports, documentation, encouragement, and, of course, code.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Sun, 20 Aug 2017

RcppArmadillo 0.7.960.1.1

armadillo image

On the heels of the very recent bi-monthly RcppArmadillo release comes a quick bug-fix release 0.7.960.1.1 which just got onto CRAN (and I will ship a build to Debian in a moment).

There were three distinct issues I addressed in three quick pull requests:

  • The excellent Google Summer of Code work by Binxiang Ni had only encountered direct use of sparse matrices as produced by the Matrix package. However, while we waited for 0.7.960.1.0 to make it onto CRAN, the quanteda package switched to derived classes---which we now account for via the is() method of our S4 class. Thanks to Kevin Ushey for reminding me we had is().
  • We somehow failed to account for the R 3.4.* and Rcpp 0.12.{11,12} changes for package registration (with .registration=TRUE), so we ensured we only have one fastLm symbol.
  • The build did not take too well to systems without OpenMP, so we now explicitly unset OpenMP support via an Armadillo configuration variable. In general, client packages probably want to enable C++11 support when using OpenMP (explicitly) but we prefer not to upset too many (old) users. However, our configure check now also wants g++ 4.7.2 or later, just like Armadillo.

Armadillo is a powerful and expressive C++ template library for linear algebra aiming towards a good balance between speed and ease of use with a syntax deliberately close to Matlab. RcppArmadillo integrates this library with the R environment and language---and is widely used by (currently) 382 other packages on CRAN---an increase of 52 since the CRAN release in June!

Changes in this release relative to the previous CRAN release are as follows:

Changes in RcppArmadillo version 0.7.960.1.1 (2017-08-20)

  • Added improved check for inherited S4 matrix classes (#162 fixing #161)

  • Changed fastLm C++ function to fastLm_impl to not clash with R method (#164 fixing #163)

  • Added OpenMP check for configure (#166 fixing #165)

Courtesy of CRANberries, there is a diffstat report. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

#10: Compacting your Shared Libraries, After The Build

Welcome to the tenth post in the rarely ranting R recommendations series, or R4 for short. A few days ago we showed how to tell the linker to strip shared libraries. As discussed in the post, there are two options. One can either set up ~/.R/Makevars by passing the strip-debug option to the linker. Alternatively, one can adjust src/Makevars in the package itself with a bit of Makefile magic.

Of course, there is a third way: just run strip --strip-debug over all the shared libraries after the build. As the path is standardized, and the shell does proper globbing, we can just do

$ strip --strip-debug /usr/local/lib/R/site-library/*/libs/*.so

using a double-wildcard to get all packages (in that R package directory) and all their shared libraries. Users on macOS probably want .dylib on the end, users on Windows want another computer as usual (just kidding: use .dll). Either may have to adjust the path which is left as an exercise to the reader.
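
The same sweep can also be written with find, which covers the .so and .dylib suffixes in one pass. The snippet below is only a sketch run against a throwaway mock tree (LIBDIR, pkgA and pkgB are made up) so that nothing real is touched; for actual use, point LIBDIR at the site-library and swap the final -print for the strip invocation.

```shell
# Sketch: enumerate shared libraries under a mock site-library tree.
# LIBDIR, pkgA and pkgB are made up for the demonstration; a real run
# would set LIBDIR=/usr/local/lib/R/site-library and end the find with
# -exec strip --strip-debug {} + instead of -print.
LIBDIR=$(mktemp -d)
mkdir -p "$LIBDIR/pkgA/libs" "$LIBDIR/pkgB/libs"
: > "$LIBDIR/pkgA/libs/pkgA.so"
: > "$LIBDIR/pkgB/libs/pkgB.dylib"
find "$LIBDIR" -path '*/libs/*' \( -name '*.so' -o -name '*.dylib' \) -print
```

The find form has the small advantage of not depending on the shell's globbing, so it also works when a package happens to ship no shared library at all.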

The impact can be Yuge as illustrated in the following dotplot:

This illustration is in response to a mailing list post. Last week, someone claimed on r-help that tidyverse would not install on Ubuntu 17.04. And this is of course patently false as many of us build and test on Ubuntu and related Linux systems, Travis runs on it, CRAN tests them etc pp. That poor user had somehow messed up their default gcc version. Anyway: I fired up a Docker container, installed r-base-core plus three required -dev packages (for xml2, openssl, and curl) and ran a single install.packages("tidyverse"). In a nutshell, following the launch of Docker for an Ubuntu 17.04 container, it was just

$ apt-get update
$ apt-get install r-base libcurl4-openssl-dev libssl-dev libxml2-dev
$ apt-get install mg          # a tiny editor
$ mg /etc/R/Rprofile.site     # to add a default CRAN repo
$ R -e 'install.packages("tidyverse")'

which not only worked (as expected) but also installed a whopping fifty-one packages (!!) of which twenty-six contain a shared library. A useful little trick is to run du with proper options to total, summarize, and use human units which reveals that these libraries occupy seventy-eight megabytes:

root@de443801b3fc:/# du -csh /usr/local/lib/R/site-library/*/libs/*so
4.3M    /usr/local/lib/R/site-library/Rcpp/libs/Rcpp.so
2.3M    /usr/local/lib/R/site-library/bindrcpp/libs/bindrcpp.so
144K    /usr/local/lib/R/site-library/colorspace/libs/colorspace.so
204K    /usr/local/lib/R/site-library/curl/libs/curl.so
328K    /usr/local/lib/R/site-library/digest/libs/digest.so
33M     /usr/local/lib/R/site-library/dplyr/libs/dplyr.so
36K     /usr/local/lib/R/site-library/glue/libs/glue.so
3.2M    /usr/local/lib/R/site-library/haven/libs/haven.so
272K    /usr/local/lib/R/site-library/jsonlite/libs/jsonlite.so
52K     /usr/local/lib/R/site-library/lazyeval/libs/lazyeval.so
64K     /usr/local/lib/R/site-library/lubridate/libs/lubridate.so
16K     /usr/local/lib/R/site-library/mime/libs/mime.so
124K    /usr/local/lib/R/site-library/mnormt/libs/mnormt.so
372K    /usr/local/lib/R/site-library/openssl/libs/openssl.so
772K    /usr/local/lib/R/site-library/plyr/libs/plyr.so
92K     /usr/local/lib/R/site-library/purrr/libs/purrr.so
13M     /usr/local/lib/R/site-library/readr/libs/readr.so
4.7M    /usr/local/lib/R/site-library/readxl/libs/readxl.so
1.2M    /usr/local/lib/R/site-library/reshape2/libs/reshape2.so
160K    /usr/local/lib/R/site-library/rlang/libs/rlang.so
928K    /usr/local/lib/R/site-library/scales/libs/scales.so
4.9M    /usr/local/lib/R/site-library/stringi/libs/stringi.so
1.3M    /usr/local/lib/R/site-library/tibble/libs/tibble.so
2.0M    /usr/local/lib/R/site-library/tidyr/libs/tidyr.so
1.2M    /usr/local/lib/R/site-library/tidyselect/libs/tidyselect.so
4.7M    /usr/local/lib/R/site-library/xml2/libs/xml2.so
78M     total
root@de443801b3fc:/# 

Looks like dplyr wins this one at thirty-three megabytes just for its shared library.

But with a single stroke of strip we can reduce all this down a lot:

root@de443801b3fc:/# strip --strip-debug /usr/local/lib/R/site-library/*/libs/*so
root@de443801b3fc:/# du -csh /usr/local/lib/R/site-library/*/libs/*so
440K    /usr/local/lib/R/site-library/Rcpp/libs/Rcpp.so
220K    /usr/local/lib/R/site-library/bindrcpp/libs/bindrcpp.so
52K     /usr/local/lib/R/site-library/colorspace/libs/colorspace.so
56K     /usr/local/lib/R/site-library/curl/libs/curl.so
120K    /usr/local/lib/R/site-library/digest/libs/digest.so
2.5M    /usr/local/lib/R/site-library/dplyr/libs/dplyr.so
16K     /usr/local/lib/R/site-library/glue/libs/glue.so
404K    /usr/local/lib/R/site-library/haven/libs/haven.so
76K     /usr/local/lib/R/site-library/jsonlite/libs/jsonlite.so
20K     /usr/local/lib/R/site-library/lazyeval/libs/lazyeval.so
24K     /usr/local/lib/R/site-library/lubridate/libs/lubridate.so
8.0K    /usr/local/lib/R/site-library/mime/libs/mime.so
52K     /usr/local/lib/R/site-library/mnormt/libs/mnormt.so
84K     /usr/local/lib/R/site-library/openssl/libs/openssl.so
76K     /usr/local/lib/R/site-library/plyr/libs/plyr.so
32K     /usr/local/lib/R/site-library/purrr/libs/purrr.so
648K    /usr/local/lib/R/site-library/readr/libs/readr.so
400K    /usr/local/lib/R/site-library/readxl/libs/readxl.so
128K    /usr/local/lib/R/site-library/reshape2/libs/reshape2.so
56K     /usr/local/lib/R/site-library/rlang/libs/rlang.so
100K    /usr/local/lib/R/site-library/scales/libs/scales.so
496K    /usr/local/lib/R/site-library/stringi/libs/stringi.so
124K    /usr/local/lib/R/site-library/tibble/libs/tibble.so
164K    /usr/local/lib/R/site-library/tidyr/libs/tidyr.so
104K    /usr/local/lib/R/site-library/tidyselect/libs/tidyselect.so
344K    /usr/local/lib/R/site-library/xml2/libs/xml2.so
6.6M    total
root@de443801b3fc:/#

Down to six point six megabytes. Not bad for one command. The chart visualizes the respective reductions. Clearly, C++ packages (and their template use) lead to more debugging symbols than plain old C code. But once stripped, the size differences are not that large.

And just to be plain, what we showed previously in post #9 does the same, only already at the installation stage. The effects are not cumulative.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Thu, 17 Aug 2017

RcppArmadillo 0.7.960.1.0

armadillo image

The bi-monthly RcppArmadillo release is out with a new version 0.7.960.1.0 which is now on CRAN, and will get to Debian in due course.

And it is a big one. Lots of nice upstream changes from Armadillo, and lots of work on our end as the Google Summer of Code project by Binxiang Ni, plus a few smaller enhancements -- see below for details.

Armadillo is a powerful and expressive C++ template library for linear algebra aiming towards a good balance between speed and ease of use with a syntax deliberately close to Matlab. RcppArmadillo integrates this library with the R environment and language---and is widely used by (currently) 379 other packages on CRAN---an increase of 49 since the last CRAN release in June!

Changes in this release relative to the previous CRAN release are as follows:

Changes in RcppArmadillo version 0.7.960.1.0 (2017-08-11)

  • Upgraded to Armadillo release 7.960.1 (Northern Banana Republic Deluxe)

    • faster randn() when using OpenMP (NB: usually omitted when used from R)

    • faster gmm_diag class, for Gaussian mixture models with diagonal covariance matrices

    • added .sum_log_p() to the gmm_diag class

    • added gmm_full class, for Gaussian mixture models with full covariance matrices

    • expanded .each_slice() to optionally use OpenMP for multi-threaded execution

  • Upgraded to Armadillo release 7.950.0 (Northern Banana Republic)

    • expanded accu() and sum() to use OpenMP for processing expressions with computationally expensive element-wise functions

    • expanded trimatu() and trimatl() to allow specification of the diagonal which delineates the boundary of the triangular part

  • Enhanced support for sparse matrices (Binxiang Ni as part of Google Summer of Code 2017)

    • Add support for dtCMatrix and dsCMatrix (#135)

    • Add conversion and unit tests for dgT, dtT and dsTMatrix (#136)

    • Add conversion and unit tests for dgR, dtR and dsRMatrix (#139)

    • Add conversion and unit tests for pMatrix and ddiMatrix (#140)

    • Rewrite conversion for dgT, dtT and dsTMatrix, and add file-based tests (#142)

    • Add conversion and unit tests for indMatrix (#144)

    • Rewrite conversion for ddiMatrix (#145)

    • Add a warning message for matrices that cannot be converted (#147)

    • Add new vignette for sparse matrix support (#152; Dirk in #153)

    • Add support for sparse matrix conversion from Python SciPy (#158 addressing #141)

  • Optional return of row or column vectors in collapsed form if appropriate #define is set (Serguei Sokol in #151 and #154)

  • Correct speye() for non-symmetric cases (Qiang Kou in #150 closing #149).

  • Ensure tests using Scientific Python and reticulate are properly conditioned on the packages being present.

  • Added .aspell/ directory with a small local dictionary, now supported by R-devel.

Courtesy of CRANberries, there is a diffstat report. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rcpp | permanent link

Mon, 14 Aug 2017

#9: Compacting your Shared Libraries

Welcome to the ninth post in the recognisably rancid R randomness series, or R4 for short. Following on the heels of last week's post, we aim to look into the shared libraries created by R.

We love the R build process. It is robust, cross-platform, reliable and rather predictable. It. Just. Works.

One minor issue, though, which has come up once or twice in the past is the (in)ability to fully control all compilation options. R will always recall CFLAGS, CXXFLAGS, ... etc as used when it was compiled. Which often entails the -g flag for debugging which can seriously inflate the size of the generated object code. And once stored in ${R_HOME}/etc/Makeconf we cannot override these values on the fly.

But there is always a way. Sometimes even two.

The first is local and can be used via the (personal) ~/.R/Makevars file (about which I will have to say more in another post). But something I have been using quite a bit lately uses the flags for the shared library linker. Given that we can have different code flavours and compilation choices---between C, Fortran and the different C++ standards---one can end up with a few lines. I currently use this, which uses -Wl, to pass the -S (or --strip-debug) option to the linker (and also reiterates the desire for a shared library, presumably superfluously):

SHLIB_CXXLDFLAGS = -Wl,-S -shared
SHLIB_CXX11LDFLAGS = -Wl,-S -shared
SHLIB_CXX14LDFLAGS = -Wl,-S -shared
SHLIB_FCLDFLAGS = -Wl,-S -shared
SHLIB_LDFLAGS = -Wl,-S -shared

Let's consider an example: my most recently uploaded package RProtoBuf. Built under a standard 64-bit Linux setup (Ubuntu 17.04, g++ 6.3) and not using the above, we end up with library containing 12 megabytes (!!) of object code:

edd@brad:~/git/rprotobuf(feature/fewer_warnings)$ ls -lh src/RProtoBuf.so
-rwxr-xr-x 1 edd edd 12M Aug 14 20:22 src/RProtoBuf.so
edd@brad:~/git/rprotobuf(feature/fewer_warnings)$ 

However, if we use the flags shown above in .R/Makevars, we end up with much less:

edd@brad:~/git/rprotobuf(feature/fewer_warnings)$ ls -lh src/RProtoBuf.so 
-rwxr-xr-x 1 edd edd 626K Aug 14 20:29 src/RProtoBuf.so
edd@brad:~/git/rprotobuf(feature/fewer_warnings)$ 

So we reduced the size from 12mb to 0.6mb, nearly a twenty-fold decrease. And the file tool still shows the file as 'not stripped' as it still contains the symbols. Only debugging information was removed.

What reduction in size can one expect, generally speaking? I have seen substantial reductions for C++ code, particularly when using templated code. More old-fashioned C code will be less affected. It seems a little difficult to tell---but this method is my new build default as I continually find rather substantial reductions in size (as I tend to work mostly with C++-based packages).

The second option only occured to me this evening, and complements the first which is after all only applicable locally via the ~/.R/Makevars file. What if we wanted it affect each installation of a package? The following addition to its src/Makevars should do:

strippedLib: $(SHLIB)
        if test -e "/usr/bin/strip"; then /usr/bin/strip --strip-debug $(SHLIB); fi

.PHONY: strippedLib

We declare a new Makefile target strippedLib. By making it dependent on $(SHLIB), we ensure the standard target of this Makefile is built first. And by declaring the target .PHONY we ensure it will always be executed. It simply tests for the strip tool, and invokes it on the library after it has been built. Needless to say we get the same reduction in size. And this scheme may even pass muster with CRAN, but I have not yet tried.
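
The pattern is plain Makefile fare, nothing R-specific, so it can be tried standalone. The sketch below uses made-up target names ('all', 'stripped') and merely echoes instead of actually stripping anything:

```shell
# Minimal standalone demonstration of the pattern (illustrative names only):
# 'stripped' depends on 'all', so making 'stripped' first builds the default
# target and then runs the extra recipe; .PHONY ensures it always re-runs.
DEMO=$(mktemp -d)
printf 'all:\n\t@echo building library\n\nstripped: all\n\t@echo stripping library\n\n.PHONY: stripped\n' > "$DEMO/Makefile"
make -s -C "$DEMO" stripped
```

Running it prints "building library" followed by "stripping library", showing that the extra target piggybacks on the normal build rather than replacing it.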

Lastly, an acknowledgement. Everything in this post has benefited from discussion with my former colleague Dan Dillon who went as far as setting up tooling in his r-stripper repository. What we have here may be simpler, but it would not have happened without what Dan had put together earlier.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Sun, 13 Aug 2017

RProtoBuf 0.4.10

RProtoBuf provides R bindings for the Google Protocol Buffers ("ProtoBuf") data encoding and serialization library used and released by Google, and deployed fairly widely in numerous projects as a language and operating-system agnostic protocol.

A new release, RProtoBuf 0.4.10, just appeared on CRAN. It is a maintenance release replacing one leftover erroneous use of package= in .Call with the correct PACKAGE= (as requested by CRAN). It also integrates a small robustification in the deserializer when encountering invalid objects; this was both reported and fixed by Jeffrey Shen.

Changes in RProtoBuf version 0.4.10 (2017-08-13)

  • More careful operation in deserializer checking for a valid class attribute (Jeffrey Shen in #29 fixing #28)

  • At the request of CRAN, correct one .Call() argument to PACKAGE=; update routine registration accordingly

CRANberries also provides a diff to the previous release. The RProtoBuf page has an older package vignette, a 'quick' overview vignette, a unit test summary vignette, and the pre-print for the JSS paper. Questions, comments etc should go to the GitHub issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/rprotobuf | permanent link

Thu, 10 Aug 2017

#8: Customizing Spell Checks for R CMD check

Welcome to the eighth post in the ramblingly random R rants series, or R4 for short. We took a short break over the last few weeks due to some conferencing followed by some vacationing and general chill.

But we're back now, and this post gets us back to the initial spirit of (hopefully) quick and useful posts. Perusing yesterday's batch of CRANberries posts, I noticed a peculiar new directory shown in the diffstat output we use to compare two subsequent source tarballs. It was entitled .aspell/, in the top-level directory, and in two new packages by R Core member Kurt Hornik himself.

The context is, of course, the not infrequently-expressed desire to customize the spell checking done on CRAN incoming packages, see e.g. this r-package-devel thread.

And now we can, as I verified with (the upcoming next release of) RcppArmadillo, along with a recent-enough (i.e. last few days) version of r-devel. Just copying what Kurt did, i.e. adding a file .aspell/defaults.R, and in it pointing to an rds file (named after the package) containing a character vector with words added to the spell checker's universe is all it takes. For my package, see here for the particulars.

Or see here:

edd@bud:~/git/rcpparmadillo/.aspell(master)$ cat defaults.R 
Rd_files <- vignettes <- R_files <- description <-
    list(encoding = "UTF-8",
         language = "en",
         dictionaries = c("en_stats", "RcppArmadillo"))
edd@bud:~/git/rcpparmadillo/.aspell(master)$ r -p -e 'readRDS("RcppArmadillo.rds")'
[1] "MPL"            "Sanderson"      "Templated"
[4] "decompositions" "onwards"        "templated"
edd@bud:~/git/rcpparmadillo/.aspell(master)$     
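
The rds file itself is nothing more than a serialized character vector, so it can be regenerated from the shell. The sketch below is hedged accordingly: WORK is a scratch directory rather than a real package, and the Rscript step is guarded in case R is not installed.

```shell
# Sketch: write the .aspell/ dictionary rds from the shell, skipped when
# R is absent. WORK is a scratch directory; in a real package this would
# be the package's own .aspell/ directory.
WORK=$(mktemp -d)
mkdir -p "$WORK/.aspell"
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e "saveRDS(c('MPL', 'Sanderson', 'Templated',
                          'decompositions', 'onwards', 'templated'),
                        '$WORK/.aspell/RcppArmadillo.rds')"
fi
```

The word list is the same one readRDS() displayed above; adding a word is just a matter of extending the vector and re-running saveRDS().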

And now R(-devel) CMD check --as-cran ... is silent about spelling. Yay!

But take this with a grain of salt as this does not yet seem to be "announced" as e.g. yesterday's change in the CRAN Policy did not mention it. So things may well change -- but hey, it worked for me.

And this all is about aspell, here is something topical about a spell to close the post:

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/r4 | permanent link

Thu, 03 Aug 2017

R for System Administration

Just getting back from the most fun meetup I have been to in quite some time: episode 23 (by their count) of Open Source Open Mic hosted by Matt Godbolt and Joe Walnes here in Chicago. Nothing but a sequence of lightning talks. Plus beer and pizza. Sounds awesome? It was!

We had fantastic talks across at least half a dozen languages, covering both new-ish ones (Pony) and interesting ones (Rust, Go, ...) plus of course some Javascript and some Python, no Java (yay!) and a few batshit crazy things like a self-hosting database in its own (shell) code, a terminal gif viewer (!!), and more. And it gave me an opportunity to quickly (one evening and morning commute) jam out a presentation about what is in the title: R for system administration.

And I am only half-joking. I had used R a couple of years ago when I needed to select, subset, modify, ... a large number of image files given some timestamp and filename patterns. And given how well R works in a vectorised manner with both regular expressions and timestamps, as well as on top of essentially all standard POSIX-style operating system / file-system functions, I picked up that thread again on the problem of ... cleaning up the file storage underlying CRANberries which by now has well over fifty-seven thousand (!!) tarballs of CRAN packages based on now ten years of CRANberries. So I showed how to prune this in essentially half a dozen lines of R (and data.table code), plus some motivation---all just right for a lightning talk. Seemingly the talk went well enough as quite a few folks gave a thumbs up and compliments over beers afterwards.
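
The actual cleanup used half a dozen lines of R and data.table. Purely as an illustration of the task itself, here is a rough shell-only analogue on a mock archive; the package names (foo, bar) and the keep-newest-per-package rule are assumptions mirroring CRAN's pkg_version.tar.gz naming convention:

```shell
# Illustration only: prune a mock archive so that just the newest tarball
# per package survives. Package names and the pkg_version.tar.gz naming
# convention are assumptions mirroring CRAN's layout.
ARCHIVE=$(mktemp -d)
: > "$ARCHIVE/foo_1.0.tar.gz"; touch -t 201601010000 "$ARCHIVE/foo_1.0.tar.gz"
: > "$ARCHIVE/foo_1.1.tar.gz"; touch -t 201701010000 "$ARCHIVE/foo_1.1.tar.gz"
: > "$ARCHIVE/bar_0.5.tar.gz"; touch -t 201601010000 "$ARCHIVE/bar_0.5.tar.gz"
for pkg in $(ls "$ARCHIVE" | sed 's/_.*//' | sort -u); do
    # newest first; drop the first line, remove the remainder
    ls -t "$ARCHIVE/${pkg}"_* | tail -n +2 | xargs rm -f --
done
ls "$ARCHIVE"    # bar_0.5.tar.gz and foo_1.1.tar.gz remain
```

The R version is of course far more pleasant once timestamps and filename patterns need real parsing, which was precisely the point of the talk.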

But see for yourself as the slides are now uploaded to my standard talks page.

My thanks to Matt and Joe for organizing the meetup. I think I will be back.

/code/snippets | permanent link

Sat, 29 Jul 2017

Updated overbought/oversold plot function

A good six years ago I blogged about plotOBOS() which charts a moving average (from one of several available variants) along with shaded standard deviation bands. That post has a bit more background on the why/how and motivation, but as a teaser here is the resulting chart of the SP500 index (with ticker ^GSPC):

Example chart of overbought/oversold levels from plotOBOS() function 

The code uses a few standard finance packages for R (with most of them maintained by Joshua Ulrich given that Jeff Ryan, who co-wrote chunks of these, is effectively retired from public life). Among these, xts had a recent release reflecting changes which occurred during the four (!!) years since the previous release, and covering at least two GSoC projects. With that came subtle API changes: something we all generally try to avoid but which is at times the only way forward. In this case, the shading code I used (via polygon() from base R) no longer cooperated with the beefed-up functionality of plot.xts(). Luckily, Ross Bennett incorporated that same functionality into a new function addPolygon --- which even credits this same post of mine.

With that, the updated code becomes

## plotOBOS -- displaying overbought/oversold as eg in Bespoke's plots
##
## Copyright (C) 2010 - 2017  Dirk Eddelbuettel
##
## This is free software: you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 2 of the License, or
## (at your option) any later version.

suppressMessages(library(quantmod))     # for getSymbols(), brings in xts too
suppressMessages(library(TTR))          # for various moving averages

plotOBOS <- function(symbol, n=50, type=c("sma", "ema", "zlema"),
                     years=1, blue=TRUE, current=TRUE, title=symbol,
                     ticks=TRUE, axes=TRUE) {

    today <- Sys.Date()
    if (class(symbol) == "character") {
        X <- getSymbols(symbol, from=format(today-365*years-2*n), auto.assign=FALSE)
        x <- X[,6]                          # use Adjusted
    } else if (inherits(symbol, "zoo")) {
        x <- X <- as.xts(symbol)
        current <- FALSE                # don't expand the supplied data
    }

    n <- min(nrow(x)/3, 50)             # as we may not have 50 days

    sub <- ""
    if (current) {
        xx <- getQuote(symbol)
        xt <- xts(xx$Last, order.by=as.Date(xx$`Trade Time`))
        colnames(xt) <- paste(symbol, "Adjusted", sep=".")
        x <- rbind(x, xt)
        sub <- paste("Last price: ", xx$Last, " at ",
                     format(as.POSIXct(xx$`Trade Time`), "%H:%M"), sep="")
    }

    type <- match.arg(type)
    xd <- switch(type,                  # compute xd as the central location via selected MA smoother
                 sma = SMA(x,n),
                 ema = EMA(x,n),
                 zlema = ZLEMA(x,n))
    xv <- runSD(x, n)                   # compute xv as the rolling volatility

    strt <- paste(format(today-365*years), "::", sep="")
    x  <- x[strt]                       # subset plotting range using xts' nice functionality
    xd <- xd[strt]
    xv <- xv[strt]

    xyd <- xy.coords(.index(xd),xd[,1]) # xy coordinates for direct plot commands
    xyv <- xy.coords(.index(xv),xv[,1])

    n <- length(xyd$x)
    xx <- xyd$x[c(1,1:n,n:1)]           # for polygon(): from first point to last and back

    if (blue) {
        blues5 <- c("#EFF3FF", "#BDD7E7", "#6BAED6", "#3182BD", "#08519C") # cf brewer.pal(5, "Blues")
        fairlylight <<- rgb(189/255, 215/255, 231/255, alpha=0.625) # aka blues5[2]
        verylight <<- rgb(239/255, 243/255, 255/255, alpha=0.625) # aka blues5[1]
        dark <<- rgb(8/255, 81/255, 156/255, alpha=0.625) # aka blues5[5]
        ## buglet in xts 0.10-0 requires the <<- here
    } else {
        fairlylight <<- rgb(204/255, 204/255, 204/255, alpha=0.5)  # two suitable grays, alpha-blending at 50%
        verylight <<- rgb(242/255, 242/255, 242/255, alpha=0.5)
        dark <<- 'black'
    }

    plot(x, ylim=range(range(x, xd+2*xv, xd-2*xv, na.rm=TRUE)), main=title, sub=sub, 
         major.ticks=ticks, minor.ticks=ticks, axes=axes) # basic xts plot setup
    addPolygon(xts(cbind(xyd$y+xyv$y, xyd$y+2*xyv$y), order.by=index(x)), on=1, col=fairlylight)  # upper
    addPolygon(xts(cbind(xyd$y-xyv$y, xyd$y+1*xyv$y), order.by=index(x)), on=1, col=verylight)    # center
    addPolygon(xts(cbind(xyd$y-xyv$y, xyd$y-2*xyv$y), order.by=index(x)), on=1, col=fairlylight)  # lower
    lines(xd, lwd=2, col=fairlylight)   # central smoothed location
    lines(x, lwd=3, col=dark)           # actual price, thicker
}

and the main change is the three calls to addPolygon. To illustrate, we call plotOBOS("SPY", years=2) with an updated plot of the ETF representing the SP500 over the last two years:

Updated example chart of overbought/oversold levels from plotOBOS() function 

Comments and further enhancements welcome!

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

/code/snippets | permanent link

Wed, 19 Jul 2017

RcppAPT 0.0.4

A new version of RcppAPT -- our interface from R to the C++ library behind the awesome apt, apt-get, apt-cache, ... commands and their cache powering Debian, Ubuntu and the like -- arrived on CRAN yesterday.

We added a few more functions in order to compute on the package graph. A concrete example is shown in this vignette which determines the (minimal) set of remaining Debian packages requiring a rebuild under R 3.4.* to update their .C() and .Fortran() registration code. It has been used for the binNMU request #868558.

As we also added a NEWS file, its (complete) content covering all releases follows below.

Changes in version 0.0.4 (2017-07-16)

  • New function getDepends

  • New function reverseDepends

  • Added package registration code

  • Added usage examples in scripts directory

  • Added vignette, also in docs as rendered copy

Changes in version 0.0.3 (2016-12-07)

  • Added dumpPackages, showSrc

Changes in version 0.0.2 (2016-04-04)

  • Added reverseDepends, dumpPackages, showSrc

Changes in version 0.0.1 (2015-02-20)

  • Initial version with getPackages and hasPackages

A bit more information about the package is available here as well as at the GitHub repo.


/code/rcpp | permanent link

Sat, 15 Jul 2017

Rcpp 0.12.12: Rounding some corners

The twelfth update in the 0.12.* series of Rcpp landed on CRAN this morning, following two days of testing at CRAN preceded by five full reverse-depends checks we did (and which are always logged in this GitHub repo). The Debian package has been built and uploaded; Windows and macOS binaries should follow at CRAN as usual. This 0.12.12 release follows the 0.12.0 release from late July, the 0.12.1 release in September, the 0.12.2 release in November, the 0.12.3 release in January, the 0.12.4 release in March, the 0.12.5 release in May, the 0.12.6 release in July, the 0.12.7 release in September, the 0.12.8 release in November, the 0.12.9 release in January, the 0.12.10 release in March, and the 0.12.11 release in May, making it the sixteenth release at the steady and predictable bi-monthly release frequency.

Rcpp has become the most popular way of enhancing GNU R with C or C++ code. As of today, 1097 packages (and hence 71 more since the last release in May) on CRAN depend on Rcpp for making analytical code go faster and further, along with another 91 in BioConductor.
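As a quick illustration of what this looks like in practice (a hand-rolled sketch, not taken from the package documentation; it assumes Rcpp and a working compiler toolchain are installed), cppFunction() compiles a C++ function and binds it into the R session in one call:

```r
library(Rcpp)

## compile a small C++ function and make it callable from R
cppFunction('
double sumSquares(NumericVector x) {
    double total = 0;
    for (int i = 0; i < x.size(); i++) {
        total += x[i] * x[i];
    }
    return total;
}')

sumSquares(c(1, 2, 3))    # 1 + 4 + 9 = 14
```

For package use, the same attribute-based machinery generates the registration and wrapper code discussed in the Rcpp Attributes changes below.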

This release contains a fairly large number of small and focused pull requests, most of which either correct corner cases or improve other aspects. JJ tirelessly improved the package registration added in the previous release and following R 3.4.0. Kirill tidied up a number of small issues allowing us to run compilation in even more verbose modes---usually a good thing. Jeroen, Elias Pipping and Yo Gong all contributed as well, and we thank everybody for their contributions.

All changes are listed below in some detail.

Changes in Rcpp version 0.12.12 (2017-07-13)

  • Changes in Rcpp API:

    • The tinyformat.h header now ends in a newline (#701).

    • Fixed rare protection error that occurred when fetching stack traces during the construction of an Rcpp exception (Kirill Müller in #706).

    • Compilation is now also possible on Haiku-OS (Yo Gong in #708 addressing #707).

    • Dimension attributes are explicitly cast to int (Kirill Müller in #715).

    • Unused arguments are no longer declared (Kirill Müller in #716).

    • Visibility of exported functions is now supported via the R macro attribute_visible (Jeroen Ooms in #720).

    • The no_init() constructor accepts R_xlen_t (Kirill Müller in #730).

    • Loop unrolling used R_xlen_t (Kirill Müller in #731).

    • Two unused-variables warnings are now avoided (Jeff Pollock in #732).

  • Changes in Rcpp Attributes:

    • Execute tools::package_native_routine_registration_skeleton within package rather than current working directory (JJ in #697).

    • The R portion no longer uses dir.exists so as not to require R 3.2.0 or newer (Elias Pipping in #698).

    • Fix native registration for exports with name attribute (JJ in #703 addressing #702).

    • Automatically register init functions for Rcpp Modules (JJ in #705 addressing #704).

    • Add Shield around parameters in Rcpp::interfaces (JJ in #713 addressing #712).

    • Replace dot (".") with underscore ("_") in package names when generating native routine registrations (JJ in #722 addressing #721).

    • Generate C++ native routines with underscore ("_") prefix to avoid exporting when standard exportPattern is used in NAMESPACE (JJ in #725 addressing #723).

Thanks to CRANberries, you can also look at a diff to the previous release. As always, even fuller details are on the Rcpp Changelog page and the Rcpp page which also leads to the downloads page, the browseable doxygen docs and zip files of doxygen output for the standard formats. A local directory has source and documentation too. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.


/code/rcpp | permanent link

Thu, 22 Jun 2017

nanotime 0.2.0

A new version of the nanotime package for working with nanosecond timestamps just arrived on CRAN.

nanotime uses the RcppCCTZ package for (efficient) high(er) resolution time parsing and formatting up to nanosecond resolution, and the bit64 package for the actual integer64 arithmetic.

Thanks to a metric ton of work by Leonardo Silvestri, the package now uses S4 classes internally allowing for greater consistency of operations on nanotime objects.
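A small usage sketch (with illustrative values, assuming the package and its bit64 dependency are installed) shows construction from an ISO-8601 string and the integer64 arithmetic underneath:

```r
library(nanotime)

## one nanosecond past the UNIX epoch, parsed from an ISO-8601 string
x <- nanotime("1970-01-01T00:00:00.000000001+00:00")

## arithmetic is carried out in integer64 nanoseconds (via bit64),
## so adding 1e9 advances the timestamp by exactly one second
y <- x + 1e9

print(y)
```

Because the class is now S4, operations such as the addition above dispatch through proper methods rather than ad-hoc functions.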

Changes in version 0.2.0 (2017-06-22)

  • Rewritten in S4 to provide more robust operations (#17 by Leonardo)

  • Ensure tz="" is treated as unset (Leonardo in #20)

  • Added format and tz arguments to nanotime, format, print (#22 by Leonardo and Dirk)

  • Ensure printing respects options()$max.print, and ensure names are kept with the vector (#23 by Leonardo)

  • Correct summary() by defining names<- (Leonardo in #25 fixing #24)

  • Report error on operations that are not meaningful for the type; NA, NaN, Inf, and -Inf are now handled correctly (Leonardo in #27 fixing #26)

We also have a diff to the previous version thanks to CRANberries. More details and examples are at the nanotime page; code, issue tickets etc at the GitHub repository.


/code/nanotime | permanent link