sanitizers: Testing R packages for ASAN and UBSAN errors

Overview

Recent gcc and clang compiler versions provide functionality to test for memory violations and other undefined behaviour; this is often referred to as “Address Sanitizer” (or ASAN) and “Undefined Behaviour Sanitizer” (or UBSAN). The Writing R Extensions manual describes this in some detail in Section 4.3, titled Checking Memory Access.

This feature has to be enabled in the corresponding binary, e.g. in R, which is somewhat involved as it also requires a current compiler toolchain which is not yet widely available or, in the case of Windows, not available at all (via the common Rtools mechanism).

As an easier alternative, the pre-built Docker containers available via my Docker Hub repository can be used on Linux, and via boot2docker on Windows and OS X.

The sanitizers package is an R package which provides a means of testing the compiler setup: the known code failures provided in the sample code here should be detected correctly by an instrumented build, whereas a default build of R will let the package pass.

The code samples are based on the examples from the Address Sanitizer Wiki.

Docker Overview

Docker uses Linux Containers (LXC, see e.g. the LXC Wikipedia page for details) to provide an extremely lightweight and convenient ability to run code in different environments.

This may sound like virtualization, but it is not. Virtualization needs to emulate or otherwise provide an entire OS layer. Containers, on the other hand, are more like a mere additional process and impose very little additional burden.

Docker is currently very much en vogue and is changing how software is developed, tested and deployed, just like the container shipping revolution changed how goods are shipped: use standardized, interoperable containers that can be deployed anywhere. Docker aims for something similar with software: use anywhere, irrespective of the content, thanks to a standard interface. At the summary page of the mailing list, this is helpfully summarized as follows:

* Standard operations
* Content agnostic
* Infrastructure agnostic
* Designed for automation
* Industrial-grade delivery

all of which aligns nicely with the real-world comparison of shipping containers, which happen to have revolutionized global trade.

Another nice Docker feature is the use of AUFS. Effectively, container images are unions of different file system “diffs” which are shared between containers. This makes for more efficient disk use than, say, images of common virtualization systems.

As I showed in the last part of my useR! 2014 keynote, Docker images can be very useful to test R packages.

Docker Installation

Docker works just about everywhere. Please consult the (very decent) Docker documentation for details concerning your particular OS.

I use Docker on Debian testing and Ubuntu 14.04 where it works out of the box. Two recommendations:

cd /usr/local/bin && sudo ln -s /usr/bin/docker.io docker
sudo addgroup youruserhandle docker

The first command allows you to invoke Docker as docker (which the excellent package can’t provide as Debian / Ubuntu already had a GUI launcher binary of the same name). The second command adds your user id to the docker group, which means you do not need to run everything via sudo.
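Whether the second recommendation has taken effect can be checked from the shell. The following is a minimal sketch: the helper name has_group is hypothetical (it is not part of Docker), and it only inspects a space-separated list of group names such as the one printed by id -nG.

```shell
# has_group LIST NAME: succeed if NAME occurs as a whole word in the
# space-separated group LIST (hypothetical helper, not part of Docker).
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage: inspect the groups of the current user as reported by id(1).
if has_group "$(id -nG)" docker; then
    echo "docker group active: sudo should not be needed"
else
    echo "not (yet) in the docker group: log out and back in, or use sudo"
fi
```

A new group membership only becomes visible in a fresh login session, which is why the second message suggests logging out and back in.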

Windows and OS X users can run Docker via the boot2docker application. It seamlessly installs a virtualization layer, a minimal Linux system (for Docker to run on) and the git tool suite (to access Docker container images). Its installation is very streamlined and well documented; see the Docker documentation.

Docker Images for R package testing

As of summer 2014, I maintain two GitHub repositories with Dockerfiles, one Debian-based and one Ubuntu-based. Each of them contains a base image with just R (add-r), an extended image with R as well as R-devel (add-r-devel) and either one or two more pre-built ASAN/UBSAN testing images. These are the ones we use here.

The Docker Hub build service is connected to GitHub and provides the two corresponding repositories, eddelbuettel/docker-debian-r and eddelbuettel/docker-ubuntu-r.

Once you have Docker installed and running, execute e.g.

docker pull eddelbuettel/docker-debian-r:add-r-devel-san

to retrieve the (A)SAN address sanitizer container. We will use this image below.

Launching Docker

As discussed, we use the container providing an (A)SAN-enabled build of R-devel: eddelbuettel/docker-debian-r:add-r-devel-san. We will work in an interactive terminal (switches -t -i) and launch the bash shell (final command-line argument /bin/bash). For the Debian-based (A)SAN container, this amounts to starting

docker run -t -i eddelbuettel/docker-debian-r:add-r-devel-san /bin/bash

which will drop us in a prompt similar to this:

root@731dbe48ae89:/#

where the random hostname component of the prompt will likely be different.
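The image names used in this document follow one pattern, so a small wrapper can save some typing. This is a sketch only: the function names san_image and san_shell_cmd are made up for illustration, and only the two repository names and the add-r-devel-san tag that appear in this document are assumed to exist. The command is printed rather than executed.

```shell
# san_image DISTRO: print the name of the (A)SAN image for "debian" or "ubuntu".
san_image() {
    case "$1" in
        debian|ubuntu) printf 'eddelbuettel/docker-%s-r:add-r-devel-san\n' "$1" ;;
        *) echo "unknown distribution: $1" >&2; return 1 ;;
    esac
}

# san_shell_cmd DISTRO: print (not run) the command opening an interactive shell.
san_shell_cmd() {
    image=$(san_image "$1") || return 1
    printf 'docker run -t -i %s /bin/bash\n' "$image"
}

san_shell_cmd debian
```

Printing the command first makes it easy to review it before pasting it into a terminal.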

Installing sanitizers

As the sanitizers package is now on CRAN, we can simply use my preferred wrapper install.r (from the littler package) to install it with a single convenient command:

root@731dbe48ae89:/# install.r sanitizers
trying URL 'http://cran.us.r-project.org/src/contrib/sanitizers_0.1.0.tar.gz'
Content type 'application/x-gzip' length 3963 bytes
opened URL
==================================================
downloaded 3963 bytes

* installing *source* package 'sanitizers' ...
** package 'sanitizers' successfully unpacked and MD5 sums checked
** libs
g++ -I/usr/share/R/include -DNDEBUG      -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -c heap_address.cpp -o heap_address.o
g++ -I/usr/share/R/include -DNDEBUG      -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -c stack_address.cpp -o stack_address.o
g++ -shared -Wl,-z,relro -o sanitizers.so heap_address.o stack_address.o -L/usr/lib/R/lib -lR
installing to /usr/local/lib/R/site-library/sanitizers/libs
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (sanitizers)

The downloaded source packages are in
        '/tmp/downloaded_packages'
root@731dbe48ae89:/#

Note that this uses the R-release version in the container. So if we now run code from the package, it will not fail:

root@731dbe48ae89:/# r -lsanitizers -e'print(stackAddressSanitize(42))'
[1] 10
root@731dbe48ae89:/# Rscript -e 'library(sanitizers); print(stackAddressSanitize(42))'
[1] 10
root@731dbe48ae89:/#

Here we used both r from the littler package and Rscript.

Re-Installing sanitizers under ASAN-enabled R

The key to using the ASAN / UBSAN tests is to rebuild the corresponding binaries with the proper -fsanitize=address or -fsanitize=undefined switches. See the Writing R Extensions manual on Checking Memory Access for full details.

Here we simply re-use the already-downloaded tarball and the Rdevel command from the corresponding R-devel installation which has been properly configured in this container:

root@731dbe48ae89:/# cd /tmp/downloaded_packages/
root@731dbe48ae89:/tmp/downloaded_packages# Rdevel CMD INSTALL sanitizers_0.1.0.tar.gz
* installing to library '/usr/local/lib/R/site-library'
* installing *source* package 'sanitizers' ...
** package 'sanitizers' successfully unpacked and MD5 sums checked
** libs
g++-4.9 -fsanitize=address -I/usr/local/lib/R/include -DNDEBUG  -I/usr/local/include    -fpic  -pipe -Wall -pedantic -O3  -c heap_address.cpp -o heap_address.o
g++-4.9 -fsanitize=address -I/usr/local/lib/R/include -DNDEBUG  -I/usr/local/include    -fpic  -pipe -Wall -pedantic -O3  -c stack_address.cpp -o stack_address.o
g++-4.9 -fsanitize=address -shared -L/usr/local/lib/R/lib -L/usr/local/lib -o sanitizers.so heap_address.o stack_address.o -L/usr/local/lib/R/lib -lR
installing to /usr/local/lib/R/site-library/sanitizers/libs
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (sanitizers)
root@731dbe48ae89:/tmp/downloaded_packages#

This uses g++-4.9 which is already the default in Debian testing, and the proper -fsanitize=address option.
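When installations are scripted, it can help to confirm that the compiler really was invoked with the sanitizer switch. A minimal sketch, assuming the output of Rdevel CMD INSTALL was saved to a file; the function name built_with_asan and the log file name are hypothetical.

```shell
# built_with_asan LOGFILE: succeed if any line of the install log
# contains the -fsanitize=address switch.
built_with_asan() {
    grep -q -e '-fsanitize=address' "$1"
}

# Usage (the file name install.log is an assumption):
#   Rdevel CMD INSTALL sanitizers_0.1.0.tar.gz > install.log 2>&1
#   built_with_asan install.log && echo "ASAN-instrumented build"
```

Applied to the R-release install log shown earlier, the same check would fail, as those compiler lines carry no -fsanitize switch.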

Triggering ASAN error

With the rebuilt package, and the Rscriptdevel front-end corresponding to R-devel (which is instrumented just like the R-devel binary used in the previous section to use the sanitize facility), we now get the expected error:

root@731dbe48ae89:/tmp/downloaded_packages# Rscriptdevel -e 'library(sanitizers); print(stackAddressSanitize(42))'
=================================================================
==75==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff84b24058 at pc 0x7f14d9168d0f bp 0x7fff84b23df0 sp 0x7fff84b23de8
READ of size 4 at 0x7fff84b24058 thread T0

Address 0x7fff84b24058 is located in stack of thread T0 at offset 40 in frame

  This frame has 4 object(s):
    [32, 40) 'ofun' <== Memory access at offset 40 overflows this variable
    [96, 120) 'symbol'
    [160, 680) 'cargs'
    [736, 1760) 'buf'
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow ??:0 stackAddressSanitize
Shadow bytes around the buggy address:
  0x10007095c7b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007095c7c0: f1 f1 f1 f1 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007095c7d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007095c7e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007095c7f0: 00 00 00 00 00 00 f4 f4 f3 f3 f3 f3 00 00 00 00
=>0x10007095c800: 00 00 00 00 00 00 f1 f1 f1 f1 00[f4]f4 f4 f2 f2
  0x10007095c810: f2 f2 00 00 00 f4 f2 f2 f2 f2 00 00 00 00 00 00
  0x10007095c820: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007095c830: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007095c840: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007095c850: 00 00 00 00 00 00 00 00 00 00 00 f4 f4 f4 f2 f2
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Contiguous container OOB:fc
  ASan internal:           fe
==75==ABORTING
root@731dbe48ae89:/tmp/downloaded_packages#

This actually looks more impressive on a proper console due to clever use of colour highlighting, which got lost in the copy and paste I did here.
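For scripted use, the report can be reduced to the kind of error found. The sketch below assumes the report format shown above, in which one line reads ==PID==ERROR: AddressSanitizer: KIND on address ...; the function name asan_error_kind is hypothetical.

```shell
# asan_error_kind [FILE...]: print the error kind from the first
# "ERROR: AddressSanitizer:" line found in FILE(s) or on standard input.
# An optional space after the ==PID== marker is tolerated.
asan_error_kind() {
    sed -n 's/^==[0-9]*== *ERROR: AddressSanitizer: \([^ ]*\) .*/\1/p' "$@" | head -n 1
}

# Usage: the report is written to standard error, hence the 2>&1.
#   Rscriptdevel -e 'library(sanitizers); print(stackAddressSanitize(42))' 2>&1 | asan_error_kind
```

If the input contains no such line, nothing is printed, so an empty result can be read as "no ASAN error reported".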

Installing and testing sanitizers using the Ubuntu container

We can do the same for the Ubuntu-based container test:

docker run -t -i eddelbuettel/docker-ubuntu-r:add-r-devel-san /bin/bash

with a similar set of commands:

edd@max:~/git$ docker run -t -i eddelbuettel/docker-ubuntu-r:add-r-devel-san /bin/bash
root@e3ce377ff5da:/# install.r sanitizers
trying URL 'http://cran.us.r-project.org/src/contrib/sanitizers_0.1.0.tar.gz'
Content type 'application/x-gzip' length 3963 bytes
opened URL
==================================================
downloaded 3963 bytes

* installing *source* package 'sanitizers' ...
** package 'sanitizers' successfully unpacked and MD5 sums checked
** libs
g++ -I/usr/share/R/include -DNDEBUG      -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -c heap_address.cpp -o heap_address.o
g++ -I/usr/share/R/include -DNDEBUG      -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -c stack_address.cpp -o stack_address.o
g++ -shared -Wl,-Bsymbolic-functions -Wl,-z,relro -o sanitizers.so heap_address.o stack_address.o -L/usr/lib/R/lib -lR
installing to /usr/local/lib/R/site-library/sanitizers/libs
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (sanitizers)

The downloaded source packages are in
        '/tmp/downloaded_packages'
root@e3ce377ff5da:/# Rscript -e 'library(sanitizers); print(heapAddressSanitize(10))'
[1] 0
root@e3ce377ff5da:/#

As expected, no error under R-release. So let’s re-install under R-devel:

root@e3ce377ff5da:/# cd /tmp/downloaded_packages/
root@e3ce377ff5da:/tmp/downloaded_packages# Rdevel CMD INSTALL sanitizers_0.1.0.tar.gz
* installing to library '/usr/local/lib/R/site-library'
* installing *source* package 'sanitizers' ...
** package 'sanitizers' successfully unpacked and MD5 sums checked
** libs
g++-4.8 -fsanitize=address -I/usr/local/lib/R/include -DNDEBUG  -I/usr/local/include    -fpic  -pipe -Wall -pedantic -O3  -c heap_address.cpp -o heap_address.o
g++-4.8 -fsanitize=address -I/usr/local/lib/R/include -DNDEBUG  -I/usr/local/include    -fpic  -pipe -Wall -pedantic -O3  -c stack_address.cpp -o stack_address.o
g++-4.8 -fsanitize=address -shared -L/usr/local/lib/R/lib -L/usr/local/lib -o sanitizers.so heap_address.o stack_address.o -L/usr/local/lib/R/lib -lR
installing to /usr/local/lib/R/site-library/sanitizers/libs
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (sanitizers)
root@e3ce377ff5da:/tmp/downloaded_packages#
root@e3ce377ff5da:/tmp/downloaded_packages# Rscriptdevel -e 'library(sanitizers); print(heapAddressSanitize(10))'
=================================================================
==73== ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602e00016f98 at pc 0x7fe5cd41cb17 bp 0x7fff065ac580 sp 0x7fff065ac578
READ of size 4 at 0x602e00016f98 thread T0
0x602e00016f98 is located 40 bytes to the left of 408-byte region [0x602e00016fc0,0x602e00017158)
allocated by thread T0 here:
Shadow bytes around the buggy address:
  0x0c063fffada0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c063fffadb0: fa fa fa fa fa fa fa fa fa fa fa fa 00 00 00 00
  0x0c063fffadc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c063fffadd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c063fffade0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa fa
=>0x0c063fffadf0: fa fa fa[fa]fa fa fa fa 00 00 00 00 00 00 00 00
  0x0c063fffae00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c063fffae10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c063fffae20: 00 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa
  0x0c063fffae30: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c063fffae40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:     fa
  Heap righ redzone:     fb
  Freed Heap region:     fd
  Stack left redzone:    f1
  Stack mid redzone:     f2
  Stack right redzone:   f3
  Stack partial redzone: f4
  Stack after return:    f5
  Stack use after scope: f8
  Global redzone:        f9
  Global init order:     f6
  Poisoned by user:      f7
  ASan internal:         fe
==73== ABORTING
root@e3ce377ff5da:/tmp/downloaded_packages#

As expected, this triggered the error, using the g++-4.8-based Ubuntu container.

Command-line use

Instead of launching an interactive container, tests can also be executed on the host system via a proper command-line invocation. And R CMD check works just fine as the sanitizers package contains a tests/ directory — so the error will be flagged.

Here we use the very convenient mount option via -v dirOnHost:dirInContainer to make our current directory (containing a tarball to be tested) available inside the container:

edd@max:~/git$ docker run -t -v $(pwd):/host eddelbuettel/docker-debian-r:add-r-devel-san  Rdevel CMD check /host/sanitizers_0.1.0.tar.gz
* using log directory '//sanitizers.Rcheck'
* using R Under development (unstable) (2014-07-25 r66251)
* using platform: x86_64-unknown-linux-gnu (64-bit)
* using session charset: ASCII
* checking for file 'sanitizers/DESCRIPTION' ... OK
* checking extension type ... Package
* this is package 'sanitizers' version '0.1.0'
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package 'sanitizers' can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking line endings in C/C++/Fortran sources/headers ... OK
* checking line endings in Makefiles ... OK
* checking for portable use of $(BLAS_LIBS) and $(LAPACK_LIBS) ... OK
* checking compiled code ... OK
* checking examples ... NONE
* checking for unstated dependencies in tests ... OK
* checking tests ...
  Running 'testHeapAddressSanitize.R'
  Running 'testStackAddressSanitize.R'
 ERROR
Running the tests in 'tests/testStackAddressSanitize.R' failed.
Last 13 lines of output:
    Freed heap region:       fd
    Stack left redzone:      f1
    Stack mid redzone:       f2
    Stack right redzone:     f3
    Stack partial redzone:   f4
    Stack after return:      f5
    Stack use after scope:   f8
    Global redzone:          f9
    Global init order:       f6
    Poisoned by user:        f7
    Contiguous container OOB:fc
    ASan internal:           fe
  ==275==ABORTING
edd@max:~/git$

This demonstrates a very convenient batch test mode suitable for large-scale automated tests.
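Such a batch mode extends naturally to several tarballs. The following sketch only prints the docker commands it would run, one per source tarball in a directory; the function name check_cmds is hypothetical, while the image name and the /host mount point are taken from the invocation above. Paths containing spaces are not quoted in the printed commands.

```shell
# check_cmds DIR: print one 'docker run' command per *.tar.gz file in DIR.
check_cmds() {
    dir=$(cd "$1" && pwd) || return 1        # absolute path for the -v mount
    for tarball in "$dir"/*.tar.gz; do
        [ -e "$tarball" ] || continue        # no match: the glob stays literal
        printf 'docker run -t -v %s:/host %s Rdevel CMD check /host/%s\n' \
            "$dir" "eddelbuettel/docker-debian-r:add-r-devel-san" \
            "$(basename "$tarball")"
    done
}

# Usage: list the commands for all tarballs in the current directory.
#   check_cmds .
```

Keeping this as a dry run leaves the decision of how to execute and log each check to the caller.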

(The testHeapAddressSanitize.R test also fails, provided a lower integer value is passed. I will correct this in the next upload.)

Summary

Docker provides a very useful and lightweight operating-system abstraction. Among other things, this permits the deployment of different application configurations, which is very useful for testing and debugging.

Here we demonstrated how the sanitizers package for R can be used to validate the ASAN and UBSAN testing setup offered by recent g++ and clang versions. Using Docker, and the pre-built Docker containers available at my Docker Hub repository, deployment of such tests becomes as easy as the invocation of a single shell command.

sanitizers is now on the CRAN mirror network, and can be installed like any other R package. As shown here, it is most useful in asserting that an ASAN / UBSAN testing installation of R is working correctly by detecting the (known) errors in the sanitizers package.

Author

Dirk Eddelbuettel

License

GPL (>= 2)

Initially created: Sun Aug 3 15:12:47 CDT 2014
Last modified: Wed Aug 6 23:00:26 CDT 2014