RcppAnnoy: Rcpp bindings for Annoy


What is Annoy?

Annoy is a small, fast and lightweight library for Approximate Nearest Neighbours with a particular focus on efficient memory use and the ability to load a pre-saved index.

Annoy is written by Erik Bernhardsson for use at Spotify. It is implemented in about 500 lines of a single C++ template header file, which Erik also wraps into a loadable Python module.

Why this package?

It provides a nice example of Rcpp Modules and of the use of templates: Annoy is parameterized by two template data types (generally int32_t for item indices and float for vector values, for efficiency) and by one of two distance measures. This package shows that it is easy to wrap both variants.
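A minimal sketch of what wrapping both variants looks like from R. It assumes RcppAnnoy is installed and exposes an AnnoyAngular class with the same methods as AnnoyEuclidean. The same data are indexed once per distance measure and queried with identical calls. The helper build_index() and the chosen sizes are hypothetical, for illustration only.

```r
library(RcppAnnoy)

set.seed(42)                              # reproducible random vectors
f <- 10                                   # vector dimension
n <- 100                                  # number of items

m <- matrix(rnorm(n * f), nrow = n)       # one row per item

## hypothetical helper: build an index of the given module class over m
build_index <- function(cls) {
    idx <- new(cls, f)                    # same constructor for either class
    for (i in seq_len(n)) {
        idx$addItem(i - 1, m[i, ])        # Annoy item ids are 0-based
    }
    idx$build(10)                         # 10 trees
    idx
}

euc <- build_index(AnnoyEuclidean)        # Euclidean distance
ang <- build_index(AnnoyAngular)          # angular (cosine-based) distance

## identical queries; only the distance measure behind them differs
print(euc$getNNsByItem(0, 5))
print(ang$getNNsByItem(0, 5))
```

Because both classes expose the same methods, R code written against one index type can switch distance measures by changing only the class passed to new().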

It also shows how easy it is to have Python and R share exactly the same functionality, by virtue of module bindings on both the Python side and the R side (where Rcpp Modules help).

Source code resides in the RcppAnnoy GitHub repo.

Example

This is implemented as demo/simpleExample.R and mirrors the Python example on the Annoy repo page.

library(RcppAnnoy)

set.seed(123)                           # be reproducible

f <- 40
a <- new(AnnoyEuclidean, f)
n <- 50                                 # not specified

for (i in seq(n)) {
    v <- rnorm(f)
    a$addItem(i-1, v)
}

a$build(50)                             # 50 trees
a$save("/tmp/test.tree")


b <- new(AnnoyEuclidean, f)             # new object, could be in another process
b$load("/tmp/test.tree")                # super fast, will just mmap the file

print(b$getNNsByItem(0, 40))
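Beyond getNNsByItem(), an index can also be queried with a vector that was never added to it. The sketch below assumes the getNNsByVectorList() and getItemsVector() methods of the RcppAnnoy classes, with the last argument of getNNsByVectorList() requesting distances. It rebuilds a small index so that it runs on its own, and checks one returned distance against a direct computation.

```r
library(RcppAnnoy)

set.seed(123)                             # be reproducible
f <- 40
a <- new(AnnoyEuclidean, f)
for (i in seq_len(50)) {
    a$addItem(i - 1, rnorm(f))            # 0-based item ids
}
a$build(50)                               # 50 trees

q <- rnorm(f)                             # query vector, not part of the index
res <- a$getNNsByVectorList(q, 5, -1, TRUE)  # search_k = -1: default search effort
print(res$item)                           # ids of the 5 nearest items
print(res$distance)                       # their Euclidean distances to q

## recompute the first distance by hand from the stored item vector
v <- a$getItemsVector(res$item[1])
d <- sqrt(sum((v - q)^2))
print(all.equal(d, res$distance[1], tolerance = 1e-4))
```

The hand computation differs from Annoy's only by single-precision rounding, since the index stores vectors as float; hence the tolerance in the comparison.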

Status

The package matches the behaviour of the original Python wrapper for the Annoy library. It also replicates all unit tests written for the Python frontend, including a test for efficiently mmap-ing a binary index file. While setting it up, some small contributions were made back to Annoy as well.

As it uses mmap for fast disk access to the stored index file, a Windows build is possible via MapViewOfFile (see e.g. Jeff Ryan’s mmap CRAN package), but we have not needed that functionality. A clean pull request to the Annoy or RcppAnnoy repos would be welcome.

Author

Dirk Eddelbuettel

Initially created: Sun Nov 16 07:45:09 CST 2014
Last modified: Sun May 26 10:09:42 CDT 2024