In this context, I have a nice new example. And for once, it is work-related. I generally cannot share too much of what we do there as this is, well, proprietary, but I have this nice new example. The other day, I was constructing (large) time series of implied volatilities. Implied volatilities can be thought of as the complement to an option's price: given a price (and all other observables which can be thought of as fixed), we compute an implied volatility price (typically via the standard Black-Scholes model). Given a changed implied volatility, we infer a new price -- see this Wikipedia page for more details. In essence, it opens the door to all sorts of arbitrage and relative value pricing adventures.
Now, we observe prices fairly frequently to create somewhat sizeable time series of option prices. And each price corresponds to one matching implied volatility, and for each such price we have to solve a small and straightforward optimization problem: to compute the implied volatility given the price. This is usually done with an iterative root finder.
The problem comes from the fact that we have to do this (i) over and over and over for large data sets, and (ii) that there are a number of callbacks from the (generic) solver to the (standard) option pricer.
So our first approach was to just call the corresponding function GBSVolatility
from the
fOption package from the trusted
Rmetrics project by Diethelm Wuertz et al. This worked fine, but even with the usual tricks of splitting over
multiple cores/machines, it simply took too long for the resolution and data amount we desired. One of the problems is that this function (which
uses the proper uniroot
optimizer in R) is not inefficient per se, but simply makes to many function call back to the option pricer as
can be seen from a quick glance at the code. The helper function .fGBSVolatility
gets called time and time again:
R> GBSVolatility function (price, TypeFlag = c("c", "p"), S, X, Time, r, b, tol = .Machine$double.eps, maxiter = 10000) { TypeFlag = TypeFlag[1] volatility = uniroot(.fGBSVolatility, interval = c(-10, 10), price = price, TypeFlag = TypeFlag, S = S, X = X, Time = Time, r = r, b = b, tol = tol, maxiter = maxiter)$root volatility } <environment: namespace:fOptions> R> R> .fGBSVolatility function (x, price, TypeFlag, S, X, Time, r, b, ...) { GBS = GBSOption(TypeFlag = TypeFlag, S = S, X = X, Time = Time, r = r, b = b, sigma = x)@price price - GBS } <environment: namespace:fOptions>
So the next idea was to try the corresponding function from my RQuantLib package which brings (parts of) QuantLib to R. That was seen as been lots faster already. Now, QuantLib is pretty big and so is RQuantLib, and we felt it may not make sense to install it on a number of machines just for this simple problem. So one evening this week I noodled around for an hour or two and combined (i) a basic Black/Scholes calculation and (ii) a standard univariate zero finder (both of which can be found or described in numerous places) to minimize the difference between the observed price and the price given an implied volatility. With about one hundred lines in C++, I had something which felt fast enough. So today I hooked this into R via a two-line wrapper in quickly-created package using Rcpp.
I had one more advantage here. For our time series problem, the majority of the parameters (strike, time to maturity, rate, ...) are fixed, so we can
structure the problem to be vectorised right from the start. I cannot share the code or more the details of my new implementation. However, both
GBSVolatility
and EuropeanOprionImpliedVolatility
are on CRAN (and as I happen to maintain these for
Debian, also just one sudo apt-get install r-cran-foptions r-cran-rquantlib
away if you're
on Debian or Ubuntu). And writing the other solver is really not that
involved.
Anyway, here is the result, courtesy of a quick run via the rbenchmark package. We create a vector of length 500; the implied volatility computation will be performed at each point (and yes, our time series are much longer indeed). This is replicated 100 times (as is the default for rbenchmark) for each of the three approaches:
The new local solution is denoted byxyz@xxxxxxxx:~$ r xxxxR/packages/xxxxOptions/demo/timing.R test replications elapsed relative user.self sys.self user.child sys.child 3 zzz(X) 100 0.038 1.000 0.040 0.000 0 0 2 RQL(X) 100 3.657 96.237 3.596 0.060 0 0 1 fOp(X) 100 448.060 11791.053 446.644 1.436 0 0 xyz@xxxxxxxx:~$
zzz(X)
. It is already orders of magnitude faster than the RQL(x)
function using
RQuantLib (which is, I presume, due to my custom solution internalising the loop). And
the new approach is a laughable amount faster than the basic approach (shown as fOp
) via
fOptions. For one hundred replications of solving implied volatilities for
all elements of a vector of size 500, the slow solution takes about 7.5 minutes --- while the fast solution takes 38 milliseconds. Which comes to a
relative gain of over 11,000.
So sitting down with your C++ compiler to craft a quick one-hundred lines, combining two well-known and tested methods, can reap sizeable benefits. And Rcpp makes it trivial to call this from R.