The power of Rcpp

A while ago I built two R scripts to track the OMX Baltic Benchmark Fund against its index. One script returns the deviation of the fund from the index, and it runs fast enough. The second calculates the value of the fund every minute, and it used to take a while: for example, it needed 2 minutes or more to compute the values for a single day. Here is an example of the result:

[Figure: example of the result; image not available]

The following piece of code was the bottleneck:

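rez = numeric(NROW(x))  # preallocate the result vector
for(y in 1:NROW(x))
 {
    z=x[y,]  # one row of the data frame: bid, ask, last_price
    if(as.numeric(z$last_price>0))
    {
      # bid above the last trade, else a positive ask below it, else the last trade
      if(as.numeric(z$bid>z$last_price))rez[y]=z$bid
      else if(as.numeric(z$ask)>0 && as.numeric(z$ask)<z$last_price)rez[y]=z$ask
      else rez[y]=z$last_price
    }
    else
    {
      rez[y]=(z$ask+z$bid)/2  # no positive last price: use the bid/ask mid-point
    }
 }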

The code above loops over the time series and, based on a set of rules, decides which price (bid, ask or last trade) to use in the calculations. In pure R this step took 100 seconds.
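Stripped of the data-frame indexing, the rule is a pure function of one tick's bid, ask and last price. The C++ sketch below states it in isolation; the name `pick_price` is hypothetical, and the branches simply mirror the R loop, including the fallback to the bid/ask mid-point when the last price is not positive.

```cpp
// Sketch of the price-selection rule for a single tick.
// pick_price is a hypothetical name; the branches mirror the R loop above.
double pick_price(double bid, double ask, double last) {
    if (last > 0) {
        if (bid > last) return bid;              // bid above the last trade
        if (ask > 0 && ask < last) return ask;   // positive ask below the last trade
        return last;                             // otherwise keep the last trade
    }
    return (bid + ask) / 2;                      // no positive last price: mid-point
}
```

Because the result for one row never depends on another row, the same function can be applied to every row independently, which is what both the compiled loop and the vectorized alternatives exploit.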

Over the weekend I found time to watch a very interesting Rcpp presentation. To my surprise, there are numerous ways to integrate C++ seamlessly into R code. So I decided to rewrite the code above in C++, using the Rcpp and inline packages.

library(Rcpp)
library(inline)  # provides cxxfunction()

# C++ body of the function, stored as a string
code='
NumericVector bid(bid_);
NumericVector ask(ask_);
NumericVector close(close_);
int n = bid.size();
NumericVector ret(n);  // fresh output vector, so the ask input is not overwritten in place
for (int i = 0; i < n; i++)
{
  if (close[i] > 0)
  {
    if (bid[i] > close[i])
    {
      ret[i] = bid[i];
    }
    else if (ask[i] > 0 && ask[i] < close[i])
    {
      ret[i] = ask[i];
    }
    else
    {
      ret[i] = close[i];
    }
  }
  else
  {
    ret[i] = (bid[i] + ask[i]) / 2;  // no positive last price: use the mid-point
  }
}
return ret;
'
# glue between C++ and R: compiles the body and returns a callable R function
getLastPrice = cxxfunction(signature(bid_ = "numeric", ask_ = "numeric", close_ = "numeric"), body = code, plugin = "Rcpp")

# call the compiled function
rez = getLastPrice(as.numeric(x$bid), as.numeric(x$ask), as.numeric(x$last_price))

What did I get in return? 0.1 seconds instead of 100 seconds, roughly a thousand times faster!

10 Comments »

  1. Vaidotas Zemlys said,

    January 31, 2012 @ 9:39

    Strictly speaking, your R code is only written in R syntax; it is not really R :) For this type of code Rcpp is naturally a better solution. I suspect it is possible to vectorise your code, which would speed things up. Your logical conditions operate on the columns, which means you can process an entire column at once. You have 3 logical conditions, each of which results in assigning values from one of the columns.

    Denote these conditions by cond1, cond2 and cond3:
    z<-x

    cond1 <- z$last_price > 0
    cond2 <- z$bid > z$last_price
    cond3 <- z$ask > 0 & z$ask < z$last_price

    then

    res <- rep(NA,nrow(z))
    res[!cond1] <- (z$ask[!cond1]+z$bid[!cond1])/2

    cond4 <- cond1 & cond2
    res[cond4] <- z$bid[cond4]

    cond5<- cond1 & !cond2 & cond3
    res[cond5] <- z$ask[cond5]

    cond6 <- cond1 & !cond2 & !cond3
    res[cond6] <- z$last_price[cond6]

    You can check that the conditions do not intersect and exhaust all possible cases with the following code:

    table(apply(cbind(!cond1,cond4,cond5,cond6),1,sum) )

    This should produce a table with a single entry, named 1, whose value is nrow(z).

    This should work much faster than the for loop. By the way, why do you use as.numeric to convert a boolean?
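The exhaustiveness check above can be mirrored outside R. The C++ sketch below builds one boolean mask per branch, following cond1 to cond6, and counts how many masks are true in each row; as with the `table(apply(...))` call, every count should be exactly 1. The names `Masks`, `branch_masks` and `masks_per_row` are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// One boolean mask per branch of the pricing rule (hypothetical helper type).
struct Masks {
    std::vector<bool> mid, bid, ask, last;
};

Masks branch_masks(const std::vector<double>& bid,
                   const std::vector<double>& ask,
                   const std::vector<double>& last) {
    if (bid.size() != ask.size() || bid.size() != last.size())
        throw std::invalid_argument("columns must have equal length");
    const std::size_t n = bid.size();
    Masks m{std::vector<bool>(n), std::vector<bool>(n),
            std::vector<bool>(n), std::vector<bool>(n)};
    for (std::size_t i = 0; i < n; ++i) {
        const bool cond1 = last[i] > 0;                     // positive last price
        const bool cond2 = bid[i] > last[i];                // bid above last price
        const bool cond3 = ask[i] > 0 && ask[i] < last[i];  // positive ask below last price
        m.mid[i]  = !cond1;                                 // corresponds to !cond1
        m.bid[i]  = cond1 && cond2;                         // cond4
        m.ask[i]  = cond1 && !cond2 && cond3;               // cond5
        m.last[i] = cond1 && !cond2 && !cond3;              // cond6
    }
    return m;
}

// Number of masks that are true in each row; every entry should be 1.
std::vector<int> masks_per_row(const Masks& m) {
    std::vector<int> counts(m.mid.size());
    for (std::size_t i = 0; i < counts.size(); ++i)
        counts[i] = m.mid[i] + m.bid[i] + m.ask[i] + m.last[i];
    return counts;
}
```

Building the masks first and assigning afterwards is the same idea as the R version: each mask selects the rows for one branch, so each branch can be filled with a single column operation.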

  2. Dzidorius Martinaitis said,

    January 31, 2012 @ 11:13

    Vaidotas,
    I tried to run your code, but I’m afraid WordPress made its own corrections :( Anyway, what you said makes sense, and I thought about it when I started to build the script. But at that point vectorization looked too complicated, so I skipped it and later came up with the Rcpp solution.
    As for the conversion: good point, my typo. In most cases I want to be sure that I’m dealing with the correct type, so I do as.numeric(x$bid).

  3. Grizzly said,

    January 31, 2012 @ 12:03

    As Vaidotas says, your loop is inefficient. The ‘ifelse’ construct works very fast:

    rez = ifelse(x[, "bid"] > x[, "last_price"], x[, "bid"], ifelse((x[, "ask"] > 0) & (x[, "ask"] < x[, "last_price"]), x[, "ask"], x[, "last_price"])), 0.5*(x[, "ask"] + x[,"bid"]))

  4. Dzidorius Martinaitis said,

    January 31, 2012 @ 12:26

    Grizzly,

    here is an updated version of your code, because one of the ifelse calls was missing. It’s true, it now takes 0.8 seconds instead of 100. But Rcpp is still faster ;)


    rez = ifelse(x$last_price>0,ifelse(x[, "bid"] > x[, "last_price"], x[, "bid"], ifelse((x[, "ask"] > 0) & (x[, "ask"] < x[, "last_price"]), x[, "ask"], x[, "last_price"])), 0.5*(x[, "ask"] + x[,"bid"]))

  5. Zachary Jones said,

    January 31, 2012 @ 15:32

    The vectorization looks complicated, sure, but that is really one of the main advantages of using R. Benchmark this with vectorized conditionals and a vectorized loop, and I am sure you will be much happier. Those vectorized functions are likely better optimized than your C++ extension. Also, definitely check out the foreach package by Revolution Analytics. It will allow you to keep a similar syntax.

  6. Dzidorius Martinaitis said,

    January 31, 2012 @ 15:36

    Zachary Jones,
    if you accept that the ifelse code provides vectorization, then it is not faster than my C++ extension: 0.1 seconds for C++ versus 0.8 for ifelse.

  7. Joshua Ulrich said,

    January 31, 2012 @ 15:59

    I agree with Vaidotas that vectorization is better in this case. You also could have put your loop into a function and byte-compiled it for a 2-3x speedup. Rcpp is great when looping can’t be avoided (e.g. when this period’s result depends on last period’s result), but all compiled code has potential costs (harder to debug, share, etc.).

    Here’s how I would approach it. The code is untested, so there may be a bug or two, but I hope the general idea is clear.

    lgt0 = x$last_price > 0
    bgtl = x$bid > x$last_price
    agt0 = x$ask > 0
    altl = x$ask < x$last_price
    rez = x$last_price
    rez[lgt0 & agt0 & altl] = x$ask[lgt0 & agt0 & altl]
    rez[lgt0 & bgtl] = x$bid[lgt0 & bgtl]
    rez[!lgt0] = (x$ask[!lgt0]+x$bid[!lgt0])/2
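Joshua’s distinction is worth making concrete. The price rule looks at each row on its own, which is why masks or nested ifelse calls can replace the loop. When each period’s result depends on the previous result, independent elementwise selection no longer works, and that is where a compiled loop pays off. As an illustration only (it is not part of the original scripts), here is a C++ sketch of an exponential moving average; the function name `ema` and the smoothing weight `alpha` are assumptions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Exponential moving average: out[i] = alpha * x[i] + (1 - alpha) * out[i - 1].
// Each output depends on the previous output, so the loop cannot be replaced
// by independent per-row selections the way the price rule can.
std::vector<double> ema(const std::vector<double>& x, double alpha) {
    if (alpha <= 0.0 || alpha > 1.0)
        throw std::invalid_argument("alpha must be in (0, 1]");
    std::vector<double> out(x.size());
    if (x.empty()) return out;
    out[0] = x[0];  // seed with the first observation
    for (std::size_t i = 1; i < x.size(); ++i)
        out[i] = alpha * x[i] + (1.0 - alpha) * out[i - 1];
    return out;
}
```

For example, `ema({1, 3, 5}, 0.5)` yields 1, 2 and 3.5: the second value needs the first, and the third needs the second.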

  8. Zachary Jones said,

    January 31, 2012 @ 16:19

    My point wasn’t just to vectorize the conditionals, also the loop.

  9. Bhoom said,

    February 1, 2012 @ 16:46

    I guess it depends on whether you are happy to write C++ code. The consensus seems to be that C++ is best for speed. If you’d rather stick with R, then try to use vectorization, although sometimes a loop is unavoidable. In my opinion, R makes life a lot easier with its existing functions and data manipulation.

  10. Quantitative thoughts » Vectorized R vs Rcpp said,

    February 1, 2012 @ 21:03

    [...] In my previous post, I tried to show that Rcpp is 1000 times faster than pure R, and that generated a fuss in the comments. Being lazy, I didn’t vectorize the R code, so in the end I was comparing apples with oranges. [...]

