Vectorized R vs Rcpp

In my previous post, I tried to show, that Rcpp is 1000 faster than pure R and that generated the fuss in the comments. Being lazy, I didn’t vectorize R code and at the end I was comparing apples vs oranges.

To fix that problem, I built a new script, where I’m trying to compare apples against apples. First piece of code named “ifelse R” uses R “ifelse” function to vectorize code. Second piece of code is fully vectorized code written in R, third – pure C++ code and the last one is C++, where  Rcpp ”ifelse” function is used.

Photobucket

 

name seconds
ifelse R 27.50
vectorized R 10.40
pure C++ 0.44
vectorized C++ 2.24

Here we go – vectorization truly helps, but pure C++ code still 23 times faster. Of course you pay the price when writing it in C++.
I found a bit strange, that vectorized C++ code doesn’t perform that well…

You can get the code from github or review it below:

?View Code RSPLUS
#Author Dzidorius Martinaitis
#Date 2012-02-01
#Description http://www.investuotojas.eu/2012/02/01/vectorized-r-vs-rcpp
 
bid = runif(50000000,5,9)
ask = runif(50000000,5,9)
close = runif(50000000,5,9)
 
x=data.frame(bid=bid,ask=ask,last_price=close)
rez=0
 
###########    ifelse R  #################
answ=as.vector(system.time(
{
rez = ifelse(x$last_price>0,ifelse(x[, "bid"] > x[, "last_price"], x[, "bid"], ifelse((x[, "ask"] > 0) & (x[, "ask"] < x[, "last_price"]), x[, "ask"], x[, "last_price"])), 0.5*(x[, "ask"] + x[,"bid"]))
})[1])
###########   end ifelse R  #################
 
###########    vectorized R  #################
 
answ=append(answ,system.time(
{
lgt0 = x$last_price > 0
bgtl = x$bid > x$last_price
agt0 = x$ask > 0
altl = x$ask > x$last_price
rez = x$last_price
rez[lgt0 & agt0 & altl] = x$ask[lgt0 & agt0 & altl]
rez[lgt0 & bgtl] = x$bid[lgt0 & bgtl]
rez[!lgt0] = (x$ask[!lgt0]+x$bid[!lgt0])/2
}
)[1])
###########   end vectorized R  #################
 
#C++ code starts here
 
library(inline)
library(Rcpp)
 
###########    pure C++  #################
 
code='
NumericVector bid(bid_);NumericVector ask(ask_);NumericVector close(close_);
int bid_size = bid.size();
NumericVector ret(bid_size);
for(int i =0;i<bid_size;i++)
{
  if(close[i]>0)
  {
    if(bid[i]>close[i])
    {
      ret[i] = bid[i]; 
    }
    else if(ask[i]>0 && ask[i]<close[i])
    {
      ret[i] = ask[i];//
    }
    else
    {
      ret[i] = close[i];//
    }
  }
  else
  {
    ret[i]=(bid[i]+ask[i])/2;
  }
 
}
return ret;
'
getLastPrice <- cxxfunction(signature( bid_ = "numeric",ask_ = "numeric",close_="numeric"),body=code,plugin="Rcpp")
rez=0
answ=append(answ,system.time(
  {
    rez=getLastPrice(as.numeric(x$bid),as.numeric(x$ask),as.numeric(x$last_price))
  })[1])
 
###########   end pure C++  #################
 
#summary(rez)
 
 
###########    vectorized C++  #################
code='
NumericVector bid(bid_);NumericVector ask(ask_);NumericVector close(close_);
int bid_size = bid.size();
NumericVector ret=ifelse(close>0,ifelse(bid >close, bid, ifelse(ask > 0,ifelse(ask < close,ask, close),close)), 0.5*(ask + bid));
return ret;
'
getLastPrice <- cxxfunction(signature( bid_ = "numeric",ask_ = "numeric",close_="numeric"),body=code,plugin="Rcpp")
rez=0
answ=append(answ,system.time(
{
  rez=getLastPrice(as.numeric(x$bid),as.numeric(x$ask),as.numeric(x$last_price))
}
)[1])
 
###########   end vectorized C++  #################
 
#summary(rez)
names(answ)=c('ifelse R','vectorized R','pure C++','vectorized C++')
 
library(ggplot2)
a=data.frame(ind=1:4,val=answ)
ggplot(a,aes(ind,val))+geom_point(legend=F)+geom_text(aes(label=names(answ),hjust=c(-0.2,-0.2,-0.2,0.8),vjust=c(0,0,0,-1)),size=4)

Comments (9)

The power of Rcpp

While ago I built two R scripts to track OMX Baltic Benchmark Fund against the index. One script returns the deviation of  fund from the index and it works fast enough. The second calculates the value of the fund every minute and it used to take for while. For example, it spent 2 minutes or more to get the values for one day. Here is an example of the result:

Photobucket

Following piece of code was in question:

?View Code RSPLUS
for(y in 1:NROW(x))
 {
    z=x[y,]
    if(as.numeric(z$last_price>0))
    {
      if(as.numeric(z$bid>z$last_price))rez[y]=z$bid
      else if(as.numeric(z$ask)>0 &amp; as.numeric(z$ask)<z$last_price)rez[y]=z$ask
      else rez[y]=z$last_price
    }
    else
    {
      rez[y]=(z$ask+z$bid)/2
    }
 }

The code above loops over time series and based on set of rules tries to decide which price (bid, ask or previous one) to use for calculations. Pure R script used to take 100 seconds to derive the price.

During the weekend I found time to watch very interesting Rcpp presentation. To my surprise, there are numerous ways to seamlessly integrate C++ into R code. So, I decided to rewrite the code above in C++ (Rcpp and inline packages were used).

?View Code RSPLUS
#c++ code embed in code value
code='
NumericVector bid(bid_);NumericVector ask(ask_);NumericVector close(close_);NumericVector ret(ask_);
int bid_size = bid.size();
for(int i =0;i<bid_size;i++)
{
  if(close[i]>0)
  {
    if(bid[i]>close[i])
    {
      ret[i] = bid[i];
    }
    else if(ask[i]>0 &amp;&amp; ask[i]<close[i])
    {
      ret[i] = ask[i];//
    }
    else
    {
      ret[i] = close[i];//
    }
  }
  else
  {
    ret[i]=(bid[i]+ask[i])/2;
  }
 
}
return ret;
'
#a glue function between C++ and R
getLastPrice = cxxfunction(signature( bid_ = "numeric",ask_ = "numeric",close_="numeric"),body=code,plugin="Rcpp")
 
#and the call of the function
getLastPrice(as.numeric(x$bid),as.numeric(x$ask),as.numeric(x$last_price))

What did I get in return? Well, 0.1 of a second instead of 100 seconds!

Comments (10)