Vectorized R vs Rcpp

In my previous post, I tried to show, that Rcpp is 1000 faster than pure R and that generated the fuss in the comments. Being lazy, I didn’t vectorize R code and at the end I was comparing apples vs oranges.

To fix that problem, I built a new script, where I’m trying to compare apples against apples. First piece of code named “ifelse R” uses R “ifelse” function to vectorize code. Second piece of code is fully vectorized code written in R, third – pure C++ code and the last one is C++, where  Rcpp ”ifelse” function is used.

Photobucket

 

name seconds
ifelse R 27.50
vectorized R 10.40
pure C++ 0.44
vectorized C++ 2.24

Here we go – vectorization truly helps, but pure C++ code still 23 times faster. Of course you pay the price when writing it in C++.
I found a bit strange, that vectorized C++ code doesn’t perform that well…

You can get the code from github or review it below:

?View Code RSPLUS
#Author Dzidorius Martinaitis
#Date 2012-02-01
#Description http://www.investuotojas.eu/2012/02/01/vectorized-r-vs-rcpp
 
bid = runif(50000000,5,9)
ask = runif(50000000,5,9)
close = runif(50000000,5,9)
 
x=data.frame(bid=bid,ask=ask,last_price=close)
rez=0
 
###########    ifelse R  #################
answ=as.vector(system.time(
{
rez = ifelse(x$last_price>0,ifelse(x[, "bid"] > x[, "last_price"], x[, "bid"], ifelse((x[, "ask"] > 0) & (x[, "ask"] < x[, "last_price"]), x[, "ask"], x[, "last_price"])), 0.5*(x[, "ask"] + x[,"bid"]))
})[1])
###########   end ifelse R  #################
 
###########    vectorized R  #################
 
answ=append(answ,system.time(
{
lgt0 = x$last_price > 0
bgtl = x$bid > x$last_price
agt0 = x$ask > 0
altl = x$ask > x$last_price
rez = x$last_price
rez[lgt0 & agt0 & altl] = x$ask[lgt0 & agt0 & altl]
rez[lgt0 & bgtl] = x$bid[lgt0 & bgtl]
rez[!lgt0] = (x$ask[!lgt0]+x$bid[!lgt0])/2
}
)[1])
###########   end vectorized R  #################
 
#C++ code starts here
 
library(inline)
library(Rcpp)
 
###########    pure C++  #################
 
code='
NumericVector bid(bid_);NumericVector ask(ask_);NumericVector close(close_);
int bid_size = bid.size();
NumericVector ret(bid_size);
for(int i =0;i<bid_size;i++)
{
  if(close[i]>0)
  {
    if(bid[i]>close[i])
    {
      ret[i] = bid[i]; 
    }
    else if(ask[i]>0 && ask[i]<close[i])
    {
      ret[i] = ask[i];//
    }
    else
    {
      ret[i] = close[i];//
    }
  }
  else
  {
    ret[i]=(bid[i]+ask[i])/2;
  }
 
}
return ret;
'
getLastPrice <- cxxfunction(signature( bid_ = "numeric",ask_ = "numeric",close_="numeric"),body=code,plugin="Rcpp")
rez=0
answ=append(answ,system.time(
  {
    rez=getLastPrice(as.numeric(x$bid),as.numeric(x$ask),as.numeric(x$last_price))
  })[1])
 
###########   end pure C++  #################
 
#summary(rez)
 
 
###########    vectorized C++  #################
code='
NumericVector bid(bid_);NumericVector ask(ask_);NumericVector close(close_);
int bid_size = bid.size();
NumericVector ret=ifelse(close>0,ifelse(bid >close, bid, ifelse(ask > 0,ifelse(ask < close,ask, close),close)), 0.5*(ask + bid));
return ret;
'
getLastPrice <- cxxfunction(signature( bid_ = "numeric",ask_ = "numeric",close_="numeric"),body=code,plugin="Rcpp")
rez=0
answ=append(answ,system.time(
{
  rez=getLastPrice(as.numeric(x$bid),as.numeric(x$ask),as.numeric(x$last_price))
}
)[1])
 
###########   end vectorized C++  #################
 
#summary(rez)
names(answ)=c('ifelse R','vectorized R','pure C++','vectorized C++')
 
library(ggplot2)
a=data.frame(ind=1:4,val=answ)
ggplot(a,aes(ind,val))+geom_point(legend=F)+geom_text(aes(label=names(answ),hjust=c(-0.2,-0.2,-0.2,0.8),vjust=c(0,0,0,-1)),size=4)

9 Comments »

  1. Romain said,

    February 2, 2012 @ 10:43

    Nice. However, in your Rcpp sugar version (the one that uses the ifelse construct), you are overwritting on ask_ because of this line:

    NumericVector ret(ask_);

    I would suggest you do this directly:

    NumericVector ret = ifelse( ….. )

  2. Dzidorius Martinaitis said,

    February 2, 2012 @ 11:20

    Romain,
    thanks for the comment, fixed.

  3. Leonardo Miceli said,

    February 2, 2012 @ 14:11

    I am getting a better performance avoiding the ifelse function. Actually, the pure “if else” statement performs much better.

  4. Louis Schenck said,

    February 3, 2012 @ 4:06

    Making a couple of modifications to the vectorized R can reduce the user time by about 50%:

    answ=append(answ,system.time(
    {
    rez = vector(mode=’numeric’, nrow(x))
    lgt0 = x$last_price > 0
    bgtl = x$bid > x$last_price
    agt0 = x$ask > 0
    altl = x$ask > x$last_price
    rez = x$last_price
    xx <- which(lgt0 & agt0 & altl)
    rez[xx] = x$ask[xx]
    xx <- which(lgt0 & bgtl)
    rez[xx] = x$bid[xx]
    xx <- which(!lgt0)
    rez[xx] = (x$ask[xx]+x$bid[xx])/2
    }
    )[1])

    My timings were:

    ifelse R vectorized R Modified Vect R
    28.78 10.88 5.26

    Great post.

  5. Dzidorius Martinaitis said,

    February 3, 2012 @ 11:16

    Hi Louis,
    I appreciate your input. It is true – it 50 % faster.
    I updated the code on github. New graph can be found here:
    http://i176.photobucket.com/albums/w180/investuotojas/performance-1.png

  6. Romain said,

    February 3, 2012 @ 15:42

    The version that uses (Rcpp’s) ifelse is slower because ifelse needs to deal with missing values, and because the ifelse calls are nested, lots of redundant checking is performed under the hood.

    As a sidenote, I’ve added (in svn) an initial version of a new sugar function: mapply, that would allow this syntax:


    inc < - '
    double fun(double bid, double ask, double close){
    if(close>0)
    {
    if(bid>close)
    {
    return bid ;
    }
    else if(ask>0 && ask {
    return ask ;
    }
    else
    {
    return close ;
    }
    }
    else
    {
    return (bid+ask)/2;
    }
    }
    '

    getLastPrice_mapply <- cxxfunction( signature(bid_ = "numeric", ask_ = "numeric", close_ = "numeric" ), '

    NumericVector bid(bid_), ask(ask_), close(close_) ;
    NumericVector res = mapply( v1, v2, v3, fun ) ;
    return res ;
    ', plugin = "Rcpp", includes = inc )

    Depending on the size, I get better results with this version.

  7. Romain said,

    February 3, 2012 @ 15:43

    The code was not correctly included. I’ve copied it here:
    https://gist.github.com/1730483

  8. Dzidorius Martinaitis said,

    February 3, 2012 @ 17:44

    Hi Romain,
    thanks for the share. However, it is a bit dangerous to try your code – if something is wrong with early version of sugar, it would be too painful to rollback. Once it is stable, then I can include into the test script.

  9. Kristian said,

    February 6, 2012 @ 15:40

    Hi all.

    I may be hung up on details now, but it seems there is still some redundancy in the vectorized R code. When you test lgt0 & agt0 & altl, the agt0 is implied by the combination of lgt0 and altl, so agt0 can be removed. As this is the only placed it is used it does not have to be created at all. The result is one line shorter and one ‘& agt0′ simpler, plus an added speed bonus of around 10%.

    lgt0 = x$last_price > 0
    bgtl = x$bid > x$last_price
    altl = x$ask > x$last_price
    rez = x$last_price
    xx <- which(lgt0 & altl)
    rez[xx] = x$ask[xx]
    xx <- which(lgt0 & bgtl)
    rez[xx] = x$bid[xx]
    xx <- which(!lgt0)
    rez[xx] = (x$ask[xx]+x$bid[xx])/2

RSS feed for comments on this post · TrackBack URI

Leave a Comment