Vectorized R vs Rcpp
In my previous post, I tried to show, that Rcpp is 1000 faster than pure R and that generated the fuss in the comments. Being lazy, I didn’t vectorize R code and at the end I was comparing apples vs oranges.
To fix that problem, I built a new script, where I’m trying to compare apples against apples. First piece of code named “ifelse R” uses R “ifelse” function to vectorize code. Second piece of code is fully vectorized code written in R, third – pure C++ code and the last one is C++, where Rcpp ”ifelse” function is used.
| name | seconds |
|---|---|
| ifelse R | 27.50 |
| vectorized R | 10.40 |
| pure C++ | 0.44 |
| vectorized C++ | 2.24 |
Here we go – vectorization truly helps, but pure C++ code still 23 times faster. Of course you pay the price when writing it in C++.
I found a bit strange, that vectorized C++ code doesn’t perform that well…
You can get the code from github or review it below:
#Author Dzidorius Martinaitis #Date 2012-02-01 #Description http://www.investuotojas.eu/2012/02/01/vectorized-r-vs-rcpp bid = runif(50000000,5,9) ask = runif(50000000,5,9) close = runif(50000000,5,9) x=data.frame(bid=bid,ask=ask,last_price=close) rez=0 ########### ifelse R ################# answ=as.vector(system.time( { rez = ifelse(x$last_price>0,ifelse(x[, "bid"] > x[, "last_price"], x[, "bid"], ifelse((x[, "ask"] > 0) & (x[, "ask"] < x[, "last_price"]), x[, "ask"], x[, "last_price"])), 0.5*(x[, "ask"] + x[,"bid"])) })[1]) ########### end ifelse R ################# ########### vectorized R ################# answ=append(answ,system.time( { lgt0 = x$last_price > 0 bgtl = x$bid > x$last_price agt0 = x$ask > 0 altl = x$ask > x$last_price rez = x$last_price rez[lgt0 & agt0 & altl] = x$ask[lgt0 & agt0 & altl] rez[lgt0 & bgtl] = x$bid[lgt0 & bgtl] rez[!lgt0] = (x$ask[!lgt0]+x$bid[!lgt0])/2 } )[1]) ########### end vectorized R ################# #C++ code starts here library(inline) library(Rcpp) ########### pure C++ ################# code=' NumericVector bid(bid_);NumericVector ask(ask_);NumericVector close(close_); int bid_size = bid.size(); NumericVector ret(bid_size); for(int i =0;i<bid_size;i++) { if(close[i]>0) { if(bid[i]>close[i]) { ret[i] = bid[i]; } else if(ask[i]>0 && ask[i]<close[i]) { ret[i] = ask[i];// } else { ret[i] = close[i];// } } else { ret[i]=(bid[i]+ask[i])/2; } } return ret; ' getLastPrice <- cxxfunction(signature( bid_ = "numeric",ask_ = "numeric",close_="numeric"),body=code,plugin="Rcpp") rez=0 answ=append(answ,system.time( { rez=getLastPrice(as.numeric(x$bid),as.numeric(x$ask),as.numeric(x$last_price)) })[1]) ########### end pure C++ ################# #summary(rez) ########### vectorized C++ ################# code=' NumericVector bid(bid_);NumericVector ask(ask_);NumericVector close(close_); int bid_size = bid.size(); NumericVector ret=ifelse(close>0,ifelse(bid >close, bid, ifelse(ask > 0,ifelse(ask < close,ask, close),close)), 0.5*(ask + bid)); return ret; ' getLastPrice <- cxxfunction(signature( bid_ = "numeric",ask_ = "numeric",close_="numeric"),body=code,plugin="Rcpp") rez=0 answ=append(answ,system.time( { rez=getLastPrice(as.numeric(x$bid),as.numeric(x$ask),as.numeric(x$last_price)) } )[1]) ########### end vectorized C++ ################# #summary(rez) names(answ)=c('ifelse R','vectorized R','pure C++','vectorized C++') library(ggplot2) a=data.frame(ind=1:4,val=answ) ggplot(a,aes(ind,val))+geom_point(legend=F)+geom_text(aes(label=names(answ),hjust=c(-0.2,-0.2,-0.2,0.8),vjust=c(0,0,0,-1)),size=4) |

Romain said,
February 2, 2012 @ 10:43
Nice. However, in your Rcpp sugar version (the one that uses the ifelse construct), you are overwritting on ask_ because of this line:
NumericVector ret(ask_);
I would suggest you do this directly:
NumericVector ret = ifelse( ….. )
Dzidorius Martinaitis said,
February 2, 2012 @ 11:20
Romain,
thanks for the comment, fixed.
Leonardo Miceli said,
February 2, 2012 @ 14:11
I am getting a better performance avoiding the ifelse function. Actually, the pure “if else” statement performs much better.
Louis Schenck said,
February 3, 2012 @ 4:06
Making a couple of modifications to the vectorized R can reduce the user time by about 50%:
answ=append(answ,system.time(
{
rez = vector(mode=’numeric’, nrow(x))
lgt0 = x$last_price > 0
bgtl = x$bid > x$last_price
agt0 = x$ask > 0
altl = x$ask > x$last_price
rez = x$last_price
xx <- which(lgt0 & agt0 & altl)
rez[xx] = x$ask[xx]
xx <- which(lgt0 & bgtl)
rez[xx] = x$bid[xx]
xx <- which(!lgt0)
rez[xx] = (x$ask[xx]+x$bid[xx])/2
}
)[1])
My timings were:
ifelse R vectorized R Modified Vect R
28.78 10.88 5.26
Great post.
Dzidorius Martinaitis said,
February 3, 2012 @ 11:16
Hi Louis,
I appreciate your input. It is true – it 50 % faster.
I updated the code on github. New graph can be found here:
http://i176.photobucket.com/albums/w180/investuotojas/performance-1.png
Romain said,
February 3, 2012 @ 15:42
The version that uses (Rcpp’s) ifelse is slower because ifelse needs to deal with missing values, and because the ifelse calls are nested, lots of redundant checking is performed under the hood.
As a sidenote, I’ve added (in svn) an initial version of a new sugar function: mapply, that would allow this syntax:
inc < - '
double fun(double bid, double ask, double close){
if(close>0)
{
if(bid>close)
{
return bid ;
}
else if(ask>0 && ask
return ask ;
}
else
{
return close ;
}
}
else
{
return (bid+ask)/2;
}
}
'
getLastPrice_mapply <- cxxfunction( signature(bid_ = "numeric", ask_ = "numeric", close_ = "numeric" ), '
NumericVector bid(bid_), ask(ask_), close(close_) ;
NumericVector res = mapply( v1, v2, v3, fun ) ;
return res ;
', plugin = "Rcpp", includes = inc )
Depending on the size, I get better results with this version.
Romain said,
February 3, 2012 @ 15:43
The code was not correctly included. I’ve copied it here:
https://gist.github.com/1730483
Dzidorius Martinaitis said,
February 3, 2012 @ 17:44
Hi Romain,
thanks for the share. However, it is a bit dangerous to try your code – if something is wrong with early version of sugar, it would be too painful to rollback. Once it is stable, then I can include into the test script.
Kristian said,
February 6, 2012 @ 15:40
Hi all.
I may be hung up on details now, but it seems there is still some redundancy in the vectorized R code. When you test lgt0 & agt0 & altl, the agt0 is implied by the combination of lgt0 and altl, so agt0 can be removed. As this is the only placed it is used it does not have to be created at all. The result is one line shorter and one ‘& agt0′ simpler, plus an added speed bonus of around 10%.
lgt0 = x$last_price > 0
bgtl = x$bid > x$last_price
altl = x$ask > x$last_price
rez = x$last_price
xx <- which(lgt0 & altl)
rez[xx] = x$ask[xx]
xx <- which(lgt0 & bgtl)
rez[xx] = x$bid[xx]
xx <- which(!lgt0)
rez[xx] = (x$ask[xx]+x$bid[xx])/2