Vectorized R vs Rcpp
In my previous post, I tried to show, that Rcpp is 1000 faster than pure R and that generated the fuss in the comments. Being lazy, I didn’t vectorize R code and at the end I was comparing apples vs oranges.
To fix that problem, I built a new script, where I’m trying to compare apples against apples. First piece of code named “ifelse R” uses R “ifelse” function to vectorize code. Second piece of code is fully vectorized code written in R, third – pure C++ code and the last one is C++, where Rcpp ”ifelse” function is used.
| name | seconds |
|---|---|
| ifelse R | 27.50 |
| vectorized R | 10.40 |
| pure C++ | 0.44 |
| vectorized C++ | 2.24 |
Here we go – vectorization truly helps, but pure C++ code still 23 times faster. Of course you pay the price when writing it in C++.
I found a bit strange, that vectorized C++ code doesn’t perform that well…
You can get the code from github or review it below:
#Author Dzidorius Martinaitis #Date 2012-02-01 #Description http://www.investuotojas.eu/2012/02/01/vectorized-r-vs-rcpp bid = runif(50000000,5,9) ask = runif(50000000,5,9) close = runif(50000000,5,9) x=data.frame(bid=bid,ask=ask,last_price=close) rez=0 ########### ifelse R ################# answ=as.vector(system.time( { rez = ifelse(x$last_price>0,ifelse(x[, "bid"] > x[, "last_price"], x[, "bid"], ifelse((x[, "ask"] > 0) & (x[, "ask"] < x[, "last_price"]), x[, "ask"], x[, "last_price"])), 0.5*(x[, "ask"] + x[,"bid"])) })[1]) ########### end ifelse R ################# ########### vectorized R ################# answ=append(answ,system.time( { lgt0 = x$last_price > 0 bgtl = x$bid > x$last_price agt0 = x$ask > 0 altl = x$ask > x$last_price rez = x$last_price rez[lgt0 & agt0 & altl] = x$ask[lgt0 & agt0 & altl] rez[lgt0 & bgtl] = x$bid[lgt0 & bgtl] rez[!lgt0] = (x$ask[!lgt0]+x$bid[!lgt0])/2 } )[1]) ########### end vectorized R ################# #C++ code starts here library(inline) library(Rcpp) ########### pure C++ ################# code=' NumericVector bid(bid_);NumericVector ask(ask_);NumericVector close(close_); int bid_size = bid.size(); NumericVector ret(bid_size); for(int i =0;i<bid_size;i++) { if(close[i]>0) { if(bid[i]>close[i]) { ret[i] = bid[i]; } else if(ask[i]>0 && ask[i]<close[i]) { ret[i] = ask[i];// } else { ret[i] = close[i];// } } else { ret[i]=(bid[i]+ask[i])/2; } } return ret; ' getLastPrice <- cxxfunction(signature( bid_ = "numeric",ask_ = "numeric",close_="numeric"),body=code,plugin="Rcpp") rez=0 answ=append(answ,system.time( { rez=getLastPrice(as.numeric(x$bid),as.numeric(x$ask),as.numeric(x$last_price)) })[1]) ########### end pure C++ ################# #summary(rez) ########### vectorized C++ ################# code=' NumericVector bid(bid_);NumericVector ask(ask_);NumericVector close(close_); int bid_size = bid.size(); NumericVector ret=ifelse(close>0,ifelse(bid >close, bid, ifelse(ask > 0,ifelse(ask < close,ask, close),close)), 0.5*(ask + bid)); return ret; ' getLastPrice <- cxxfunction(signature( bid_ = "numeric",ask_ = "numeric",close_="numeric"),body=code,plugin="Rcpp") rez=0 answ=append(answ,system.time( { rez=getLastPrice(as.numeric(x$bid),as.numeric(x$ask),as.numeric(x$last_price)) } )[1]) ########### end vectorized C++ ################# #summary(rez) names(answ)=c('ifelse R','vectorized R','pure C++','vectorized C++') library(ggplot2) a=data.frame(ind=1:4,val=answ) ggplot(a,aes(ind,val))+geom_point(legend=F)+geom_text(aes(label=names(answ),hjust=c(-0.2,-0.2,-0.2,0.8),vjust=c(0,0,0,-1)),size=4) |











The chart above clearly indicates, that the traded volume on Monday is below (~5%) the day average. There is increase in the volume on Friday, but the significance is under question. Let’s check density diagram to get rid of any doubt about the volatility on Monday and Friday.
The graph above shows hourly volume pattern, where the volume is grouped by hour. The black dots indicate the median of the hour. Please keep in mind, that the trading starts at 9:30 and the first trading hour has only 30 minutes (if you want align the first hour to the others, then you need to multiply the volume of the first hour by two). As we can see, the first and the last are most traded and the volume drops in the middle of the day.
The chart above shows all trades grouped by hour – the black dots indicate median of the trades. The following table is supplementary to the chart – here you find median of the hour.
The final chart shows the volatility grouped by hour. There is a lot jittery, then the market opens, it becomes calmer during the lunch time and slightly increases then the market closes. These are the numbers for each hour:




