I see high frequency data

In the previous post I shared an example of how to get high frequency data from Interactive Brokers (IB). Well, it is a retail version of HFD: it has only the best bid/ask and the trades. Now, once you have saved some data, what should you do next?

The next logical step is a data sanity check and visualization. For example, while preparing the R script for this post, I found that the IB data contains numerous duplicate quotes. Every time a trade happens, the IB trading platform sends the price and the size of the trade bundled together. Additionally, it sends the size of the trade as a separate quote, which duplicates the trade size and completely messes up the data. It was the sanity check and the visualization that hinted something was wrong with the data.
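A minimal base R sketch of one way to drop the duplicated size ticks. It assumes a hypothetical flat layout that is not taken from IB or from the collector: one row per tick with `tstamp` (milliseconds), `field` (IB tick type), `price` and `size` columns, where the trade row (field 4) carries its own size and arrives before the standalone last-size tick (field 5). Check these assumptions against what your own collector actually stores before relying on it.

```r
# Hypothetical tick layout (an assumption): tstamp in ms, IB tick type in `field`,
# `price` for price ticks, `size` for size ticks. A trade row (field 4) is assumed
# to carry its own size, so an immediately following last-size tick (field 5)
# with the same size is treated as a duplicate.
drop_duplicate_sizes <- function(ticks) {
  if (nrow(ticks) < 2) return(ticks)
  ticks <- ticks[order(ticks$tstamp), ]          # order() is stable for ties
  prev_field <- c(NA, head(ticks$field, -1))
  prev_size  <- c(NA, head(ticks$size, -1))
  is_dup <- ticks$field == 5 & prev_field == 4 & ticks$size == prev_size
  # Comparisons involving NA give NA; treat those rows as "not a duplicate"
  ticks[!(is_dup %in% TRUE), ]
}

ticks <- data.frame(
  tstamp = c(1000, 1000, 1000, 1200, 1300),
  field  = c(1, 4, 5, 2, 5),
  price  = c(99.5, 100, NA, 100.5, NA),
  size   = c(NA, 200, 200, NA, 300)
)
clean <- drop_duplicate_sizes(ticks)
print(clean)
```

Here the third row (a size tick repeating the 200 lots of the preceding trade) is dropped, while the last size tick is kept because it does not follow a trade with the same size.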

Today I want to show an example in R which loads the data from MongoDB and plots part of it. This should give you a better intuition about the collected data.


The plot shows the bid prices (light blue), the ask prices (green) and the trades (red). The size of each red dot indicates the volume of the trade.

The source code is shared on GitHub and below:

#Author Dzidorius Martinaitis
#Date 2012-03-01
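#Description: load IB tick data from MongoDB and plot bid, ask and trades

library(rmongodb)
library(xts)
library(ggplot2)
library(reshape2)  # provides melt(), used for the plot below
mongo = mongo.create()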
 
# Query for size ticks: {tickerId: 20, size: {$exists: true}}
buf = mongo.bson.buffer.create()
mongo.bson.buffer.append(buf, "tickerId", 20L)
mongo.bson.buffer.start.object(buf, "size")
mongo.bson.buffer.append(buf, "$exists", TRUE)
mongo.bson.buffer.finish.object(buf)

query = mongo.bson.from.buffer(buf)

count = mongo.count(mongo, 'quotes.trinti', query)
cursor = mongo.find(mongo, 'quotes.trinti', query)
 
#############  very slow code #############
# Kept for reference: growing an xts object with rbind() copies it on every tick
#size=''
#system.time(
#while(mongo.cursor.next(cursor)){
#  temp=(mongo.cursor.value(cursor));
#  if(is.xts(size))
#    size=rbind(size,xts(cbind(mongo.bson.value(temp,"field"),mongo.bson.value(temp,"size")),order.by=as.POSIXct(mongo.bson.value(temp,"tstamp")/1000,origin='1970-01-01',tz='Europe/Paris')))
#  else
#    size=xts(cbind(mongo.bson.value(temp,"field"),mongo.bson.value(temp,"size")),order.by=as.POSIXct(mongo.bson.value(temp,"tstamp")/1000,origin='1970-01-01',tz='Europe/Paris'))
#})
#############  end very slow  #############
 
# Much faster: preallocate a matrix with `count` rows and fill it row by row
size = matrix(nrow = count, ncol = 3)
counter = 1
system.time(
  while (mongo.cursor.next(cursor)) {
    temp = mongo.cursor.value(cursor)
    size[counter, 1] = mongo.bson.value(temp, "field")
    size[counter, 2] = mongo.bson.value(temp, "size")
    size[counter, 3] = mongo.bson.value(temp, "tstamp")  # milliseconds since 1970-01-01
    counter = counter + 1
    if (counter > count) break
  })
size = xts(size[, 1:2], order.by = as.POSIXct(size[, 3] / 1000, origin = '1970-01-01', tz = 'Europe/Paris'))
colnames(size) = c('field', 'size')
 
 
# Query for price ticks: {tickerId: 26, price: {$exists: true}}
# (the size query above uses tickerId 20; check that both IDs refer to the same instrument)
buf = mongo.bson.buffer.create()
mongo.bson.buffer.append(buf, "tickerId", 26L)
mongo.bson.buffer.start.object(buf, "price")
mongo.bson.buffer.append(buf, "$exists", TRUE)
mongo.bson.buffer.finish.object(buf)

query = mongo.bson.from.buffer(buf)
count = mongo.count(mongo, 'quotes.trinti', query)

cursor = mongo.find(mongo, 'quotes.trinti', query)
price = matrix(nrow = count, ncol = 3)
counter = 1
system.time(
  while (mongo.cursor.next(cursor)) {
    temp = mongo.cursor.value(cursor)
    price[counter, 1] = mongo.bson.value(temp, "field")
    price[counter, 2] = mongo.bson.value(temp, "price")
    price[counter, 3] = mongo.bson.value(temp, "tstamp")
    counter = counter + 1
    if (counter > count) break
  })
price = xts(price[, 1:2], order.by = as.POSIXct(price[, 3] / 1000, origin = '1970-01-01', tz = 'Europe/Paris'))
# Drop non-positive prices (IB can report -1 when no price is available)
price = price[which(price[, 2] > 0)]

colnames(price) = c('field', 'price')
 
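# IB tick types: 1 = bid, 2 = ask, 4 = last trade price, 5 = last trade size
# (0 and 3, the bid and ask sizes, are not used). cbind() on xts objects
# merges them on the timestamp, leaving NA where a series has no tick.
quotes = cbind(price[, 2][price[, 1] == 1],
               price[, 2][price[, 1] == 2],
               price[, 2][price[, 1] == 4],
               size[, 2][size[, 1] == 5])

# Carry the last known bid, ask and trade price forward to every timestamp
quotes[, 1] = na.locf(quotes[, 1])
quotes[, 2] = na.locf(quotes[, 2])
quotes[, 3] = na.locf(quotes[, 3])
# Keep a trade price only on rows where a trade size arrived
quotes[which(is.na(quotes[, 4])), 3] = NA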
 
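# Take rows 2001-3000 and plot them against the tick sequence number
sample.ticks = tail(head(quotes, 3000), 1000)
ticks.df = data.frame(ind  = 1:NROW(sample.ticks),
                      bid  = as.numeric(sample.ticks[, 1]),
                      ask  = as.numeric(sample.ticks[, 2]),
                      trd  = as.numeric(sample.ticks[, 3]),
                      size = as.numeric(sample.ticks[, 4]))
# Long format for the bid/ask lines
lines.df = melt(ticks.df[, c('ind', 'bid', 'ask')], id.vars = 'ind', na.rm = TRUE)
# One row per trade; the point size encodes the traded volume
trades = ticks.df[!is.na(ticks.df$trd) & !is.na(ticks.df$size), ]
ggplot(lines.df, aes(x = ind, y = value, colour = variable)) +
  geom_line() +
  geom_point(data = trades, aes(x = ind, y = trd, size = size), colour = 'red', inherit.aes = FALSE) +
  scale_colour_manual(values = c(bid = 'lightblue', ask = 'green'))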
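The script keeps the commented-out "very slow" loop next to the preallocated one. The base R sketch below illustrates why, using a hypothetical list of records as a stand-in for the MongoDB cursor: growing a matrix with `rbind()` copies all earlier rows on every iteration, so the total work grows quadratically with the number of ticks, while filling a matrix allocated once grows linearly. Timings will vary by machine.

```r
# Hypothetical stand-in for the MongoDB cursor: a plain list of tick records
records <- lapply(seq_len(3000), function(i)
  list(field = 5L, size = (i %% 7) + 1, tstamp = 1330560000000 + i))

# Growing with rbind(): each call copies every row collected so far
grow_with_rbind <- function(recs) {
  out <- NULL
  for (r in recs) out <- rbind(out, c(r$field, r$size, r$tstamp))
  out
}

# Allocating the matrix once and filling one row per record
fill_preallocated <- function(recs) {
  out <- matrix(NA_real_, nrow = length(recs), ncol = 3)
  for (i in seq_along(recs)) {
    r <- recs[[i]]
    out[i, ] <- c(r$field, r$size, r$tstamp)
  }
  out
}

print(system.time(slow <- grow_with_rbind(records)))
print(system.time(fast <- fill_preallocated(records)))
stopifnot(identical(slow, fast))  # both approaches build the same matrix
```

If the number of records is not known in advance, collecting rows in a list and calling `do.call(rbind, rows)` once at the end also avoids the repeated copying.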

 


Tick data retrieval

I just published Java-based code to pull tick data from Interactive Brokers. There are thousands of tools for getting tick data from IB, but I had one particular feature in mind.

You can get a maximum of 50 quotes per second from Interactive Brokers (an IB limitation of the TWS API). Imagine a situation where there is a delay in handling the incoming information, because the I/O is very slow or the system is briefly overloaded. In such a case either some data will be lost or the system will crash. OK, let's say you have plenty of RAM and a speedy hard drive. Does it still make sense for a real-time trading system to write the tick data to disk first and only then pass it further? Can it be done asynchronously? Yes, and Java Message Service (JMS) was built for exactly that.

So, my idea was to build a tool which grabs the tick data from the provider and passes it to JMS. The retrieval tool doesn't care what happens next: a disk crash, heavy processing of the data or saving it to storage. On the other end, JMS can have one or many clients, and it passes all incoming information on to them. If something happens to a client during the transfer, JMS takes care of it: with a queue or a durable subscription, it preserves the incoming messages and waits for the failed client to come back.

If you are looking for a way to glue together JMS, ActiveMQ, Spring, Hibernate, JPA and Maven, this code can help you as well.
