Archive for R-language

Trading volume forecast for an illiquid stock

When dealing with transaction cost analysis, a stock’s volume is assumed to be stable or foreseeable.  However, there is different picture, then we are dealing with an illiquid stock.

It is relatively easy to forecast the volume of a liquid stock, because trading volume has high autocorrelation – the volumes at t and t+1 are correlated. For example, let’s take a look at BPN Paribas stock  autocorrelation figure:

Photobucket

X axis shows the lag between the days and Y axis shows percentage of the correlation. For BNP Paribas stock we have 60 % correlation between t and t+1 and 30 % between t and t+2.

Now, let’s look what is autocorrelation  of an illiquid stock:

Photobucket

The figure above shows, that autocorrelation for this emerging market stock is zero. That means, that we can’t forecast tomorrow’s volume, based on today’s volume.

Imagine, that a portfolio manager has to liquidate 100 000 of illiquid stock where the median of daily volume is 4000 and participation rate has to be maximum 20 %. How many days it will take to liquidate the position?

Because the daily volume of the stock is very volatile, we need more randomness in our forecast. To do that, we can use bootstrap – let’s take the last 250 data-points of the volume and generate 10 000 or 100 000 new time series. Once the have a bunch of new time series, let’s check how many days it would take to liquidate 100 000 stock position in each. Finally, we collect the numbers of the days needed for liquidation  and form a new vector. The result is following:

Photobucket

The histogram above shows, that it will take maximum 80 days to liquidate 100 000 of illiquid stock with 95% confidence.

?View Code RSPLUS
#volume - vector of the stock's volume
#shareNumber - number of the shares to liquidate
#loop - number of the time-series to be created
#participationRate - what is going to be participation rate 
simNumberOfDays<-function(volume,shareNumber,loop=10000,participationRate=0.05)
{
  rez=matrix(nrow=loop)
  for(i in 1:loop)
  {
    x=nasa[sample(NROW(volume))]
    y=cumsum(x*participationRate)
    rez[i,]=which(y>shareNumber)[1]
  }
  return(rez)
}
rez=simNumberOfDays(illiquidStock,100000,10000,0.2)

Comments

How big block trades affect stock market prices?

I will be giving a presentation on “Optimal transaction cost” in Vilnius on  16  August. While preparing the presentation and looking for an optimal execution solution, a natural question arises: does the size of the trade affect stock market price? I’m sure, you would say 100 % yes. Well, you would be right, but what is the scale of such effect? Is it possible to profit from execution of the big block trades?

Such test is not trivial and to conduct it, you need high frequency data, which is messy in most of the cases. For testing purpose I chose BNP Paribas stock from February 2011 to May 2011. Initially, I had more than 460 k. trades and more than 320k. quotes. However, the data was filtered by buyers initiated trades. To find buyers initiated trades, I used Lee-Ready Rule – short description can be found here on page 2. I found about Lee – Ready rule while reading Maxdama last post and a damn good summary (check page 42).

The first chart below shows the average return  one trade later (within seconds in most of the cases), when big or small trade was done. X axis represents difference between the trade and following trade, Y axis represents the trade size and the dot size represents number of trades within that cluster of volume. As you can see, small trades add 0.0004% to the price, while big ones (more than 980 of shares) increase the price on average 0.0007%

Photobucket

The next figure shows average return one minute later. This time the different between small trades and big one are almost3 times!

Photobucket

While we can see, that stock market prices are affected by big blocks, there’s no easy way to profit from it. You have to take into account bid/ask spread, plus you are becoming liquidity demander when liquidity is dry. On other end, this test shows the cost for each volume cluster and this cost can be used when choosing an optimal strategy for portfolio/stock liquidation.

Comments

Plotting git statistics

Here’s a funny story – friend of my, avid gamer at that time, was going downhill on a bicycle when wonderful idea flashed his mind: I need to save the current status… Just in case if I crash, I will start again from the top of the hill.

If you are a developer (quantitative or software), then you can use such marvelous feature. I use GitHub for my software and data mining or quantitative projects. Yesterday I came up with an idea to check my statistics of git commits. You can easily find ready to use software, but I was eager to extend my knowledge about git features and keep my machine clean.

I built two scripts – one is Linux shell script to get the data and another one is to plot the data in R.
getstats.sh:

git log master --shortstat --pretty="format: %ai"|
sed -e 's/\+[0-9]*/,/g'|sed ':a;N;$!ba;s/ ,\n/,/g'|
sed 's/ files changed//g'|sed 's/ insertions(,)//g'|
sed 's/ deletions(-)//g' >gitstats.csv

This part of the code: git log master –shortstat –pretty=”format: %ai” dumps all necessary data and the rest of the code makes it ready for R consumption. I found this page helpful, when I tried to format the dump.

gitStats.R:

?View Code RSPLUS
require(ggplot2)
require(xts)
setwd('/home/git/Rproject/gitStats/') 
Sys.setenv(TZ="GMT")
tmp=as.matrix(read.table('gitstats.csv',sep=',',header=FALSE))
commits=xts(cbind(as.double(tmp[,2]),as.double(tmp[,3]),as.double(tmp[,4])),order.by=as.POSIXct(strptime(tmp[,1],'%Y-%m-%d %H:%M:%S')))
 
colnames(commits)=c('Changes','Insertion','Deletion')
tmp=data.frame(Date=as.Date(index(commits)),Changes=as.numeric(commits$Changes),Insertion=as.numeric(commits$Insertion),Deletion=as.numeric(commits$Deletion))
tmp=melt(tmp,id.vars=c('Date'))
png('gitStats.png',width=500)
print(ggplot(tmp,aes(Date,value,color=variable))+geom_jitter(alpha=.65,size=3))
dev.off()
 
#############daily aggregated data##############
factor=as.factor(format(index(commits),'%Y%m%d'))
tmp=cbind(as.numeric(aggregate(commits$Changes,factor,sum)),as.numeric(aggregate(commits$Insertion,factor,sum)),as.numeric(aggregate(commits$Deletion,factor,sum)))
tmp=data.frame(unique(as.Date(index(commits))),tmp)
colnames(tmp)=c('Date','Changes','Insertion','Deletion')
tmp=melt(tmp,id.vars=c('Date'))
png('gitStats2.png',width=500)
print(ggplot(tmp,aes(Date,value,color=variable))+geom_jitter(alpha=.65,size=3))
dev.off()

R script generates this nice plot below:

Photobucket

What does it shows? It shows my activity in master repository. There is two projects – one was suspended in March and another one is under heavy development. As you can see, there was a lot of insertion when the last project was committed and since then numbers of insertion declined. I will come back, when I generate more data.
Do you track your git activity?

Source code

Comments

Artificial intelligence in trading: k-means clustering

There is many flavors of artificial intelligence (AI), however I want to show practical example of the cluster analysis. It is very applicable in finance. For example, one of stylized facts of volatility is, that it moves in clusters, meaning that today’s volatility will be more likely as yesterday’s volatility. To gauge these moves you can use hidden Markov chain (complicated method) or k-means (probably to simplified). However, GARCH model successfully exploits this stylized fact to make prediction of tomorrow’s volatility (it takes into account another fact as well – volatility is mean reverting process).

K-means is based on unsupervised learning – you give the data and k-means decides how to classify it. The idea is to split data into clusters based on cluster center and assign each point to nearest center.  There is drawback with such approach – the algorithm tries to establish the centers of  clusters with initial data set. If the data is very noisy and the centers are not stable, then every try will give you different results.

As you probably know, the distribution of financial data is very unstable. How to tackle this problem? We should be looking at daily returns instead of prices. The figure below shows daily returns of SPY stock.

?View Code RSPLUS
setwd('/home/git/Rproject/kmeans/')
require(quantmod)
require(ggplot2)
Sys.setenv(TZ="GMT")
getSymbols('SPY',from='2000-01-01')
 
x=data.frame(d=index(Cl(SPY)),return=as.numeric(Delt(Cl(SPY))))
png('daily_density.png',width=500)
ggplot(x,aes(return))+stat_density(colour="steelblue", size=2, fill=NA)+xlab(label='Daily returns')
dev.off()

Photobucket

I was ready to show another trick – how to neutralize long tails by replacing existing distribution with uniform distribution, but quick test revealed, that this leads to uninterpretable results.

OK, lets move further – how many clusters should we have? Can AI give us a clue? Of course, but keep in mind that then your future decision will be anchored.

?View Code RSPLUS
nasa=tail(cbind(Delt(Op(SPY),Hi(SPY)),Delt(Op(SPY),Lo(SPY)),Delt(Op(SPY),Cl(SPY))),-1)
 
#optimal number of clusters
wss = (nrow(nasa)-1)*sum(apply(nasa,2,var))
for (i in 2:15) wss[i] = sum(kmeans(nasa, centers=i)$withinss)
wss=(data.frame(number=1:15,value=as.numeric(wss)))
 
png('numberOfClusters.png',width=500)
ggplot(wss,aes(number,value))+geom_point()+
  xlab("Number of Clusters")+ylab("Within groups sum of squares")+geom_smooth()
dev.off()

Photobucket

The figure above implies, that we should have more than 15 clusters for financial data. Well, for sake of simplicity and education purpose lets use only 5.

?View Code RSPLUS
kmeanObject=kmeans(nasa,5,iter.max=10)
kmeanObject$centers
autocorrelation=head(cbind(kmeanObject$cluster,lag(as.xts(kmeanObject$cluster),-1)),-1)
xtabs(~autocorrelation[,1]+(autocorrelation[,2]))
 
y=apply(xtabs(~autocorrelation[,1]+(autocorrelation[,2])),1,sum)
x=xtabs(~autocorrelation[,1]+(autocorrelation[,2]))
 
z=x
for(i in 1:5)
{
  z[i,]=(x[i,]/y[i])
}

The code above actually shows, how to run k-means clustering in R. The first line runs the sorting and the second shows clusters’ centroids:

High Low Close
1 0.0388 -0.0094 0.0313
2 0.0049 -0.0050 0.0006
3 0.0143 -0.0038 0.0106
4 0.0038 -0.0148 -0.0103
5 0.0053 -0.0348 -0.0280

So, we have 5 clusters: 1. extremely positive day, 2. flat day, 3. positive day and 4,5 are clusters with negative outcome.
The third and fourth lines in the code above checks and prints autocorrelation between today(N0) and tomorrow(N1):

1 2 3 4 5
1 11 24 29 21 12
2 16 991 288 351 42
3 17 338 144 168 28
4 27 310 202 207 32
5 26 24 33 31 23

If you prefer percentages instead of plain numbers, the following table gives you that:

1 2 3 4 5
1 0.11 0.25 0.30 0.22 0.12
2 0.01 0.59 0.17 0.21 0.02
3 0.02 0.49 0.21 0.24 0.04
4 0.03 0.40 0.26 0.27 0.04
5 0.19 0.18 0.24 0.23 0.17

How to read such tables? Lets take for example line 2. The first table says, that the centers of the cluster are following: 0.0049;-0.0050;0.0006, meaning that during such day, the price of the asset is moving in very narrow range. Now, the table 2 or 3 shows, what are the chances for the next day (N1). Here is only 1 % chance, that following day will be extremely negative or positive (1 and 5 columns), 59 % chance, that it will be as today (N0) or it will be mild volatility with positive or negative outcome (3 and 4 columns). Put it shortly – if volatility today is very low, then most likely it will be tomorrow.

For further research I would advise to increase the number of clusters and check what are the results. On the same vein IntelligentTradingTech made a post while back.

The source code can be found here.

Comments (4)

timezone issue in R

While investigating Intraday patterns in FX returns and order flow paper I have faced the problem with timezone. I had 3 data sources with different timezones (GMT, CET, CEST). Most confusing thing was, that I didn’t know, how to deal with summer time.
But why did I have the data with summer time in the first place?  Well, I use IBrokers package to get latest Forex data and turns out, that it is impossible to specify timezone parameter within reqHistoricalData function. Once the data is received, it assigns default timezone of R (in my case GMT), but the requested data comes with OS timezone (CET/CEST in my case). It took for while to realize that, but the real challenge was how to convert CET/CEST to GMT.

The answer – don’t use CET/CEST, but instead of that use something like ‘Europe/Paris’ or ‘Europe/Berlin’. This approach takes into account summer time issue and you don’t need to worry about it. Once you have the data in ‘Europe/Paris’ or ‘Europe/Berlin’ format you can easily convert it:

?View Code RSPLUS
Sys.setenv(TZ="Europe/Paris")
eur.usd=reqHistoricalData(tws,currency,whatToShow='MIDPOINT',barSize='1 hour')
Sys.setenv(TZ="GMT")
eur.usd=xts(Op(eur.usd),tz='GMT')

or you can change the index:

?View Code RSPLUS
 format(eur.usd, tz="GMT",usetz=TRUE)

Comments

Transaction cost analysis and pre-trade analysis

Transaction cost analysis (TCA) is the framework to achieve best execution in trading context. TCA can be split into three groups: pre-trade analysis, intraday analysis, and post-trade measurement.

Pre-trade analysis allows us to get insight about the future volatility of the price, forecast intra-day and daily volumes, market impact. It evaluates all strategies and advises the strategy that is most consistent with manager preferences for the risk.

Intraday analysis is real time analysis, where the system ensure that the strategy performs in line with the forecast.

Post-trade analysis measures the implementation of the investment decision to ensure, that pre-trade models are accurate and best execution is delivered.

If you want to learn more about the transaction cost analysis, I highly recommend Optimal Trading Strategies: Quantitative Approaches for Managing Market Impact and Trading Risk by Robert Kissell. The book not only covers all type of mentioned analysis, but also considers the practical aspects of the implementation of trading strategy and algorithms. The book includes very interesting part about Principal Bid Transactions or blind bids.

To build a forecast model we need some input parameters. Pre-trade analysis can be split into two parts – fundamental part and statistically based part. It is much ease to derive fundamental facts such as market capitalization, company profile, the sectors in which company operates, than statistically based parameters. For the latter I’m going to use R and the raw stock price data to derive it. The former can easily be found on Internet, Bloomber or Reuter.

The first input parameter that I’m are going to derive is average daily trading volume of the traded security. It is very easy to obtain it – only daily data is necessary. I used 20 days rolling window to get average volume. Here’s an example of IBM security:

Photobucket

The graph shows, how the 20 days average volume evolved since last September – the minimum was 3.5 millions and the maximum was 6 millions traded a day. But the average can be misleading – it takes into account last 20 days and then it derives one number. In most of the case, we are going to deal with short time prediction such as 1 day or 1 hour, so approximation of 20 days does not make a lot sense.
One of possible solutions would be using confidence intervals estimate daily volume. 95% confidence interval is wide used in finance – I will use the same interval answering the following question: based on this interval, what volume will be traded next day.

Photobucket

The density diagram shows distribution of daily volume of IBM security. As you can see most of the days the volume was above 3 millions. However, 5% of the days the volume was below 3 millions. Based on this diagram, we can predict, that tomorrow’s volume will be higher than 3 millions.

Before we finish with the daily volume, we have to test for weekly seasonality. If there is weekday seasonality, then the volume forecast has to be adjusted.

Photobucket
The chart above clearly indicates, that the traded volume on Monday is below (~5%) the day average. There is increase in the volume on Friday, but the significance is under question. Let’s check density diagram to get rid of any doubt about the volatility on Monday and Friday.

Photobucket

The density diagram above shows, that Monday’s peak and the body are shifted to the left. On the other end we can see, that Friday’s body is aligned with others days, only it has a fat tail. Based on this diagram we can eliminate Friday’s volume adjustment and apply Monday’s adjustment only.

Once we finished with daily data we can move to intra-day. Then liquidating (or acquiring) a position in a security, the position has to be sliced into smaller parts to avoid influence of the market and to hide the intentions. For this reason, intra-day volume patterns have to be known. With that in mind, let’s look at how the volume is distributed in relation to the trading hour.

Photobucket

The graph above shows hourly volume pattern, where the volume is grouped by hour. The black dots indicate the median of the hour. Please keep in mind, that the trading starts at 9:30 and the first trading hour has only 30 minutes (if you want align the first hour to the others, then you need to multiply the volume of the first hour by two). As we can see, the first and the last are most traded and the volume drops in the middle of the day.

The next useful thing then liquidating a big position is average trade size. We need to know, how behaves average Joe and what he trades.

Photobucket

The chart above shows all trades grouped by hour – the black dots indicate median of the trades. The following table is supplementary to the chart – here you find median of the hour.

Hour 10 11 12 13 14 15 16
Volume 842.5 565.5 394.0 300.0 297.0 369.5 708.5

Once again, average trade size is much higher in the first and last hours and it drops ~2.5 times during the trading session.

Photobucket

The final chart shows the volatility grouped by hour. There is a lot jittery, then the market opens, it becomes calmer during the lunch time and slightly increases then the market closes. These are the numbers for each hour:

Hour 10 11 12 13 14 15 16
Volatility

(standard deviation)

0.16% 0.09% 0.08% 0.05% 0.05% 0.06% 0.07%

In this article we analyzed a list of potential input parameters for pre-trade analysis and further forecast. The list is not static and can be extended with supplementary parameters, such as bid-ask spread distribution and etc. The next step is to aggregate these parameters and build the models to forecast volatility and volume.

Comments (2)

Book: ggplot2 by Hadley Wickham

All my recent plots are built using ggplot2 package. I don’t know if my dear readers have noticed the difference, but from my point of view, ggplot2 allows to create nice looking and aesthetics plots. I was using this package before, but the real boost came after reading this book: ggplot2: Elegant Graphics for Data Analysis (Use R).
You can find a decent ggplot2 manual on internet, but the book is special, because it explains the grammar and the syntax. Then you know that, it is much easer to tackle various problem, for example – how to set transparency of the point, fill the color of the density plot and etc. Check my recent post as an example.

Comments (3)

Politinis savivaldybių žemėlapis

I’m sorry for my English readers – this post is about Lithuania political map and it is written in my native Lithuanian language. By the way, R-Language has been used and the code is shared on git.

Neseniai viesai.lt rašė apie savivaldybių efektyvumą ir 2009 m. išlaidas. Nepaisant to, kad viešos informacijos apie savivaldybes, jų valdymą ir politikus yra labai mažai, papildžiau viesai.lt tyrimą naujais duomenimis ir vizualine medžiaga.

Atlikdamas šį tyrimą, rėmiausi prielaida, kad meras yra atsakingas už visus spendimus savivaldybėje ir turi galutinį žodį sprendžiant visus klausimus ir yra atsakingas už išlaidas. Kadangi visi Lietuvos merai yra partiniai, savivaldybės yra valdomos tų partijų, kurioms priklauso merai. Pradžioje pažiūrėkime, kokio dydžio savivaldybės (pagal gyventojų skaičių) yra valdomos partijų.

Photobucket

Pirmame paveikslėlyje matome kiek savivaldybių priklauso tam tikrai partijai (arba kokiai partijai priklauso meras) ir gyventojų skaičių savivaldybėse. TS turi dvi savivaldybes, kurios yra didžiausios Lietuvoje, LSDP turi didžiausią kiekį jai atstovaujančių merų. Partijos, kurios turi tik po vieną atstovaujantį merą, buvo eliminuotos iš šio grafiko.

Photobucket

Antrame paveikslėlyje atvaizduota suminė informacija apie gyventojus ir partijas. TS partija atstovauja didžiausią gyventojų skaičių savivaldybėse, bet eliminavus dvi didžiausias savivaldybes, skaičius būtų panašus į LSDP. Partijos, kurios turi tik po vieną atstovaujantį merą, buvo eliminuotos iš šio grafiko.

Photobucket

Trečiame paveikslėlyje partijos sugrupuotos pagal jų atstovaujamą politinę orientaciją.

Photobucket

Viršuje esantis paveikslėlis rodo savivaldybių išlaidų tankumą (pgl. viesai.lt išlaidų skaičiavimo metodiką) . Išlaidų vidurkis yra tarp 8 ir 10 proc., todėl viesai.lt pateiktas  blogiausių savivaldybių palyginimas su Klaipėdos m. išlaidomis yra labai optimistinis.

Photobucket

Savivaldybių suskirstymas pagal partijas atskleidžia vieną galimą faktą – mažai savivaldybių turinčios partijos išleidžia jų valdymui daugiau. Paveikslėlyje, kuris yra viršuje, juodais taškais yra pažymėti išlaidų vidurkiai. Kaip matome, LLRA ir TT, kurios turi po dvi savivaldybes, išlaidų vidurkiai yra apie 14.5 proc., kai visų savivaldybių vidurkis yra mažesnis nei 10 proc. Šį faktą reiktų patikrinti įtraukiant kitų metų duomenis, nes dėl mažos imties galimas neteisingas duomenų interpretavimas.

Photobucket

Paskutiniame grafike duomenys atskleidžia seniai žinomą tiesą – mažos savivaldybės yra brangus malonumas. Matome, kaip išlaidų kreivė mažėja gyventojų skaičiui didėjant ir ties 10 (22 000 gyventojų) pasiekia išlaidų vidurkį. Įdomu tai, kad savivaldybėms didėjant, išlaidos nesikeičia.

Tyrimo išeities kodą ir csv failą su duomenimis galima rasti gitHub.

Comments (1)

Correlation network

I came up with an idea to draw correlation network to get a grasp about relationship between a list of stocks. An alternative way to show correlation matrix would be head map, which can have limitations with big matrices (>100).
Unfortunately,  ggplot2 package doesn’t have a easy way to draw the networks, so I was left with igraph or network. I tried both, but somehow chose igraph. If you want to master either package I highly recommend to start from theoretical part – it is very well written and it will save your time trying to understand the package’s conception.

Here is correlation matrix of stocks, which correlation coef. is more than 0.5:
correlation_network

?View Code RSPLUS
require(xts)
require(quantmod)
require(igraph)
cor_mat<- matrix( runif(100), nr=10 )
cor_mat[ lower.tri(cor_mat, diag=TRUE) ]<- 0
cor_mat[ abs(cor_mat) < 0.5]<- 0
 
graph <- graph.adjacency(cor_mat>0.5, weighted=TRUE, mode="upper")
E(graph)$weight<-t(cor_mat)[abs(t(cor_mat))>0.5]
E(graph)[ weight>0.7 ]$color <- "black" E(graph)[ weight>=0.65 &amp; weight<0.7 ]$color <- "red" E(graph)[ weight>=0.6 &amp;weight<0.65 ]$color <- "green" E(graph)[ weight>=0.55 &amp;weight<0.6 ]$color <- "blue"
E(graph)[ weight<0.55  ]$color <- "yellow"
V(graph)$label<- seq(1:10)#V(graph)$name
graph$layout <- layout.fruchterman.reingold
factor<-as.factor(cut(E(graph)$weight*10,c(4,5,6,7,8),labels=c(1,10,20,30))) png('corr_network.png',width=500) plot(decompose.graph(graph)[[which.max(sapply(decompose.graph(graph), vcount))]],edge.width =as.numeric(factor)*1.5,frame=T) legend("bottomleft", title="Colors", cex=0.75, pch=16, col=c("black", "blue","red", "green","pink"), legend=c(">70%", "65-70","60-65","55-60","50-55"), ncol=2)
dev.off()

The code can be found on github.

Comments (10)

Interesting volatility measurement, part 2

A few weeks ago I have mentioned about an interesting volatility prediction. It is based on two periods of historical volatility (standard deviation). The remaining question was – does it really works? I could not give the answer, because I didn’t have VIX futures data at that time. Later on, I was contacted by Brian G. Peterson, who provided necessary data to finish this test. By the way, I just found, that CBOE shares VIX futures data on its website.

Now I want you to show, what are returns of VIX futures for the next 3 days, then historical volatility ratio of 3 days vs 10 days is less than 0.25:

Photobucket

?View Code RSPLUS
 
Sys.setenv(TZ="GMT")
require('xts')
require('quantmod')
require('blotter')
require('PerformanceAnalytics')
 
tmp<-as.matrix(read.table('tickers/various_day_close/VIXc1.csv',sep=',',header=TRUE))
vix<-as.xts(as.double(tmp[,9]),order.by=as.POSIXct(strptime(tmp[,2],'%d-%b-%Y'),tz='GMT'))
vix<-(vix[!is.na(vix)])
colnames(vix)<-c('Close')
 
 
tmp<-as.matrix(read.table('tickers/various_day_close/ESc1.csv',sep=',',header=TRUE))
es<-as.xts(as.double(tmp[,9]),order.by=as.POSIXct(strptime(tmp[,2],'%d-%b-%Y')))
es<-(es[!is.na(es)])
colnames(es)<-c('Close')
 
#-----------------data end-----------------
 
 
#-----------------signal-------------------
es.delta<-Delt(Cl(es))
delta<-Delt(Cl(vix))#Front contract
 
#Historical volatility during 3 and 10 days
short.vol<-as.xts(rollapply(es.delta,3,sd,align='right'))
long.vol<-as.xts(rollapply(es.delta,10,sd,align='right'))
 
past.vol<-short.vol/long.vol
future.vol<-lag(past.vol,-3)
future.delta<-lag(vix,-3)/vix-1
 
signal<-ifelse(past.vol<0.25,1,0)
 
#here we see, increase in historical volatility
summary(as.double(future.vol[index(signal[signal!=0])]))/summary(as.double(past.vol[index(signal[signal!=0])]))
 
#-----------------signal end-------------------
 
#--------------blotter code------------------
symbols<-c('vix')
 
initDate=time(get(symbols)[1])
initEq=50000
rm(list=ls(envir=.blotter),envir=.blotter)
ltportfolio='volatility'
ltaccount='volatility'
initPortf(ltportfolio,symbols, initDate=initDate)
initAcct(ltaccount,portfolios=c(ltportfolio), initDate=initDate,initEq=initEq)
currency("USD")
stock(symbols[1],currency="USD")
 
signal<-signal[index(vix)]
 
signal[is.na(signal)]<-0
 
counter<-0 #date counter - exit on 3th day
 
for(i in 2:length(signal))
{
	currentDate= time(signal)[i]
	equity = initEq #getEndEq(ltaccount, currentDate)
	position = getPosQty(ltportfolio, Symbol=symbols[1], Date=currentDate)	
	print(position)
	print(currentDate)
	if(position==0 &counter==0)
	{		
		#open a new position if signal is >0
		if(signal[i]>0)
		{
			print('open position')
			closePrice<-as.double(get(symbols[1])[currentDate])
			print(closePrice)
			unitSize = as.numeric(trunc((equity/closePrice)))
			print(unitSize)
			commssions=-unitSize*closePrice*0.0003
			addTxn(ltportfolio, Symbol=symbols[1],  TxnDate=currentDate, TxnPrice=closePrice, TxnQty = unitSize , TxnFees=commssions, verbose=T)
			counter<-1
		}
 
	}
	else
	{
		#position is open. If signal is 0 - close it.
		if(position>0 & as.integer(signal[i])==0 &counter>=3)
		{
			position = getPosQty(ltportfolio, Symbol=symbols[1], Date=currentDate)
			closePrice<-as.double(get(symbols[1])[currentDate])#as.double(get(symbols[1])[i+100])
			commssions=-position*closePrice*0.0003
			addTxn(ltportfolio, Symbol=symbols[1],  TxnDate=currentDate, TxnPrice=closePrice, TxnQty = -position , TxnFees=commssions, verbose=T)
			counter<-0
		}
		else
			counter<-counter+1
 
	}	
	print('>>>>>>>>>>>>')
	updatePortf(ltportfolio, Dates = currentDate)
	updateAcct(ltaccount, Dates = currentDate)
	updateEndEq(ltaccount, Dates = currentDate)
}
rez1<-(getPortfolio(ltaccount))
 
#--------------blotter code end------------------
 
#----------------results------------------------
png('vix_front.png',width=650)
#net profit - commissions, slipage excluded
chart.TimeSeries(cumsum(rez1$symbols$vix$txn[,7]),main='VIX front contract')
dev.off()
#----------------results end------------------------

The graph shows, that this strategy is pure random or just follows VIX index. Now let’s see, what are returns of this strategy, if S&P500 futures are used instead of VIX.

Photobucket

?View Code RSPLUS
 
signal<-ifelse(past.vol<0.25,1,0)
#signal<-signal[index(es)]
 
 
 
#------------------------blotter code-----------------------
symbols<-c('es')
 
initDate=time(get(symbols)[1])
initEq=15000
rm(list=ls(envir=.blotter),envir=.blotter)
ltportfolio='volatility'
ltaccount='volatility'
initPortf(ltportfolio,symbols, initDate=initDate)
initAcct(ltaccount,portfolios=c(ltportfolio), initDate=initDate,initEq=initEq)
currency("USD")
future(symbols[1],currency="USD",multiplier=50,1/4)
 
signal[is.na(signal)]<-0
 
counter<-0
 
for(i in 2:length(signal))
{
	currentDate= time(signal)[i]
	equity = initEq #getEndEq(ltaccount, currentDate)
	position = getPosQty(ltportfolio, Symbol=symbols[1], Date=currentDate)	
	print(position)
	print(currentDate)
	if(position==0 &counter==0)
	{		
		#open a new position if signal is >0
		if(signal[i]>0)
		{
			print('open position')
			closePrice<-as.double(get(symbols[1])[currentDate])
			print(closePrice)
			unitSize = 1#as.numeric(trunc((equity/closePrice)))
			print(unitSize)
			commssions=-2
			addTxn(ltportfolio, Symbol=symbols[1],  TxnDate=currentDate, TxnPrice=closePrice, TxnQty = unitSize , TxnFees=commssions, verbose=T)
			counter<-1
		}
 
	}
	else
	{
		#position is open. If signal is 0 - close it.
		if(position>0 & as.integer(signal[i])==0 &counter>=3)
		{
			position = getPosQty(ltportfolio, Symbol=symbols[1], Date=currentDate)
			closePrice<-as.double(get(symbols[1])[currentDate])#as.double(get(symbols[1])[i+100])
			commssions=-2
			addTxn(ltportfolio, Symbol=symbols[1],  TxnDate=currentDate, TxnPrice=closePrice, TxnQty = -position , TxnFees=commssions, verbose=T)
			counter<-0
		}
		else
			counter<-counter+1
 
	}	
 
	updatePortf(ltportfolio, Dates = currentDate)
	updateAcct(ltaccount, Dates = currentDate)
	updateEndEq(ltaccount, Dates = currentDate)
}
rez1<-(getPortfolio(ltaccount))
#-------------------------results---------------------
#net profit
png('vix.png',width=650)
chart.TimeSeries(cumsum(rez1$symbols$es$txn[,9]),main='ES future contract')
dev.off()

Well, that is exact opposite of expectations – if we expect volatility increase, as it was described in the first post, then the returns of S&P index have to be negative in long run.

From the beginning I suspected, that it has more to do with standard deviation formula and less with forecast.
Now funny part – I generated 2500 random returns and got median 0.9930  and mean 1.6360 for all days. Then I took all days, when buy signal suppose to be generated and guess what mean did I get? Median was 4.3170  and mean 6.3450. Once again, significant difference but on random data.

Source code on github

Comments (5)

« Previous Page · Next Page »