Archive for R-language

R package Blotter

How many times have you been disappointed by nice trading system, because neither trading cost or slippage or bid/ask spread were included into back-test results? Did you find difficult to back-test a portfolio in R or many portfolios with different stocks? Blotter package is supposed to solve these problems.

In really – it is complicated. I spent a couple of days to start using it. There was one bug in the demo example, but the rest was in my code. You should remember:
1. Time zone must be specified:

Sys.setenv(TZ="GMT")

2. xts object must be created with explicit index class:

getSymbols('^GSPC',from='2000-01-01',<strong>index.class=c("POSIXt","POSIXct")</strong>)

3. When the test is conducted all blotter related values are written into .blotter environment. If you want repeat the same test, then you need to get rid of all .blotter values. So, you have to run something like this:

rm(list=ls(envir=.blotter),envir=.blotter)

You can find the author’s comment about this issue here: http://n4.nabble.com/Blotter-package-problem-with-example-tp1018634p1572911.html
4. It is slow. If you want to find the fittest rule from a bunch of them – try avoid using blotter, instead, use pure cumulative return and later on, include blotter.

Let’s test this rule with blotter as an example:
short SPY if today’s low is higher than yesterday’s close and close next day.

require(xts)
require(quantmod)
Sys.setenv(TZ="GMT")
initDate='2000-01-01'
getSymbols('SPY',from=initDate,index.class=c("POSIXt","POSIXct"))
SPY<-adjustOHLC(SPY,use.Adjusted=T)
 
#yesterday's price
tmp<-lag(SPY,1)
 
# if today's low is higher than yesterday's close 1, else 0
signal<-ifelse(Lo(SPY)>Cl(tmp),1,0)
signal[1]<-0
 
#let's plot cumulitative return to make sure, that we are on the right path
tmp<-lag(Delt(Cl(SPY)),-1)
tmp[length(tmp)]<-0
plot(cumprod(signal*tmp+1))
 
SPY<-Cl(SPY)
symbols<-c('SPY')
 
rm(list=ls(envir=.blotter),envir=.blotter)
initPortf(ltportfolio,symbols, initDate=initDate)
initAcct(ltaccount,portfolios=c(ltportfolio), initDate=initDate,initEq=initEq)
 
signal<-signal['2000-02-01::']
 
for(i in 1:length(signal))
{
	currentDate= time(signal)[i]
	equity = getEndEq(ltaccount, currentDate)
	position = getPosQty(ltportfolio, Symbol=symbols[1], Date=currentDate)	
 
	if(position==0)
	{
		#open a new position if signal is >0
		if(signal[i]>0)
		{
			closePrice<-as.double(Cl(SPY[currentDate]))
			unitSize = as.numeric(trunc((equity/closePrice)))
                        # 5$ per transaction
			addTxn(ltportfolio, Symbol=symbols[1],  TxnDate=currentDate, TxnPrice=closePrice, TxnQty = -unitSize , TxnFees=5, verbose=T)
		}
 
	}
	else
	{
		#position is open. If signal is 0 - close it.
		if(as.double(signal[i])==0 &amp;&amp; position<0)
		{
			position = getPosQty(ltportfolio, Symbol=symbols[1], Date=currentDate)
			closePrice<-as.double((Cl(SPY[currentDate])))#as.double(get(symbols[1])[i+100])
 
                        # 5$ per transaction
			addTxn(ltportfolio, Symbol=symbols[1],  TxnDate=currentDate, TxnPrice=closePrice, TxnQty = -position , TxnFees=5, verbose=T)
 
		}
 
	}
	updatePortf(ltportfolio, Dates = currentDate)
	updateAcct(ltaccount, Dates = currentDate)
	updateEndEq(ltaccount, Dates = currentDate)
 
	equity = getEndEq(ltaccount, currentDate)
 
}
plot(getAccount(ltaccount)[["TOTAL"]]$End.Eq)

The result (it will take time to get it…):
Photobucket

Comments

Predicting April month return

Bespoke blogged about average monthly returns of the DJI and emphasized April. Before jumping on that information, let’s check some weak points.
In that post, only average returns are presented. We need at least extreme points (min;max) and confidence ranges. Second problem – the normal market have upward trend and we need to get rid of that. To do so, either we have subtract the  rolling mean (a tough way) or use logarithmic prices (that’s the easy way!).

Instead of DJI, I took S&P500 from 1950 until now.

Photobucket

Sure, average return in April is above 0, but based on historical data, negative return is possible. I conducted t-test, where null hypothesis was, that average return is equal to zero. I got p-value of 0.0042, so null hypothesis can be rejected (return is above 0).

The graph below shows cumulative return of investment, investing only in April. Keep in mind, that this is log scale and real return would be higher.
Photobucket

Conlusion: based on this data, expect positive return in April.

R code

require(xts)
require(quantmod)
require(ggplot2)
getSymbols('^GSPC',from='1950-01-01')
return<-Delt(Cl(to.monthly(log(GSPC))))
return[1]<-0
temp<-as.double(format(index(return),'%m'))
 
temp<-data.frame(as.double(return),as.numeric(temp))
qplot(factor(as.numeric(temp[,2])),as.double(return),data=temp,geom = "boxplot",ylab='Returns',xlab='Months')
 
t.test(return[which(format(index(return),'%m')=='04')])
plot(cumprod(return[which(format(index(return),'%m')=='04')]+1),main='Cumulative return, 1950-present')

Comments

Returns on Easter week and one week after

Inspired by CXO group report, I did a rerun of the same strategy on my data. Easter’s dates can be find at wikipedia. Overall, my results are similar to CXO group’s results.

In the graph below, I plotted daily returns on Easter week (Monday to Thursday) from 1982 to 2009. I prefer this way of showing things, where the range, minimum, maximum and mean of returns are presented.
It is clear, that only returns on Thursdays are above zero and t-test confirms that:
t = 2.235, df = 27, p-value = 0.03389. The rest is close to random.

Photobucket

The graph below shows daily returns one week after Easter holidays. Although Monday looks like negative day, but it lacks significance:
t = -1.1517, df = 27, p-value = 0.2595 (should be less that 0.1).
The rest is noise.

Photobucket

In summary, only returns on Thursdays have positive bias.

#R-Language code to repeat this test.
require(quantmod)
require(xts)
easter<-as.matrix(read.table(file='data/easter.csv'))
getSymbols('^GSPC',from='1980-01-01')
 
GSPC.delta<-Delt(log(Cl(GSPC)))
GSPC.delta<-Delt(Cl(GSPC))
 
#returns during the week before Easter
#returns on Thursdays
nasa<-data.frame(as.double(GSPC.delta[(as.Date(easter,'%m/%d/%y')-3)]),4)
colnames(nasa)<-c('ret','weekday')
 
#Monday to Wednesday
for(i in 1:3)
{
tmp<-data.frame(as.double(GSPC.delta[(as.Date(easter,'%m/%d/%y')-(3+i))]),(4-i))
colnames(tmp)<-c('ret','weekday')
rbind(nasa,tmp)
nasa<-rbind(nasa,tmp)
}
t.test(nasa$ret[nasa$weekday==4])
 
require(ggplot2)
qplot(factor(as.numeric(nasa$week)),as.double(nasa$ret),data=nasa,geom = "boxplot",ylab='Returns',xlab='Weekdays')
 
#returns during the week after Easter
 
GSPC.delta<-Delt(Cl(GSPC))
nasa<-data.frame(as.double(GSPC.delta[(as.Date(easter,'%m/%d/%y')+1)]),1)
colnames(nasa)<-c('ret','weekday')
for(i in 2:5)
{
print(i)
tmp<-data.frame(as.double(GSPC.delta[(as.Date(easter,'%m/%d/%y')+i)]),i)
colnames(tmp)<-c('ret','weekday')
rbind(nasa,tmp)
nasa<-rbind(nasa,tmp)
}
t.test(nasa$ret[nasa$weekday==1]);t.test(nasa$ret[nasa$weekday==2])
qplot(factor(as.numeric(nasa$week)),as.double(nasa$ret),data=nasa,geom = "boxplot",ylab='Returns',xlab='Weekdays')

Comments

Strategy: what if SPY & VIX are up?

Recently, I was busy testing the following strategy:

If SPY and VIX daily returns are positive, then short SPY at close and keep it for one day.

The strategy is dump simple and it has very good feature – short side. There are not so many successful short side strategies. For testing purpose I used daily Yahoo data from 1995 until present. For commissions and bid/ask spread I used 5 $ fee per 10 000 $ trade. Here we go:

Photobucket

Annualized Return                     0.0421
Annualized Std Dev                   0.0488
Annualized Sharpe (Rf=0%)        0.8621
t = 3.3787, df = 3811, p-value = 0.0007356

Up to this point is was relatively easy to make a test (the true is, that I spent some time cracking and hacking blotter package, but I will write about it in separate post). My second objective was the improvement of this strategy.

One of the way to understand the strategy is to look how the components are related to each other or correlated. To do that, I took daily returns of SPY and VIX at day 0 and plotted against SPY next day’s (day+1) returns.

Photobucket

What can I tell by looking at this plot? I couldn’t figure out any linear relation between returns of SPY and VIX at day+0 and returns of SPY at day+1. Should I try something like random forest method?

I tried to add some TA flavors, like RSI, but the improvements were not very significant. Simplicity is genius!

Comments (2)

End of the month investment

It is know, that the first day of the month provides bullish edge. According to Quantifiable edges not all the months are equal. So, I made a test on S&P500 index, from January, 1980 until February, 2010. It is true, March isn’t the best month to run this strategy.

Photobucket

Only 3 months have significant results based on p-values:
“month 5, p-value 0.0399233570186162″
“month 7, p-value 0.0466800163648646″
“month 11, p-value 0.0218919220125013″

p.s. if somebody is interested in R-Language code to repeat this test, then let me know.

Comments

Nedarbas Lietuvoje

Šiuo įrašu noriu pademonstruoti R-Language galimybes – kaip panaudojant įvairias duomenų bazes galima sugeneruoti gražius grafikus, kurie turėtų padėti analitikams.
Photobucket

R-Language kodas:

#Lietuvos žemėlapį parsisiunčiam iš čia: http://gadm.org/data/rda/LTU_adm1.RData
#statistinius duomenis iš čia: http://db1.stat.gov.lt/statbank/default.asp?w=1274</code>
 
load(file='LTU_adm1.RData')
tmp&lt;-read.table(file='nedarbas.apskr.csv',sep=';')
nasa.factor&lt;-as.factor(as.numeric(cut(tmp$V2, c(0,9,10,11,12))))
 
levels(nasa.factor)&lt;- c("&lt;9%", "9-10%", "10-11%","11-12")
gadm$col_no&lt;-nasa.factor
 
myPalette&lt;-brewer.pal(4,"Purples")
jpeg(file='nedarbas.Lietuvoje.jpg',width=500)
spplot(gadm, "col_no", col=grey(.9), col.regions=myPalette,main='Nedarbas Lietuvoje')
dev.off()

Šaltinis: http://ryouready.wordpress.com/2009/11/16/infomaps-using-r-visualizing-german-unemployment-rates-by-color-on-a-map/

Comments (1)

Gas price seasonality

Last spring I read “Quantitative Trading” by Ernest P. Chan. In his book, he suggested to buy gas futures contract at the end of February and sell it later, in March. Today, I decided to test this strategy by using R-language.
The most important thing for such investigation is data. For this purpose, I used this public data: www.eia.doe.gov
I made 2 test – for the first one, I used futures contract, which will be settled after 4 months and for the second one, I used gas spot price.
In the first plot, we can see monthly price returns scaled by months (1-12). It is clear, that mean of March’s returns are above 0. What is encouraging in this plot is that the ranges of returns are above 0 as well (meaning, that majority of March’s returns was above 0).
Anova test gives p-value below 0.01 – some months in the group has different mean (that supports seasonality idea). March’s t-test gives these values: t = 2.8064, df = 15, p-value = 0.01329.

Photobucket

I used gas spot prices to generate second plot. It is similar to the first one, except that March’s returns has longer tail. It is worth to note, that September stands as a positive month as well. The statistics for this test are not so strong, as it was in the first example.
P-value of Anova test is only 0.11 and March’s t-test is:
t = 1.3791, df = 15, p-value = 0.1881

Photobucket

R-language code to run these tests:

#------gas contract 4 -----
library(xts)
library(ggplot2)
library(quantmod)
 
tmp<-as.matrix(read.table(file='gas.contract.4.csv',sep=';'))
gas<-as.xts(as.double(tmp[,2]),order.by=as.POSIXct(tmp[,1]))
plot(gas)
gas.monthly<-to.monthly(gas)
gas.monthly.delta<-Delt(Cl(gas.monthly))
gas.monthly.delta[1]<-0
 
gas.factor<-as.double(format(index(gas.monthly.delta),'%m'))
tmp<-data.frame(as.double(gas.monthly.delta),as.numeric(gas.factor))
qplot(factor(as.numeric(gas.factor)),as.double(gas.monthly.delta),data=tmp,geom = "boxplot",ylab='Returns',xlab='Months')
 
anova(lm(as.double(gas.monthly.delta)~gas.factor))
t.test(((gas.monthly.delta[which(format(index(gas.monthly.delta),'%m')=='03')]))[,1])
 
#----gas contract spot -------
tmp<-as.matrix(read.table(file='gas.csv',sep=';'))
gas<-as.xts(as.double(tmp[,2]),order.by=as.POSIXct(tmp[,1]))
 
plot(gas)
colnames(gas)<-c('Close')
 
gas.monthly<-to.monthly(gas)
gas.monthly.delta<-Delt(Cl(gas.monthly))
gas.monthly.delta[1]<-0
 
gas.factor<-as.double(format(index(gas.monthly.delta),'%m'))
tmp<-data.frame(as.double(gas.monthly.delta),as.numeric(gas.factor))
 
qplot(factor(as.numeric(gas.factor)),as.double(gas.monthly.delta),data=tmp,geom = "boxplot",ylab='Returns',xlab='Months')
 
anova(lm(as.double(gas.monthly.delta)~gas.factor))
t.test(((gas.monthly.delta[which(format(index(gas.monthly.delta),'%m')=='03')]))[,1])

Comments (4)