The post Volume-weighted Exponential Moving Average appeared first on DataPunks.

]]>As I was looking to combine this moving average with a volume-weighted version, or simply a weighted moving average, I ran across this Volume-weighted Exponential Moving Average stuff from Peter Ponzo. I gave it a try in R and here’s the code. This requires the ‘xts’ package to be loaded.

```
require(xts)
VEMA <- function(x, volumes, n = 10, wilder = F, ratio = NULL, ...)
{
x <- try.xts(x, error = as.matrix)
if (n < 1 || n > NROW(x))
stop("Invalid 'n'")
if (any(nNonNA <- n > colSums(!is.na(x))))
stop("n > number of non-NA values in column(s) ", paste(which(nNonNA),
collapse = ", "))
x.na <- xts:::naCheck(x, n)
if (missing(n) && !missing(ratio))
n <- trunc(2/ratio - 1)
if (is.null(ratio)) {
if (wilder)
ratio <- 1/n
else ratio <- 2/(n + 1)
}
foo <- cbind(x[,1], volumes, VEMA.num(as.numeric(x[,1]), volumes, ratio), VEMA.den(volumes, ratio))
(foo[,3] / foo[,4]) -> ma
ma <- reclass(ma, x)
if (!is.null(dim(ma))) {
colnames(ma) <- paste(colnames(x), "VEMA", n, sep = ".")
}
return(ma)
}
VEMA.num <- function(x, volumes, ratio) {
ret <- c()
s <- 0
for(i in 1:length(x)) { s <- ratio * s + (1-ratio) * x[i] * volumes[i]; ret <- c(ret, s); }
ret
}
VEMA.den <- function(volumes, ratio) {
ret <- c()
s <- 0
for(i in 1:length(volumes)) { s <- ratio * s + (1-ratio) * volumes[i]; ret <- c(ret, s); }
ret
}
VEMA(1:20, 20:1, ratio=0.1)
VEMA(1:20, 20:1, ratio=0.9)
```

The post Volume-weighted Exponential Moving Average appeared first on DataPunks.

]]>The post Varying Window Length for Linear Models on Stocks appeared first on DataPunks.

]]>I wanted to see if there was a relationship between the window length of the running mean of the linear regression slope estimate and the running mean of the correlation between fitted and observed values.

The parameters are:

- daily closing values since 2008 for the S&P 500
- Linear Regression done on periods of 16 data points
- If running mean of the last n slope coefficients is greater than 0, and running mean of correlation coefficients of fitted vs observed values is greater than 0.25, go long on the index
- Otherwise, sit aside and wait

Vary n, and see how total return behaves (now that I think of it, I should want to change the size of the data points per linear regression). Results are below, and we have a peak at n = 5 days. The information captured by the linear model seems to be lost as n grows, since we don’t have better return than the index itself at large values of n.

It shouldn’t hard to change the different time period to test the same idea on various historical periods. Alternatively, one could parametrize how far in the past one wants to test this. Here’s the code, in R as usual. I will come back to this with improvements when I get bored.

```
require(ggplot2)
require(PerformanceAnalytics)
require(quantmod)
getSymbols("^GSPC",from="2008-01-01",to=Sys.Date())
#GSPC <- to.weekly(GSPC)[,4]
GSPC <- GSPC[,4]
width = 16
for (i in (width+1):NROW(GSPC)) {
linmod <- lm(GSPC[((i-width):i),1]~index(GSPC[((i-width):i)]))
ifelse(i==width+1,signal <- coredata(linmod$residuals[length(linmod$residuals)]),
signal <- rbind(signal,coredata(linmod$residuals[length(linmod$residuals)])))
ifelse(i==width+1,signal2 <- coredata(linmod$coefficients[2]),
signal2 <- rbind(signal2,coredata(linmod$coefficients[2])))
ifelse(i==width+1,signal3 <- cor(linmod$fitted.values,GSPC[((i-width):i),1]),
signal3 <- rbind(signal3,cor(linmod$fitted.values,GSPC[((i-width):i),1])))
}
signal <- as.xts(signal,order.by=index(GSPC[(width+1):NROW(GSPC)]))
signal2 <- as.xts(signal2,order.by=index(GSPC[(width+1):NROW(GSPC)]))
signal3 <- as.xts(signal3,order.by=index(GSPC[(width+1):NROW(GSPC)]))
price_ret_signal <- merge(GSPC,lag(signal,k=1),
lag(signal2,k=1),lag(signal3,k=1),
ROC(GSPC,type="discrete",n=1))
price_ret_signal[,2] <- price_ret_signal[,2]/price_ret_signal[,1]
price_ret_signal[,3] <- price_ret_signal[,3]/price_ret_signal[,1]
getTotalReturn <- function(i) {
p <- ifelse(runMean(price_ret_signal[,3],n=i) > 0 & runMean(price_ret_signal[,4],n=i) > 0.25, 1, 0)
p <- 1+(na.omit(p * price_ret_signal[,5]))
Reduce("*", coredata(p))
}
qplot(1:30, unlist(lapply(1:30, getTotalReturn)), geom="line", xlab="x", ylab="total return")
```

The post Varying Window Length for Linear Models on Stocks appeared first on DataPunks.

]]>The post A thought on Linear Models on Stocks appeared first on DataPunks.

]]>The idea follows from the steps below:

- Get the weekly closing values of the S&P 500.
- Choose a time window (i.e. 25 weeks) and for each window, linearly regress the subset of closing values
- Choose an investment strategy based on the residuals, the running average of slope coefficients, or the running average of correlation with data points

The idea is quite simple, and so far, from Timely Portfolio’s post, it looks like the drawdown is behaving nicely.

It seems like the idea could be extended to a non-linear method. The residuals are getting larger and larger, and this indicates that linear methods are less reliable as time goes by.

```
# code from Timely Portfolio
# http://timelyportfolio.blogspot.ca/2011/08/unrequited-lm-love.html
require(PerformanceAnalytics)
require(quantmod)
getSymbols("^GSPC",from="1896-01-01",to=Sys.Date())
GSPC <- to.weekly(GSPC)[,4]
width = 25
for (i in (width+1):NROW(GSPC)) {
linmod <- lm(GSPC[((i-width):i),1]~index(GSPC[((i-width):i)]))
ifelse(i==width+1,signal <- coredata(linmod$residuals[length(linmod$residuals)]),
signal <- rbind(signal,coredata(linmod$residuals[length(linmod$residuals)])))
}
signal <- as.xts(signal,order.by=index(GSPC[(width+1):NROW(GSPC)]))
plot(signal, main="Residuals through time")
plot(log(signal), main="Log of Residuals through time")
```

The post A thought on Linear Models on Stocks appeared first on DataPunks.

]]>The post A look at market returns by month appeared first on DataPunks.

]]>I’ve already discussed the two seasonal investment scenarios (*Nov. to Apr* VS *May to Oct*) in this post, and was wondering if one could break it down further into a monthly analysis.

The quick R snippet below, edited in an informal way like this blog likes it, shows us a little boxplot (yes, I’m a sucker for those) of market returns per month. This suggests that the months of February and September are not witnessing great returns of the S&P 500.

```
require(quantmod)
require(PerformanceAnalytics)
getSymbols('^GSPC', from='1900-01-01')
m <- lapply(d, getMonth)
t <- monthlyReturn(GSPC)
d <- index(t)
getMonth <- function(i) {
as.numeric(format(i, format="%m"))
}
t$month <- lapply(d, getMonth)
boxplot(monthly.returns ~ month, t)
title('Market returns by month')
abline(0,0)
```

Photograph used with permission from mylittleshoebox.ca

The post A look at market returns by month appeared first on DataPunks.

]]>The post First attempt at Chess Data Mining appeared first on DataPunks.

]]>With the help of websites like chessgames.com, one could download saved games in PGN format (portable game notation), analyze player behavior towards certain game strategies and positions, and break them down by player strength and so on.

Well, that was my initial intention, although my attention these days migrated towards other interesting hobbies and whatnot. I’m posting my code on this blog for posterity, and maybe I’ll come back later when the interest comes back.

The intended usage was:

```
g <- ChessGame('test.pgn')
g$event # returns metadata information stored in PGN file
move.to(g, 10) # advance game g to move 10
material(g, "white") # returns a score for white
mobility(g, "relative") # returns a score of mobility white vs. black pieces
```

Here it is. Although the following is probably buggy, it could be useful to the chess hobbyists online.

```
library(R.oo)
PIECE_VALUES <- list(Q=9, R=5, B=3, N=3, P=1)
PIECES <- list("K", "Q", "R", "B", "N", "P")
ChessPiece <- function(color=NULL, piece=NULL, r=NULL, f=NULL) {
o <- list(color=color, piece=piece, r=r, f=f)
class(o) <- "ChessPiece"
return(o)
}
ChessBoard <- function() {
o <- list(ChessPiece())
length(o) <- 64
dim(o) <- c(8,8)
class(o) <- "ChessBoard"
rownames(o) <- c("a", "b", "c", "d", "e", "f", "g", "h")
o <- reset(o)
return(o)
}
summary.ChessPiece <- function(o) {
paste(o$color, o$piece, sep='')
}
getPieceInfo <- function(x) {
if (is.null(x)) " " else summary(x)
}
print.ChessPiece <- function(o) {
print(paste(o$color, o$piece))
}
print.ChessBoard <- function(o) {
print.table(array(lapply(o[,8], getPieceInfo)))
print.table(array(lapply(o[,7], getPieceInfo)))
print.table(array(lapply(o[,6], getPieceInfo)))
print.table(array(lapply(o[,5], getPieceInfo)))
print.table(array(lapply(o[,4], getPieceInfo)))
print.table(array(lapply(o[,3], getPieceInfo)))
print.table(array(lapply(o[,2], getPieceInfo)))
print.table(array(lapply(o[,1], getPieceInfo)))
}
reset <- function(o) {
UseMethod("reset", o)
}
reset.ChessBoard <- function(o) {
o[["a", 1]] <- ChessPiece(color="W", piece="R", r=1, f=1)
o[["b", 1]] <- ChessPiece(color="W", piece="N", r=1, f=2)
o[["c", 1]] <- ChessPiece(color="W", piece="B", r=1, f=3)
o[["d", 1]] <- ChessPiece(color="W", piece="Q", r=1, f=4)
o[["e", 1]] <- ChessPiece(color="W", piece="K", r=1, f=5)
o[["f", 1]] <- ChessPiece(color="W", piece="B", r=1, f=6)
o[["g", 1]] <- ChessPiece(color="W", piece="N", r=1, f=7)
o[["h", 1]] <- ChessPiece(color="W", piece="R", r=1, f=8)
o[["a", 2]] <- ChessPiece(color="W", piece="P", r=2, f=1)
o[["b", 2]] <- ChessPiece(color="W", piece="P", r=2, f=2)
o[["c", 2]] <- ChessPiece(color="W", piece="P", r=2, f=3)
o[["d", 2]] <- ChessPiece(color="W", piece="P", r=2, f=4)
o[["e", 2]] <- ChessPiece(color="W", piece="P", r=2, f=5)
o[["f", 2]] <- ChessPiece(color="W", piece="P", r=2, f=6)
o[["g", 2]] <- ChessPiece(color="W", piece="P", r=2, f=7)
o[["h", 2]] <- ChessPiece(color="W", piece="P", r=2, f=8)
o[["a", 8]] <- ChessPiece(color="B", piece="R", r=8, f=1)
o[["b", 8]] <- ChessPiece(color="B", piece="N", r=8, f=2)
o[["c", 8]] <- ChessPiece(color="B", piece="B", r=8, f=3)
o[["d", 8]] <- ChessPiece(color="B", piece="Q", r=8, f=4)
o[["e", 8]] <- ChessPiece(color="B", piece="K", r=8, f=5)
o[["f", 8]] <- ChessPiece(color="B", piece="B", r=8, f=6)
o[["g", 8]] <- ChessPiece(color="B", piece="N", r=8, f=7)
o[["h", 8]] <- ChessPiece(color="B", piece="R", r=8, f=8)
o[["a", 7]] <- ChessPiece(color="B", piece="P", r=7, f=1)
o[["b", 7]] <- ChessPiece(color="B", piece="P", r=7, f=2)
o[["c", 7]] <- ChessPiece(color="B", piece="P", r=7, f=3)
o[["d", 7]] <- ChessPiece(color="B", piece="P", r=7, f=4)
o[["e", 7]] <- ChessPiece(color="B", piece="P", r=7, f=5)
o[["f", 7]] <- ChessPiece(color="B", piece="P", r=7, f=6)
o[["g", 7]] <- ChessPiece(color="B", piece="P", r=7, f=7)
o[["h", 7]] <- ChessPiece(color="B", piece="P", r=7, f=8)
#o$white_pieces <- c(o[, 1:2])
#o$black_pieces <- c(o[, 7:8])
return(o)
}
pieces.ChessBoard <- function(x, color="all") {
if (color == "white") {
return(x.white_pieces)
} else if (color == "black") {
return(x.black_pieces)
} else {
return(c(x.white_pieces, x.black_pieces))
}
}
is.occupied <- function(board, r, f) {
!is.null(board[[r, f]])
}
ChessGame <- function(f=NULL) {
o <- list()
class(o) <- "ChessGame"
if (!is.null(f))
loadPGN(o, f)
o$board <- ChessBoard()
o$whitePieces <- array(o$board[, 1:2])
o$blackPieces <- array(o$board[, 7:8])
o$current_position <- 0
return(o)
}
loadPGN <- function(game, f) {
pat <- "\\[(.*) \".*\"\\]"
e <- readLines(f)
meta <- e[grep(pat, e)]
a <- getMeta(meta)
moves <- e[grep(pat, e, invert=TRUE)]
moves <- moves[grep("^$", moves, invert=TRUE)]
getMoves(moves)
}
getMeta <- function(l) {
mapply(parseMetaLine, l)
}
parseMetaLine <- function(l) {
pat <- "\\[(.*) \"(.*)\"\\]"
tag <- gsub(pat, "\\1", l)
value <- gsub(pat, "\\2", l)
c(tag, value)
}
applyMove <- function(board, move, color=white) {
if (!is.element(substr(move, 1, 1), PIECES)) {
move <- paste("P", move, sep="")
}
mp <- substr(move, 1, 1)
}
moveTo <- function(game, nb, black=TRUE) {
}
mobility <- function(x, ...) UseMethod("mobility", x)
mobility.ChessBoard <- function(x, ...) {
}
mobility.ChessGame <- function(x, ...) {
}
material <- function(x, ...) UseMethod("material", x)
material.ChessPiece <- function(x, ...) {
return(PIECE_VALUES[[x.piece]])
}
material.list <- function(x, ...) {
sapply(x, material)
}
material.ChessBoard <- function(x, color="relative") {
if (color != "relative") {
return(sum(material(pieces(x, color=color))))
} else {
return(sum(material(pieces(x, color="white"))) - sum(material(pieces(x, color="black"))))
}
}
material.ChessGame <- function(x, color="relative") {
}
getMoves <- function(moves) {
moves <- gsub(';.*$', '', moves)
moves <- gsub('\\{.*?\\}', '', moves)
s <- paste(moves, collapse=" ")
moves <- strsplit(s, '[0-9]*?\\.')[[1]]
moves <- unlist(lapply(moves[-1], function(x) { a <- strsplit(trim(x), ' ')[[1]]; c(a[1], a[2]) }))
dim(moves) <- c(2, length(moves)/2)
moves <- t(moves)
colnames(moves) <- c("W", "B")
moves
}
is.free.square <- function(x) {
if (is.null(x)) return(1) else return(0)
}
is.free <- function(board, r=0, f=0, d=0) {
if (r == 0 && f == 0) return(unlist(lapply(board, is.free.square)))
if (r == 0) return(unlist(lapply(board[f,], is.free.square)))
if (f == 0) return(unlist(lapply(board[,r], is.free.square)))
if (d == 1) {
return(unlist(lapply(board[col(board) + row(board) == f+r], is.free.square)))
}
if (d == -1) {
return(unlist(lapply(board[col(board) - row(board) == f-r], is.free.square)))
}
return(is.free.square(board[[f,r]]))
}
mobility.line <- function(l, p) {
l[p] <- 2
pat <- '(^|[01]*0+)(1*21*)($|0+[01]*)'
l <- gsub(pat, '\\2', paste(l, collapse=''))
nchar(l)-1
}
mobility.ChessPiece <- function(x, board) {
if (x$piece == 'P') {
if (x$color == 'W')
return(is.free(board, x$r+1, x$f))
else
return(is.free(board, x$r-1, x$f))
} else if (x$piece == 'R') {
return(mobility.line(is.free(board, r=x$r), x$r) + mobility.line(is.free(board, f=x$f), x$f))
} else if (x$piece == 'N') {
m <- c(-2, 1, -2, -1, -1, 2, -1, -2, 1, 2, 1, -2, 2, 1, 2, -1)
dim(m) <- c(2, 8)
l <- apply(m, 2, function(y, r=x$r, f=x$f) {(y[[1]] + f - 1)*8 + y[[2]] + r})
l <- l[l>0]
l <- l[l<65]
if (x$r < 3)
l <- l[l %% 8 < 5]
if (x$r > 6)
l <- l[l %% 8 > 4]
return(sum(unlist(lapply(board[l], is.free.square))))
} else if (x$piece == 'B') {
return(mobility.line(is.free(board, r=x$r, f=x$f, d=1)) + mobility.line(is.free(board, r=x$r, f=x$f, d=-1)))
} else if (x$piece == 'Q') {
return(mobility.line(is.free(board, r=x$r), x$r) + mobility.line(is.free(board, f=x$f), x$f) + mobility.line(is.free(board, r=x$r, f=x$f, d=1)) + mobility.line(is.
free(board, r=x$r, f=x$f, d=-1)))
} else if (x$piece == 'K') {
m <- c(1, 1, 1, 0, 1, -1, 0, -1, -1, -1, -1, 0, -1, 1, 0, 1)
dim(m) <- c(2, 8)
l <- apply(m, 2, function(y, r=x$r, f=x$f) {(y[[1]] + f - 1)*8 + y[[2]] + r})
l <- l[l>0]
l <- l[l<65]
if (x$r < 2)
l <- l[l %% 8 < 3]
if (x$r > 7)
l <- l[l %% 8 > 6]
return(sum(unlist(lapply(board[l], is.free.square))))
}
}
```

Photograph used with permission from mylittleshoebox.ca

The post First attempt at Chess Data Mining appeared first on DataPunks.

]]>The post Lending Club – naive data analysis appeared first on DataPunks.

]]>Lending Club is an online financial community that brings together creditworthy borrowers and savvy investors so that both can benefit financially. We replace the high cost and complexity of bank lending with a faster, smarter way to borrow and invest.

Lending Club’s returns are very attractive (for the lenders’ point of view), and, at the same time, the Club allows borrowers to avoid high interest rates for similar loans from banks. There are obviously some risk associated with the high returns (like the costly money recovery from a payment default, etc.), and one can ask whether the risks are well weighted with each of the loans.

A few obvious things to note from the cute box charts:

Nothing really surprising so far. Some data on defaults is also available, and one could continue digging into the provided data to see if any pattern emerges.

```
loans <- read.csv("data/lclub.csv", header=TRUE, skip = 1)
o <- data.frame(id=loans$Loan.ID)
o$rul <- floor(as.numeric(gsub("%", "", loans$Revolving.Line.Utilization))/10)
o$rate <- as.numeric(gsub("%", "", loans$Interest.Rate))
o$grade <- loans$CREDIT.Grade
boxplot(rate ~ grade, data=o)
boxplot(rate ~ rul, data=o)
boxplot(rul ~ grade, data=o)
```

The post Lending Club – naive data analysis appeared first on DataPunks.

]]>The post Netflix Post-mortem – How to detect Bubbles appeared first on DataPunks.

]]>I came across Minsky’s explanation of bubbles (vulgarized), hidden in a comment from this post from The Big Picture, and the part about market invasion by outsiders stuck to my head (see right quote).

Increasing prices are not enough for a bubble. Every financial crisis needs rocket fuel and there is only one thing that this rocket burns – cheap credit. Without it, there can be no speculation. Without it, the consequences of the displacement peter out and the sector returns to normal.When a bubble starts, the market is invaded by outsiders. Without cheap credit, the outsiders can’t join in.

**Question**: How do we detect if outsiders has invaded the market (from a measurement point of view)?

Obviously, this depends on the context and the way transactions are performed. Let’s look at two recent bubbles: the real estate bubble, and the most recent Netflix bubble.

For the first case, the real estate case, we’ll be looking at “Loans Secured by Real Estate“, available by the Federal Reserve Bank of St. Louis. By visual inspection, it looks like it behaved exponentially after 1995. One can convinced himself of this with the following half-baked code:

```
o <- read.csv('data/loans.csv')
o <- ts(cumsum(o[,2]), start=1985, frequency=4)
plot(log(o))
```

It looks like the cumulated amount of secured loans started exponentially a bit before 1995 up to just before 2010. Again, nothing is really precise in this post. One could use nonlinear fitting to see where things really started and went bust. If it really did expand exponentially, the bubble wasn't anywhere near sustainable. Note that this doesn't argue whether we can have a "soft-landing" or a bubble bust.

For the second case, the Netflix 2011 bubble, let's look at the cumulated volume in NFLX, and we can see the linear growth changed drastically in April 2010, a period we can attribute to the inclusion of outsiders (due to media over-coverage or some other factor).

```
library(quantmod)
getSymbols('NFLX')
plot((cumsum(Vo(NFLX))))
```

The second increased in linear growth might be attributed to the recent dump of the stock. To continue this analysis, one could use R for piecewise linear fitting.

The post Netflix Post-mortem – How to detect Bubbles appeared first on DataPunks.

]]>The post Two seasonal investors – R snippet appeared first on DataPunks.

]]>Going back 50 years in the past, the first partition of the year gives more than 40 times the initial investment, whereas the other partition is barely profitable. This is the idea that brought on seasonal investing, with those who are money-savvy and who like to visit sites like lovemoney.com for savings tips cottoning on to what was going on. The market can reach important lows and highs at certain times of the year and it is very valuable if you can keep track of these and work from them. Those who can capitalise on such statistics will find that they will go far in the investment world.

Below is some R code to verify this, and one could take this to research other highly asymmetries in seasonality.

```
require(quantmod)
require(PerformanceAnalytics)
getSymbols('^GSPC', from='1900-01-01')
d <- index(GSPC)
getMonth <- function(i) {
as.numeric(format(i, format="%m"))
}
firstHalf <- function(i) {
ifelse(getMonth(i) <= 4 || getMonth(i) >= 11, 1, 0)
}
hold1 <- unlist(lapply(d, firstHalf))
hold2 <- 1-hold1
hold1 <- xts(hold1, d)
hold2 <- xts(hold2, d)
perf <- merge(hold1 * ROC(GSPC[,4], type='discrete', n=1), hold2 * ROC(GSPC[,4], type='discrete', n=1))
colnames(perf) <- c("Nov - Apr", "May - Oct")
charts.PerformanceSummary(perf)
```

The post Two seasonal investors – R snippet appeared first on DataPunks.

]]>The post Volume by Price charts with R – first attempt appeared first on DataPunks.

]]>Such charts can be useful to determine support and resistance levels, as they illustrate amount of volume for different price ranges.

Below is my first attempt at this. Note that

1. the left axis (price) is not perfectly aligned and doesn’t show up properly

2. the horizontal barplot could be upgraded to a stacked barplot identifying which part of the volume came from a up/down price movement.

```
library(quantmod)
sumVol <- function(x) {
sum(Vo(subset(GSPC, GSPC$t == x)))
}
getSymbols("^GSPC")
hi <- hist(Cl(GSPC), plot=F)
b <- hi$breaks
GSPC$t <- floor(Cl(GSPC)/ 100) * 100
vols <- unlist(lapply(b, sumVol))
t <- as.table(vols)
names(t) <- b
plot(GSPC, col='red')
par(new=T)
barplot(t, horiz=T, axes=F, col=rgb(0.1,0.7,0.1,alpha=0.1))
```

The post Volume by Price charts with R – first attempt appeared first on DataPunks.

]]>The post Studying market reactions after consecutive gains (losses) appeared first on DataPunks.

]]>Question: What happens to the market after runs of positive or negative returns? Will the market tank or soar after n days of gains/losses?

First, a little dissection of historical data (S&P 500 since 1980).

```
library(quantmod)
getSymbols("^GSPC", from='1980-01-01')
y <- diff(Cl(GSPC))
# looking at direction only
y[y < 0] <- -1
y[y > 0] <- 1
r <- rle(as.vector(y))
# losses
l <- r$lengths[which(r$values < 0)]
# gains
g <- r$lengths[which(r$values > 0)]
plot(as.factor(l), col='red'); title('Consecutive losing days for the S&P 500 since 1980')
plot(as.factor(g), col='green'); title('Consecutive gains for the S&P 500 since 1980')
```

So the record of consecutive gains is 12 days in a row, and we have a record of 9 losses in a row for the S&P 500.

```
# given a run index from the r table, get the day associated with the start of that particular run
getDayIndex <- function(i) {
sum(r$lengths[1:i-1])
}
# getLossRuns(9) will give you the dates (index) of the start of 9 consecutive losing streak
getLossRuns <- function(i) {
n <- length(r$lengths)
l <- intersect(which(r$lengths == i), which(r$values < 0))
l <- l[l < n - i - 30]
unlist(lapply(l, getDayIndex))
}
getGainRuns <- function(i) {
n <- length(r$lengths)
l <- intersect(which(r$lengths == i), which(r$values > 0))
l <- l[l < n - i - 30]
unlist(lapply(l, getDayIndex))
}
```

Now, you can simply write:

```
> GSPC[getLossRuns(8)]
GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
1982-08-02 107.71 109.09 107.11 108.98 53460000 108.98
1991-08-28 393.06 396.64 393.05 396.64 169890000 396.64
1996-06-07 673.03 673.31 662.48 673.31 445710000 673.31
2008-09-30 1113.78 1168.03 1113.78 1166.36 4937680000 1166.36
> GSPC[getLossRuns(8) + 8 + 30]
GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
1982-09-24 123.79 123.80 123.11 123.32 54600000 123.32
1991-10-22 390.02 391.20 387.40 387.83 194160000 387.83
1996-08-01 639.95 650.66 639.49 650.02 439110000 650.02
2008-11-21 755.84 801.20 741.02 800.03 9495900000 800.03
```

to get the data on the start of 8-day losing streaks since 1980, and their data 30 days after the end of such streaks.

```
getAvgReturnsLoss <- function(i) {
d2 <- as.vector(Cl(GSPC[getLossRuns(i) + i + 30]))
d1 <- as.vector(Cl(GSPC[getLossRuns(i)]))
(d2 - d1) / d1
}
boxplot(lapply(1:9, getAvgReturnsLoss))
getAvgReturnsGain <- function(i) {
d2 <- as.vector(Cl(GSPC[getGainRuns(i) + i + 30]))
d1 <- as.vector(Cl(GSPC[getGainRuns(i)]))
(d2 - d1) / d1
}
boxplot(lapply(1:12, getAvgReturnsGain))
```

Each box in the boxplot represents the different returns after 30 days. For example, in the boxplot for returns after losing streak, the box labeled '7' represents the different returns of the S&P 500, 30 days following any run of 7 days with negative returns.

So far, this is just an exploration. One could determine whether the returns deviate significantly. One can also play with the number of days after the given run: analyze what happens to returns 15 days after such runs, etc. The amplitude of runs could also be considered in the analysis.

Photograph used with permission from mylittleshoebox.ca

The post Studying market reactions after consecutive gains (losses) appeared first on DataPunks.

]]>The post More on higher moments: rolling skewness of S&P 500 daily returns appeared first on DataPunks.

]]>Market skewness, in naive financial modeling, is some kind of measure of (as-)symmetrical distribution of (daily) returns around the average market return. A higher skewness would tend to indicate a denser distribution of higher returns, compared to lower or negative returns.

In the cited example, skewness was estimated based on even partition of years since 2008. While is this is a neat idea, it seems like a good idea to study the evolution of a rolling skewness (skewness of returns of the preceding n days).

Below is a quick piece of R code to describe the distribution / fluctuation of a 30-day rolling skewness of the S&P 500 daily returns since 1980.

Surprisingly, the skewness is rather volatile, with sudden high negative values. The distribution of rolling skewness is negatively skewed as well.

```
library(PerformanceAnalytics)
library(quantmod)
rm(list=ls())
getSymbols(c("^GSPC"), from="1980-01-01")
part <- function(i) GSPC[i:(i+30)]
part2 <- function(i) skewness(Return.calculate(Cl(part(i))))
skews <- unlist(lapply(1:(length(GSPC)/6-30), part2))
head(skews)
plot(ts(skews), col='blue')
hist(skews, breaks=50, col='cyan')
```

Photograph used with permission from mylittleshoebox.ca

The post More on higher moments: rolling skewness of S&P 500 daily returns appeared first on DataPunks.

]]>The post S&P 500 components heatmap in R appeared first on DataPunks.

]]>Financial indices, like the S&P 500 or the Dow Jones indices, are mathematically some kind of measure of overall market performance, and their calculations (usually in real-time) might not translate very well the different price movements of the underlying index components.

Combining the different features of the quantmod package (as described here), one can come up with a time-series based heatmap of some financial indices, like the S&P 500 for example.

Below is an adaptation of Hans Gilde’s ideas to the S&P 500 since 2007. Note that some symbols were removed while preparing the data. Note also how highly correlated the different components are.

```
library(XML)
library(RColorBrewer)
library(plyr)
library(quantmod)
library(Heatplus)
# get the list of symbols
l <- readHTMLTable('http://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[[1]]
l <- as.vector(l$Ticker)
l <- l[c(-59, -71, -80, -124, -141, -147, -275, -283, -292, -299, -309, -316, -360, -378, -381, -406, -439, -470, -471)]
getMonthlyReturns <- function(sym) {
y <- to.monthly(getSymbols(sym, auto.assign=FALSE, from='2007-01-01'))
as.vector(ClCl(y)*100)
}
d <- unlist(llply(l, getMonthlyReturns, .progress="text"))
# bounds at -10% and +10% for visual clarity
d[d < -10] <- -10
d[d > 10] <- 10
heatmap_2(t(matrix(d, ncol=481)), col=brewer.pal(9, 'PuBu'), Rowv=NA, Colv=NA, do.dendro=c(FALSE,FALSE), scale='none', legend=2, main="S&P 500 since 2007 (monthly returns)")
```

The post S&P 500 components heatmap in R appeared first on DataPunks.

]]>The post Shared and reproducible computing with OpenCPU appeared first on DataPunks.

]]>OpenCPU is a new initiative to make innovations in statistics, visualization and data-science more widely applicable.

I guess the idea of online analysis and visualization, and online cloud R computing platform isn’t really new at this point anymore, but the real incentive is the idea of reproducible research. The OpenCPU platform allows researchers to share code and data without installation of software packages, allowing transparent result publications.

For example, one can share a plotting function and call OpenCPU with http calls to get dynamic PNG, PDF and SVG files (for plots and charts), or to get the source code of the functions in ASCII, JSON, RData or rds.

It’s still in beta, but it looks really promising.

The post Shared and reproducible computing with OpenCPU appeared first on DataPunks.

]]>