Backtest 22 firm-characteristic risk factors to evaluate their explanatory power for future stock returns.
cross-sectional linear regression, Fama-MacBeth
- Wrote Python functions to calculate 3 technical factors (MACD, RSI, BBP) from daily closing prices; see the code in 3technical_factors.ipynb
- Collected firm-characteristic data (risk factors) from WRDS, Compustat, comp.funda, etc. for the stock universe. The universe includes NYSE/AMEX/Nasdaq common stocks only (no preferred shares, etc.). The risk factors span several categories.
- Collected annual returns for the development set and monthly returns for the validation set from WRDS
- Since some factors (e.g. ROA) only have a midyear value while others (e.g. turnover rate) have a value on each day, we used the midyear value for the former and the mean value from the midpoint of last year to the midpoint of this year for the latter; see the code in TO_year.ipynb
- Merged all factor data on stock "permno" and "year", then merged the factor data with the next year's return
- Removed factors with too many missing values to ensure the final dataset has enough observations
- Kept only stocks with a price above $5 per share and an equity market capitalization of at least $100 million at the beginning of a given forecast year, to avoid trading illiquid stocks
- Normalized factor data using z-score by industry
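
The industry normalization in the last step can be sketched as follows. This is a minimal illustration in plain Python, not the project's code: the field names (`industry`, `ROA`), the toy rows, and the use of the sample standard deviation are assumptions, and the source does not say whether z-scores were also computed separately per year.

```python
from collections import defaultdict
from statistics import mean, stdev

def zscore_by_industry(rows, factor, group="industry"):
    """Return z-scores of `factor`, standardized within each `group` value."""
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group]].append(row[factor])
    # mean and sample standard deviation per industry (needs >= 2 stocks)
    stats = {g: (mean(v), stdev(v)) for g, v in by_group.items() if len(v) > 1}
    out = []
    for row in rows:
        mu_sd = stats.get(row[group])
        if mu_sd is None or mu_sd[1] == 0:
            out.append(None)  # undefined for singleton or constant groups
        else:
            out.append((row[factor] - mu_sd[0]) / mu_sd[1])
    return out

# hypothetical toy data
rows = [
    {"permno": 1, "industry": "tech", "ROA": 0.10},
    {"permno": 2, "industry": "tech", "ROA": 0.20},
    {"permno": 3, "industry": "tech", "ROA": 0.30},
    {"permno": 4, "industry": "bank", "ROA": 0.01},
    {"permno": 5, "industry": "bank", "ROA": 0.03},
]
z_roa = zscore_by_industry(rows, "ROA")
```

Standardizing within industry means a stock is compared with its peers, so a bank with an ROA of 0.03 can score higher than a tech firm with an ROA of 0.10.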
The final dataset covers 4,488 unique stocks over 26 years (1990.6 - 2016.6), with 34,560 observations and 27 columns (22 risk factors plus index, permno, year, return, and price), for 760,000+ data points. The data were separated into in-sample data (1990.6-2008.6) and out-of-time data (2009.6-2016.6).
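
Three of the 22 factors are the technical indicators computed from daily closing prices. The notebook itself is not reproduced here, so the sketch below only shows common textbook definitions. The parameters (12/26-day EMAs for MACD, a 14-day RSI using simple averages, 20-day bands of 2 standard deviations for BBP) and all function names are assumptions, not necessarily what 3technical_factors.ipynb uses.

```python
from statistics import mean, pstdev

def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def macd(close, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA of the closing price."""
    return [f - s for f, s in zip(ema(close, fast), ema(close, slow))]

def rsi(close, window=14):
    """RSI of the last `window` price changes, using simple averages."""
    if len(close) < window + 1:
        raise ValueError("need at least window + 1 prices")
    recent = close[-(window + 1):]
    changes = [b - a for a, b in zip(recent, recent[1:])]
    avg_gain = sum(c for c in changes if c > 0) / window
    avg_loss = sum(-c for c in changes if c < 0) / window
    if avg_loss == 0:
        return 100.0  # no losses in the window
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

def bbp(close, window=20, width=2.0):
    """Bollinger Band percent: position of the last price within the bands."""
    recent = close[-window:]
    mid, sd = mean(recent), pstdev(recent)
    lower, upper = mid - width * sd, mid + width * sd
    if upper == lower:
        return None  # bands collapse when prices are constant
    return (close[-1] - lower) / (upper - lower)

# hypothetical steadily rising price series
rising = [100.0 + i for i in range(40)]
latest_macd = macd(rising)[-1]
latest_rsi = rsi(rising)
latest_bbp = bbp(rising)
```

On a steadily rising series the MACD line is positive, the RSI sits at its maximum of 100, and the price lies in the upper half of the Bollinger bands.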
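library(data.table)
library(psych)  # provides corr.test()
# pairwise correlations and p-values among the factors and the return
correlation = corr.test(train[, .(zEP,zIA,zIG,zIK,zLEV,zNOA,zNS,zOK,zROA,zROE,zlnSIZE,zMOM,zOS,
zSG,zSUE,zBETA,zCI,zLTR,zTO,zRSI,zMACD,BBP,RET)])
correlation[["r"]]
# pairs with correlation >= 0.5 (the diagonal is always included)
which(correlation[["r"]] >= 0.5, arr.ind = TRUE)
correlation[["p"]]
# pairs whose correlation is significant at the 5% level
which(correlation[["p"]] < 0.05, arr.ind = TRUE)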
Removed features that are highly correlated (r >= 0.5) with high significance (p < 0.05).
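
The screening rule above can be sketched in plain Python as follows. This is an illustration under stated assumptions: the column names and values are made up, and significance is checked with the large-sample cutoff |t| > 1.96 instead of the exact p-values that `corr.test` reports.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def flag_correlated_pairs(columns, r_min=0.5, t_crit=1.96):
    """Return (name_a, name_b, r) for pairs with r >= r_min and |t| > t_crit."""
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        n = len(columns[a])
        if abs(r) >= 1.0:
            t = float("inf")  # perfectly correlated columns
        else:
            # t-statistic of a correlation coefficient with n - 2 degrees of freedom
            t = r * sqrt((n - 2) / (1 - r * r))
        # 1.96 approximates the 5% two-sided cutoff only for large samples
        if r >= r_min and abs(t) > t_crit:
            flagged.append((a, b, round(r, 3)))
    return flagged

# hypothetical z-scored factors for eight stocks
columns = {
    "zROA": [0.1, 0.4, -0.3, 0.8, -0.6, 0.2, -0.1, 0.5],
    "zROE": [0.2, 0.5, -0.2, 0.9, -0.7, 0.1, -0.2, 0.6],
    "zMOM": [0.3, -0.5, 0.4, -0.1, 0.2, -0.6, 0.7, -0.2],
}
flagged = flag_correlated_pairs(columns)
```

Only the zROA/zROE pair is flagged here; one factor of each flagged pair would then be dropped before running the regressions.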
Ran a cross-sectional regression in each estimation window (year) to estimate the premium on each factor, then computed the average factor premium across all windows (the Fama-MacBeth average) and the t-statistic of this average (the Fama-MacBeth t-statistic).
# calculate the Fama-MacBeth t-statistic: one cross-sectional regression per year
coefficients_raw = train[, as.list(coef(lm(RET ~ zEP+zIA+zIG+zIK+zLEV+zNOA+zNS+zOK+zROA+zROE+zlnSIZE+zMOM+zOS+
zSG+zSUE+zBETA+zCI+zLTR+zTO+zRSI+zMACD+BBP))), by = year]
(coefmean_raw = apply(coefficients_raw[, .SD, .SDcols = -"year"], 2, mean))
(coefsd_raw = apply(coefficients_raw[, .SD, .SDcols = -"year"], 2, sd))
# 1990-2008 gives 18 estimation windows, so scale by sqrt(18)
(tstat_raw = coefmean_raw / coefsd_raw * sqrt(18))
# keep the features with |FM t-statistic| > 1.3 and run the regressions again
coefficients = train[, as.list(coef(lm(RET ~ zLEV+zNS+zOK+zOS+zSG+
zSUE+zLTR+zRSI+zMACD+BBP))), by = year]
(coefmean = apply(coefficients[, .SD, .SDcols = -"year"], 2, mean))
(coefsd = apply(coefficients[, .SD, .SDcols = -"year"], 2, sd))
(tstat = coefmean / coefsd * sqrt(18))
Kept the factors whose absolute FM t-statistic is above 1.3 and whose sign matches the expected sign. The selected factors are highlighted in blue.
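
The Fama-MacBeth procedure can be illustrated with a single factor in plain Python. This is a simplified sketch with made-up numbers: the regressions above use many factors at once, whereas here each year's premium is just the OLS slope of return on one exposure. The t-statistic is the mean premium divided by its standard error, sd / sqrt(T), where T is the number of windows.

```python
from math import sqrt
from statistics import mean, stdev

def ols_slope(x, y):
    """Slope of a one-variable OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def fama_macbeth(panel):
    """panel maps year -> (factor exposures, next-year returns)."""
    # step 1: one cross-sectional regression per year gives one premium per year
    premia = [ols_slope(x, y) for _, (x, y) in sorted(panel.items())]
    # step 2: average the premia and test whether the average differs from zero
    avg = mean(premia)
    t_stat = avg / (stdev(premia) / sqrt(len(premia)))
    return avg, t_stat

# hypothetical three-year panel with three stocks per year
panel = {
    1990: ([-1.0, 0.0, 1.0], [-0.02, 0.01, 0.04]),  # premium 0.03
    1991: ([-1.0, 0.0, 1.0], [0.00, 0.01, 0.02]),   # premium 0.01
    1992: ([-1.0, 0.0, 1.0], [-0.01, 0.02, 0.03]),  # premium 0.02
}
avg_premium, fm_t = fama_macbeth(panel)
```

Because the t-statistic uses the time-series variation of the yearly premia, a factor only passes the 1.3 threshold if its premium is reasonably stable across years.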
- Step 1: Scored stocks during the out-of-sample period by weighting each stock's factor z-score by the FM t-statistic estimated for that factor in the regression model. Score = z-score(1) x t-stat(1) + z-score(2) x t-stat(2) + …
# out-of-sample validation
# import the full sample of monthly stock returns
test = setDT(read.csv("Test_zscore.csv"))
ret = setDT(read.csv('Monthly_Test.csv'))
setnames(ret, c('permno', 'date', 'ticker','PRC', 'ret'))
ret[, date := as.Date(as.character(date), format = "%Y%m%d")]
ret[, year := year(date)]
ret[, month := month(date)]
write.csv(ret,"ret.csv")
ret = setDT(read.csv("ret.csv"))
ret = na.omit(ret)
ret[, ret := as.numeric(as.character(ret))]
setnames(ret,c('x','PERMNO', 'date', 'ticker','PRC', 'ret','year','month'))
# calculate the score for each stock; tstat[1] is the intercept, so the factors start at tstat[2]
test[, score := zLEV * tstat[2] + zNS * tstat[3] + zOK * tstat[4] + zOS * tstat[5] + zSG * tstat[6] + zSUE * tstat[7] + zLTR * tstat[8] + zRSI * tstat[9] + zMACD * tstat[10] + BBP * tstat[11]]
test = na.omit(test)
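
The scoring rule in Step 1 can be sketched in Python with the t-statistics keyed by factor name instead of by position. The t-statistic values and z-scores below are made up for illustration.

```python
def composite_score(z_scores, t_stats):
    """Sum of factor z-scores, each weighted by that factor's FM t-statistic."""
    # skip stocks with a missing exposure, mirroring na.omit on the scored data
    if any(z_scores.get(name) is None for name in t_stats):
        return None
    return sum(z_scores[name] * t for name, t in t_stats.items())

# hypothetical FM t-statistics and one stock's z-scores
t_stats = {"zLEV": -1.5, "zSUE": 2.0}
stock = {"zLEV": 0.4, "zSUE": 1.2}
score = composite_score(stock, t_stats)
```

Looking up weights by name avoids the off-by-one risk of positional indexing, where the first coefficient is the intercept. Because the weight carries the sign of the t-statistic, a high exposure to a negatively priced factor lowers the score.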
- Step 2: Ranked stocks by score and divided them into 10 deciles. Built a zero-investment portfolio that is long the stocks in the top-scoring decile and short the stocks in the bottom-scoring decile.
# define the long and short portfolios
# group 2 = score at or above the 90th percentile, group 0 = score below the 10th percentile
test[, group := findInterval(score, quantile(score, c(0.1, 0.9))), by = year]
long = test[group == 2]
short = test[group == 0]
for (i in 2009:2016) {
  # portfolios formed in year i earn the monthly returns of year i + 1
  longRet = ret[PERMNO %in% long[year == i]$PERMNO & year == i + 1]
  longRet = longRet[, .(longRet = mean(ret, na.rm = TRUE)), by = .(year, month)]
  shortRet = ret[PERMNO %in% short[year == i]$PERMNO & year == i + 1]
  shortRet = shortRet[, .(shortRet = mean(ret, na.rm = TRUE)), by = .(year, month)]
  output = merge(longRet, shortRet, by = c('year', 'month'))
  if (i == 2009) {
    LSport = output
  } else {
    LSport = rbind(LSport, output)
  }
}
LSport[, LSret := longRet - shortRet]
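
The portfolio construction in Step 2 can be sketched for a single period in Python. This is a simplified sketch: it cuts the ranked list by count instead of by interpolated quantiles as `quantile` does, it weights stocks equally, and the stock identifiers and returns are made up.

```python
from statistics import mean

def decile_portfolios(scores, fraction=0.1):
    """Return (long_ids, short_ids): top and bottom `fraction` of stocks by score."""
    ranked = sorted(scores, key=scores.get)  # ascending by score
    k = max(1, int(len(ranked) * fraction))
    return ranked[-k:], ranked[:k]

def long_short_return(returns, long_ids, short_ids):
    """Equal-weighted return of the long leg minus that of the short leg."""
    long_ret = mean(returns[i] for i in long_ids)
    short_ret = mean(returns[i] for i in short_ids)
    return long_ret - short_ret

# hypothetical universe of 20 stocks for one month
scores = {f"S{i:02d}": float(i) for i in range(20)}
returns = {f"S{i:02d}": 0.001 * i for i in range(20)}
long_ids, short_ids = decile_portfolios(scores)
ls_return = long_short_return(returns, long_ids, short_ids)
```

With 20 stocks each leg holds 2 names, and the long-short return is the difference between the two equal-weighted leg returns.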
##performance assessment
FFdata = setDT(read.csv("FF.csv"))
FFdata[, date := as.Date(as.character(dateff), format = "%Y%m%d")]
FFdata[, year := year(date)]
FFdata[, month := month(date)]
LSport = merge(LSport, FFdata, by = c('year', 'month'))
LSport[, y := LSret]
# simple (arithmetic) mean of the monthly returns
apply(LSport[, .(longRet, shortRet, LSret)], 2, mean)
# geometric mean of the monthly returns
LSport[, lapply(.(longRet, shortRet, LSret), function(x) prod(1 + x)^(1/.N)-1)]
# cumulative gross return over the whole period
LSport$grossret <- LSport$LSret + 1
apply(LSport[, .(grossret)], 2, prod)
# compounded return within each calendar year
annual_ret = LSport[, .(ret = prod(1+LSret) -1), by = year]
#Sharpe ratio
(sr = LSport[, mean(LSret)/sd(LSret)])
# annualized SR
(sr = sqrt(12)*sr)
# CAPM
CAPM = lm(LSret ~ mktrf, LSport)
summary(CAPM)
#FF 3 factor
FF3 = lm(LSret ~ mktrf + smb+ hml, LSport)
summary(FF3)
# Carhart 4-factor model
C4 = lm(LSret~mktrf + smb + hml + umd, LSport)
summary(C4)
# information ratio
(ir = coef(C4)[1]/sd(C4$residuals))
#annualized IR
(ir = sqrt(12)* ir)
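
The return and Sharpe ratio calculations above can be sketched in Python as follows. The monthly returns are made up, and annualizing the return as (1 + geometric mean)^12 - 1 is one common convention chosen for this sketch, not necessarily the one behind the reported figures.

```python
from math import prod, sqrt
from statistics import mean, stdev

def geometric_mean_return(monthly):
    """Constant monthly return that compounds to the same total return."""
    return prod(1 + r for r in monthly) ** (1 / len(monthly)) - 1

def annualized_return(monthly):
    """Geometric mean monthly return compounded over 12 months."""
    return (1 + geometric_mean_return(monthly)) ** 12 - 1

def annualized_sharpe(monthly):
    """Monthly mean over monthly standard deviation, scaled by sqrt(12)."""
    # no risk-free rate is subtracted because a long-short portfolio is self-financing
    return mean(monthly) / stdev(monthly) * sqrt(12)

# hypothetical monthly long-short returns
monthly = [0.01, 0.03, -0.01, 0.02, 0.00, 0.01]
geo = geometric_mean_return(monthly)
sharpe = annualized_sharpe(monthly)
```

The geometric mean is never above the arithmetic mean, and the gap widens with volatility. The sqrt(12) scaling assumes that monthly returns are roughly independent.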
The annualized mean return of the S&P 500 was 11.92% and the average borrowing interest rate was around 3.5% during that period, which means our portfolio strategy fell slightly below the market return during those years. However, considering that the long-short strategy hedges market risk and interest rate risk, our strategy might beat the market on a risk-adjusted basis.
The alphas of the CAPM, the Fama-French 3-factor model, and the Carhart 4-factor model are quite low, which means the return of our strategy is mostly explained by the market, size, value, and momentum premia.
See the slides in Backtesting_presentation.pdf