This report is the final project for FRE 7121 Statistical Arbitrage.
Instructor: Professor Daniel Totouom
[TOC]
Statistical arbitrage, whose simplest form is pairs trading, often assumes that the residuals of co-integrated assets are mean-reverting or, more specifically, that they follow an Ornstein–Uhlenbeck process.
However, this assumption does not always hold. In Algorithmic Trading of Co-integrated Assets (A. Cartea and S. Jaimungal, 2016) [1], the authors derive the optimal asset allocation using stochastic control techniques. The explicit closed-form solution is affine in the co-integrated vector. It also shows that, under the risk-neutral measure Q, the residuals of co-integrated assets are a Brownian motion. Under the real-world measure P, by contrast, they follow a mean-reverting process.
This suggests that, under P, it is reasonable to model the dynamics of co-integrated residuals as a mixture of a mean-reverting process and a random walk. One could go further and split the mean-reverting part into a fast component and a slow component.
This project starts from this idea. Borrowing from the Hidden Markov Model, our study employs the logistic mixture autoregressive (LMAR) model [3][4]. LMAR is a regime-switching time-series model that allows different location and scale (deviation) parameters in different regimes. In addition, its transition probability follows a logistic function.
This project adopts the method introduced in [4]. Section 2 describes the LMAR model and Section 3 presents the parameter estimation and calibration methods. Section 4 implements the trading strategy, and Section 5 reports back-testing results.
The co-integrated Logistic Mixture Auto-regressive model is given as
$$
X_t'\alpha=\alpha_0+a_t
$$
where $X_t = (X_{1,t},X_{2,t},\dots,X_{N,t})'$ is the vector of log prices of the $N$ stocks at time $t$.
Now let
where
The absolute residuals,
Theorem 2.1: Suppose regime 1 is stationary, regime 2 is non-stationary, and
We use OLS and the Augmented Dickey–Fuller (ADF) test to estimate the co-integrated vector
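Below is a minimal NumPy sketch of this two-step (Engle–Granger style) estimate. It regresses the first log-price series on the second with an intercept and rearranges the fit into the form $X_t'\alpha=\alpha_0+a_t$ with $\alpha=(1,-b)'$, the same normalization as the KR–WMT example in Section 4. It then computes a unit-root t-statistic on the residuals. The function names are hypothetical. For brevity the unit-root regression uses no lagged differences, so it is the plain Dickey–Fuller test rather than the augmented one.

```python
import numpy as np


def estimate_cointegration(x1, x2):
    """OLS of x1 on [1, x2]; returns alpha = (1, -b), alpha0 and residuals a_t.

    From x1 = alpha0 + b * x2 + a_t it follows that X_t' alpha = alpha0 + a_t
    with X_t = (x1_t, x2_t)' and alpha = (1, -b)'.
    """
    design = np.column_stack([np.ones_like(x2), x2])
    (alpha0, b), *_ = np.linalg.lstsq(design, x1, rcond=None)
    resid = x1 - alpha0 - b * x2
    return np.array([1.0, -b]), alpha0, resid


def df_statistic(a):
    """t-statistic of gamma in  a_t - a_{t-1} = gamma * a_{t-1} + e_t  (no lags, no constant).

    OLS residuals with an intercept have zero mean, so no constant is included.
    """
    da = np.diff(a)
    lag = a[:-1]
    gamma = (lag @ da) / (lag @ lag)
    e = da - gamma * lag
    s2 = (e @ e) / (len(da) - 1)
    return gamma / np.sqrt(s2 / (lag @ lag))
```

A strongly negative statistic indicates stationary (co-integrated) residuals. Because $\alpha$ is itself estimated, the statistic should be compared with Engle–Granger (MacKinnon) critical values rather than the standard Dickey–Fuller table. statsmodels provides `statsmodels.tsa.stattools.adfuller` and `statsmodels.tsa.stattools.coint` for the full tests.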
The biggest challenge in this project is estimating the parameters of the LMAR model, for which we employ the Expectation–Maximization (EM) algorithm.
In the EM algorithm, we simply assume that the errors of the residual process are Gaussian with constant variance.
But there is one problem here: the co-integrated residuals
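Below is a minimal NumPy sketch of an EM loop for an LMAR model. It is written under assumptions that the text does not spell out. There are two regimes, each regime has AR(1) dynamics with Gaussian errors of constant variance, and the probability of regime 1 is a logistic function of $|a_{t-1}|$, echoing the absolute residuals mentioned in the model description. The exact specification in [4] may differ. The function name `fit_lmar_em`, the starting values and the iteration limits are hypothetical choices.

The E-step computes the posterior probability that each $a_t$ came from regime 1. The M-step runs weighted least squares per regime and then a few Newton steps on the logistic part. Because the logistic update is iterative rather than exact, this is strictly a generalized EM.

```python
import numpy as np


def _normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_lmar_em(a, n_iter=500, tol=1e-8):
    """Generalized EM for a two-regime LMAR with AR(1) components (illustrative sketch).

    Regime k: a_t = phi[k, 0] + phi[k, 1] * a_{t-1} + sd[k] * e_t,  e_t ~ N(0, 1).
    Mixing:   P(regime 1 | past) = sigmoid(theta[0] + theta[1] * |a_{t-1}|).
    """
    a = np.asarray(a, dtype=float)
    y = a[1:]
    X = np.column_stack([np.ones_like(y), a[:-1]])          # AR design [1, a_{t-1}]
    Z = np.column_stack([np.ones_like(y), np.abs(a[:-1])])  # logistic design [1, |a_{t-1}|]
    phi = np.array([[0.0, 0.5], [0.0, 1.0]])  # start: regime 1 mean-reverting, regime 2 random walk
    sd = np.full(2, np.std(np.diff(a)))
    theta = np.zeros(2)
    history = []  # observed-data log-likelihood at the start of each iteration
    for _ in range(n_iter):
        # E-step: posterior probability tau_t that a_t was generated by regime 1
        p = _sigmoid(Z @ theta)
        f1 = _normal_pdf(y, X @ phi[0], sd[0])
        f2 = _normal_pdf(y, X @ phi[1], sd[1])
        mix = p * f1 + (1.0 - p) * f2
        history.append(np.sum(np.log(mix)))
        tau = p * f1 / mix
        # M-step (a): weighted least squares and weighted variance for each regime
        for k, w in enumerate((tau, 1.0 - tau)):
            Xw = X * w[:, None]
            phi[k] = np.linalg.solve(X.T @ Xw, Xw.T @ y)
            resid = y - X @ phi[k]
            sd[k] = np.sqrt(np.sum(w * resid**2) / np.sum(w))
        # M-step (b): Newton steps on the logistic log-likelihood with soft labels tau
        for _ in range(5):
            p = _sigmoid(Z @ theta)
            hessian = Z.T @ (Z * (p * (1.0 - p))[:, None])
            theta = theta + np.linalg.solve(hessian, Z.T @ (tau - p))
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break
    return phi, sd, theta, history
```

Both AR coefficients are estimated freely here. Forcing regime 2 to be an exact random walk ($\phi_{2,1}=1$) would be a straightforward variation. As with any mixture model, the regime labels can swap, so check which fitted regime is the mean-reverting one before interpreting `theta`.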
This section addresses two questions:
1). How can we form groups of stocks with similar price patterns?
2). How can we incorporate LMAR into a pairs trading strategy?
To answer the first question, we employ hierarchical clustering, which does not require the number of clusters to be fixed in advance. We only need to define three things. The first is a distance metric, $d(a,b) =1-\rho(a,b)$, where $\rho(a,b)$ is the correlation between stocks $a$ and $b$. The second is a linkage criterion, here average linkage, $L(A,B) = \frac{1}{|A||B|} \sum_{a \in A, b \in B}d(a,b)$. The third is a distance threshold for forming clusters, which controls how large the resulting clusters are.
![cluster](C:\Users\Hao Guo\Pictures\cluster.PNG)
See the Appendix for the full clustering results.
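This clustering step can be sketched with SciPy's `linkage` (average linkage on a condensed distance matrix) and `fcluster` with `criterion="distance"`, which cuts the tree at the threshold. The sketch assumes that $\rho$ is the sample correlation of daily returns, arranged with one column per stock. The text does not say which series was used, and the function name `correlation_clusters` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def correlation_clusters(returns, tickers, threshold):
    """Average-linkage clustering with d(a, b) = 1 - rho(a, b), cut at a distance threshold."""
    rho = np.corrcoef(returns, rowvar=False)  # columns of `returns` are stocks
    dist = 1.0 - rho
    dist = (dist + dist.T) / 2.0              # enforce exact symmetry
    np.fill_diagonal(dist, 0.0)               # zero self-distance
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=threshold, criterion="distance")
    clusters = {}
    for ticker, label in zip(tickers, labels):
        clusters.setdefault(label, []).append(ticker)
    return sorted(clusters.values())
```

A smaller `threshold` yields more, tighter clusters. Clusters containing a single stock, such as ['DNR'] in the Appendix, cannot be traded as pairs.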
To answer the second question, let us start with an example:
The clustering places Kroger and Walmart in the same cluster. Running OLS, the ADF test and the EM algorithm iteratively on their log prices, we obtain
$$
X_{1,t}-0.72X_{2,t}=a_t-0.18
$$
We plot the results:
![KR_WMT](C:\Users\Hao Guo\Documents\Python Scripts\FRE_StatArb_project\KR_WMT.png)
The two regimes (states) are too similar to each other and too close to the actual residuals. As a result, the gap between the actual residuals and the two states cannot be exploited for arbitrage.
However, the actual residual and the transition probability are closely related. In particular, when the residual becomes very high or very low, a mean reversion almost always follows. This is consistent with Theorem 2.1 in Section 2.
Thus our trading strategy is as follows:
1). Compute
2). If
3). If
4). Any time if holding a position and
We use 1000 trading days of S&P 500 stock data (starting 01/01/2011) to estimate and calibrate the parameters for the stocks in each cluster, and the remaining 385 days for back-testing. Finally, we sum the PnL across all clusters.
To measure whether the new trading strategy is effective, we set up a benchmark: the naive strategy, the most classical approach, which uses the historical mean as
Because calibrating the parameters for all stocks is very time-consuming, we only compute the PnL for threshold
![PnL_k1](C:\Users\Hao Guo\Documents\Python Scripts\FRE_StatArb_project\PnL_k1.png)
![PnL_k2](C:\Users\Hao Guo\Documents\Python Scripts\FRE_StatArb_project\PnL_k2.png)
The new strategy shows significantly smaller drawdowns than the naive strategy. So the
We also compare four key performance measures for the new strategy and the naive strategy.
k = 1 | New Strategy | Naive Strategy |
---|---|---|
Hit Ratio | 57.66% | 52.72% |
Expected Gain (dollars) | 0.0271 | 0.0771 |
Expected Loss (dollars) | -0.0222 | -0.0862 |
Trading Frequency | 62 times / 385 days | 294 times / 385 days |
k = 2 | New Strategy | Naive Strategy |
---|---|---|
Hit Ratio | 78.70% | 51.95% |
Expected Gain (dollars) | 0.0076 | 0.0803 |
Expected Loss (dollars) | -0.0145 | -0.0789 |
Trading Frequency | 24 times / 385 days | 313 times / 385 days |
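The four measures are not defined explicitly above. The sketch below assumes one plausible reading:

- hit ratio is the share of trades with positive PnL;
- expected gain and expected loss are the average PnL of the winning and losing trades;
- trading frequency is the number of trades over the test window.

The helper `performance_summary` is hypothetical, and breakeven trades are counted as non-hits.

```python
import numpy as np


def performance_summary(trade_pnl, n_days):
    """Hit ratio, average win, average loss and trade count from per-trade PnL (assumed definitions)."""
    pnl = np.asarray(trade_pnl, dtype=float)
    wins, losses = pnl[pnl > 0], pnl[pnl < 0]
    return {
        "hit_ratio": wins.size / pnl.size,
        "expected_gain": wins.mean() if wins.size else 0.0,
        "expected_loss": losses.mean() if losses.size else 0.0,
        "trading_frequency": f"{pnl.size} times / {n_days} days",
    }
```

Under these definitions, a strategy can raise its hit ratio while its total PnL falls, if the average win shrinks and the number of trades drops. This is the trade-off between k=1 and k=2 discussed below.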
From k=1 to k=2, the hit ratio of the new strategy rises sharply (from 57.66% to 78.70%), but at the cost of a lower expected gain and a lower trading frequency. A higher hit ratio achieved with far fewer trades is not that compelling. Moreover, the PnL of the new strategy at k=1 is much higher than at k=2, which suggests that k=1 is closer to the optimal threshold than k=2.
In addition, the new strategy trades far less often (62 vs. 294 trades at k=1, 24 vs. 313 at k=2). It therefore has a clear advantage over the naive strategy once transaction costs are taken into account.
In this project, we apply a co-integrated LMAR model to pairs trading, with the goal of modeling the residuals as a mixture of a mean-reverting process and a random walk. The EM algorithm is used to calibrate the parameters of the LMAR model. Although the two estimated states are not clearly separated, the transition probability proves to be an effective tool for detecting mean reversion.
The empirical results suggest that, compared with the benchmark, the new strategy filters out trades that are not in a mean-reverting state and carry potential losses, and therefore performs better.
This project rests on the strong assumption that the residuals follow a Markov chain, so that Theorem 2.1 holds. Whether this assumption holds in general requires further study.
[1] Cartea, A. and Jaimungal, S., Algorithmic Trading of Co-integrated Assets. SSRN, 2016.
[2] Bec, F., Rahbek, A. and Shephard, N., The ACR model: A multivariate dynamic mixture autoregression. Oxford Bulletin of Economics and Statistics, 2008, 70(5), 583–618.
[3] Wong, C.S. and Li, W.K., On a logistic mixture autoregressive model. Biometrika, 2001, 88(3), 833–846.
[4] Cheng, X.X., Yu, P.L.H. and Li, W.K., Basket trading under co-integration with the logistic mixture autoregressive model. Quantitative Finance, 2011, 11(9), 1407–1419.
['CHK', 'RRC', 'SWN'] ['ESV', 'NE'] ['DO', 'RIG'] ['DNR'] ['CNX'] ['CF', 'FCX', 'GNW', 'MOS'] ['COG', 'EQT'] ['KMI', 'OKE', 'WMB'] ['NRG'] ['AES', 'SE'] ['OXY', 'SLB'] ['CVX', 'XOM'] ['APC', 'BHI', 'EOG', 'FTI', 'HAL', 'HES', 'HP', 'NBL', 'NOV', 'PXD'] ['APA', 'COP', 'DVN', 'MRO', 'MUR', 'NBR'] ['NFX', 'QEP', 'XEC'] ['CB', 'TRV'] ['ALL', 'AON', 'PGR', 'XL'] ['EFX', 'FIS', 'MA'] ['SYK', 'TMO', 'XRAY'] ['PKI', 'WAT'] ['ACN', 'CA', 'ORCL'] ['HST', 'IP', 'WY'] ['APD', 'ECL', 'PPG', 'PX'] ['A', 'APH', 'TEL'] ['CBG', 'HRS', 'IR', 'SWK', 'TXT'] ['BA', 'COL', 'GD'] ['CTAS', 'FDX', 'UPS'] ['CBS', 'DIS'] ['GPC', 'LEG', 'OMC'] ['HOT', 'MAR', 'MCO', 'WYN'] ['KSU', 'NSC'] ['CSX', 'R', 'UNP'] ['DOW', 'EMN', 'FMC', 'LYB'] ['FLR', 'JEC'] ['CAT', 'DOV', 'FLS'] ['NUE'] ['CMI', 'PNR'] ['DD', 'OI'] ['AXP', 'COF', 'DFS'] ['GLW', 'JCI', 'XRX'] ['BWA', 'F'] ['GM', 'GT'] ['HAR'] ['RHI'] ['URI'] ['BRK-B', 'CINF', 'MMC'] ['ADP', 'FISV', 'PAYX'] ['AIG', 'HIG'] ['AFL', 'TMK'] ['LNC', 'MET', 'PRU', 'UNM'] ['GS', 'MS'] ['C', 'JPM'] ['USB', 'WFC'] ['BK', 'NTRS'] ['BBT', 'PNC'] ['STT'] ['ETFC', 'SCHW'] ['CMA', 'ZION'] ['FITB', 'KEY', 'RF', 'STI'] ['HBAN', 'MTB'] ['BAC'] ['PBCT'] ['BEN', 'LM'] ['BLK', 'TROW'] ['AMG', 'AMP', 'IVZ', 'PFG'] ['L'] ['DHR', 'GE', 'MMM', 'UTX'] ['HON', 'ITW'] ['AME', 'ROP', 'SNA'] ['EMR', 'ETN', 'LUK', 'PCAR', 'PH', 'ROK'] ['DHI', 'LEN', 'PHM'] ['MAS', 'MHK'] ['HD'] ['BAX', 'BDX', 'MDT', 'VAR'] ['AIZ', 'NDAQ'] ['BMS', 'SEE'] ['ADBE', 'INTU', 'TSS', 'V'] ['CMCSA', 'FOXA'] ['IPG', 'NWL'] ['AVY', 'BLL', 'IFF'] ['ADM'] ['LMT', 'RTN'] ['LLL', 'NOC'] ['BF-B', 'CCE', 'JNJ', 'RSG', 'WM'] ['MPC', 'PWR', 'TSO', 'VLO'] ['CCL', 'CME', 'ICE'] ['CERN', 'GOOG'] ['AVGO', 'NVDA', 'QCOM', 'SYMC'] ['ADS', 'EBAY'] ['CRM', 'PCLN', 'RHT'] ['ARG', 'SHW', 'WHR'] ['CSCO', 'IBM', 'INTC', 'MSFT'] ['CTSH', 'DNB', 'MSI'] ['EL'] ['ADI', 'TXN'] ['LLTC', 'MCHP', 'XLNX'] ['AMAT', 'KLAC', 'LRCX'] ['ADSK', 'CTXS'] ['FLIR', 'JBL'] ['DISCA', 'SNI', 'TWX', 'VIAB'] ['AN', 'HOG', 'KMX', 'TIF'] ['MLM', 'VMC'] 
['DE', 'MON'] ['EXPD', 'FAST', 'GWW'] ['AET', 'CI', 'UNH'] ['DGX', 'DVA', 'LH', 'STJ', 'UHS'] ['BCR', 'BSX', 'PDCO'] ['ABT', 'CAH', 'CVS', 'MRK', 'PFE'] ['AMGN', 'CELG'] ['ABC', 'ESRX', 'MCK', 'MYL'] ['BBBY', 'JWN', 'M'] ['COH', 'PVH', 'RL'] ['LB', 'LOW', 'TJX'] ['NKE', 'SBUX', 'VFC'] ['HPQ', 'STX'] ['CSC', 'GRMN', 'WU'] ['MU', 'SNDK', 'WDC'] ['FFIV', 'JNPR', 'NTAP', 'TDC'] ['PBI', 'WYNN'] ['DAL', 'LUV'] ['AKAM', 'EA', 'VRSN', 'YHOO'] ['DRI', 'TSCO', 'YUM'] ['CHRW', 'HAS', 'MAT', 'NLSN'] ['CLX', 'DPS'] ['CPB', 'HSY', 'K'] ['GIS', 'HRL', 'MO', 'SJM'] ['CAG', 'RAI'] ['AMT', 'CCI'] ['T', 'VZ'] ['KMB', 'PG', 'PM'] ['KO', 'PEP'] ['CL', 'MKC'] ['MDLZ'] ['COST', 'MCD'] ['SYY', 'TAP'] ['GGP', 'MAC', 'SPG'] ['BXP', 'KIM', 'PLD', 'VNO'] ['HCN', 'HCP', 'VTR'] ['AIV', 'AVB', 'EQR', 'ESS', 'PSA'] ['EXC', 'FE'] ['ED', 'SO'] ['DUK', 'PCG'] ['AEE', 'CMS', 'DTE', 'NEE', 'PNW', 'SCG'] ['D', 'PEG'] ['AEP', 'WEC', 'XEL'] ['EIX', 'ETR', 'PPL'] ['CNP', 'NI', 'SRE'] ['DG', 'NFLX'] ['IRM', 'SRCL', 'STZ']