A unique way to measure stock market correlation
- Motivation
- Features
- Prerequisites
- Examples
- Comparison to simple correlations
- Future Work
This project grew out of a desire to characterize US stock market movements, in particular those days when all stocks move in the same direction in the same way, i.e. when stocks are essentially "in sync". Such days have a large effect on any trading strategy because an individual stock's price movement is dictated by the overall market movement; a strategy may be swimming against the tide. Pack correlation is proposed as a metric for gauging this market behaviour.
PackCorrelation calculates the correlation of each asset in a basket to a target asset (alpha) and creates a dataframe containing, for each day, the basket's average correlation together with the most correlated asset (beta), the median-correlated asset (epsilon), the least correlated asset (sigma) and the most anticorrelated asset (omega). These quantities define the "pack".
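Mathematically, the average pack correlation is:
$$\bar{\rho}_{\alpha} = \frac{1}{N} \sum_{i=1}^{N} \rho(\alpha, i)$$
where α is the stock to which the others are compared, i is a ticker in the pack, N is the size of the pack and ρ(α, i) is the correlation between α and i.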
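The other members of the pack follow from the same daily correlations: beta is the stock with the highest correlation to the alpha, epsilon is the stock with the median correlation, sigma is the stock whose correlation is closest to zero, and omega is the stock with the most negative correlation.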
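The daily calculation described above can be sketched with pandas. This is a minimal illustration, not the PackCorrelation implementation: the function name `pack_summary`, the one-column-per-ticker layout, the use of Pearson correlation and the made-up tickers are all assumptions. Epsilon is taken here as the stock whose correlation is closest to the median, which is one possible reading of "median".

```python
import numpy as np
import pandas as pd


def pack_summary(prices: pd.DataFrame, alpha: str) -> dict:
    """Summarize one day's correlations of every column to the alpha column.

    `prices` is assumed to hold one day of prices, one column per ticker.
    """
    # Pearson correlation of each non-alpha ticker with the alpha
    corr = prices.drop(columns=alpha).corrwith(prices[alpha]).dropna()
    return {
        "av_corr": float(corr.mean()),
        "beta": corr.idxmax(),                              # highest correlation
        "epsilon": (corr - corr.median()).abs().idxmin(),   # closest to the median
        "sigma": corr.abs().idxmin(),                       # closest to zero
        "omega": corr.idxmin(),                             # most negative correlation
    }


# Made-up price series chosen so that each pack role is easy to spot
t = np.arange(10.0)
bump = (t - 4.5) ** 2          # symmetric, so uncorrelated with t
prices = pd.DataFrame({
    "ALPHA": t,
    "UP": 2 * t + 1,           # moves with the alpha
    "MID": t + bump,           # moderately correlated
    "LOW": t + 5 * bump,       # weakly correlated
    "FLAT": bump,              # uncorrelated
    "DOWN": -t,                # moves against the alpha
})
summary = pack_summary(prices, "ALPHA")
print(summary)
```

With an even number of stocks the median falls between two correlations, so the "closest to the median" rule picks whichever stock is nearer; the real class may resolve this differently.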
- Calculate an average correlation for a group of stocks to a predefined lead/alpha stock for each day under consideration
- Helps characterize a stock’s price movement relative to the alpha e.g. highly correlated, uncorrelated or anticorrelated
- Visualize the distribution of correlations of stocks to the alpha for a given day or the whole date range
- Python
- Pandas
- Numpy
- Matplotlib
- Seaborn
- yfinance
`PackCorrelation` uses a somewhat idiosyncratic data structure that makes it easy and quick to calculate correlations for each day for a stock to the alpha. The format uses `data` as a dictionary of dataframes with the stock tickers as keys. The `ticker_dates` dictionary is essentially a list of indices that point to locations in a ticker's dataframe that correspond to the open and closing rows for each day. This allows for rapid lookups of individual price data or slicing of days in the case of pack correlation. The speed improvement comes at a cost of only the minor space requirement of `ticker_dates`. More details can be found in the complementary repository. The `data` and `ticker_dates` data structures can be quickly generated given CSV files for each stock. The `FinDataExtract` class contains the `pop_data_dict` method, which will generate the relevant dictionary, and `pop_ticker_dates`, which will produce the `ticker_dates` structure.
>>> from findata_extraction import FinDataExtract
# Create instance of class
>>> fde = FinDataExtract()
# Define path at which to download data and/or from which to generate data and ticker_dates dictionaries
>>> fde.set_file_path(".//Watchlist//Test")
# create a list of tickers from a given csv file e.g. members of S&P 500 index
>>> fde.pop_watchlist(watchlist_path=".//Watchlist//sp500.csv")
# Download 4 weeks of 1m intraday data for each ticker from Yahoo Finance using the yfinance library
>>> fde.download_ticker_data()
# Create a dictionary of dataframes as values and the stock ticker as corresponding key
>>> fde.pop_data_dict()
>>> data = fde.data
# Create a dictionary of lists of dates and open and close indices as values and stock ticker as corresponding key
>>> fde.pop_ticker_dates()
>>> ticker_dates = fde.ticker_dates
# Now we can easily access specific days of data for each ticker
# Access the opening info for most recent day for SPY
>>> open_index = ticker_dates["SPY"][-1][3]
>>> data["SPY"].loc[open_index]
Datetime 2022-04-01 09:30:00
Open 453.309998
High 453.339996
Low 452.799988
Close 453.070007
Adj Close 453.070007
Volume 2290723.0
Name: 188977, dtype: object
# and the closing info for the most recent day
>>> close_index = ticker_dates["SPY"][-1][4]
>>> data["SPY"].loc[close_index]
Datetime 2022-04-01 15:59:00
Open 453.140015
High 453.170013
Low 452.779999
Close 452.890015
Adj Close 452.890015
Volume 3033426.0
Name: 189366, dtype: object
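To make the lookup above more concrete, here is a rough sketch of how a `ticker_dates`-style list could be built for one ticker. It is not the `pop_ticker_dates` implementation: the helper `build_ticker_dates` is hypothetical, and the entry layout `[month, day, year, open_index, close_index]` is only inferred from the indices `[3]` and `[4]` used in the example, so treat it as an assumption. The prices are made up.

```python
import pandas as pd


def build_ticker_dates(df: pd.DataFrame) -> list:
    """Return one [month, day, year, open_index, close_index] entry per day.

    Assumes `df` has a "Datetime" column sorted in ascending order.
    """
    days = pd.to_datetime(df["Datetime"]).dt.date
    entries = []
    for day, rows in df.groupby(days):
        # First and last row labels of the day mark its open and close
        entries.append([day.month, day.day, day.year,
                        int(rows.index[0]), int(rows.index[-1])])
    return entries


# Two short, made-up trading days of intraday data
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2022-03-31 09:30", "2022-03-31 09:31", "2022-03-31 15:59",
        "2022-04-01 09:30", "2022-04-01 15:59",
    ]),
    "Close": [100.0, 100.5, 101.0, 101.5, 102.0],
})
ticker_dates = {"SPY": build_ticker_dates(df)}

# Slice out the most recent day using the stored open and close indices
open_index = ticker_dates["SPY"][-1][3]
close_index = ticker_dates["SPY"][-1][4]
last_day = df.loc[open_index:close_index]
print(last_day)
```

Storing only two row labels per day keeps the structure small while letting a whole day be sliced out with a single `.loc` call, which is the trade-off the text describes.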
Now that we have the correct data structures we can easily calculate the pack correlation for each day under consideration.
# Create instance of the PackCorrelation class passing previously defined data and ticker_dates
>>> pack = PackCorrelation(data, ticker_dates)
# Define alpha against which correlations for all other members of pack will be calculated
>>> pack.define_alpha("SPY")
>>> print(pack)
Contains a dictionary of 560 dataframes with 252 days and alpha as SPY
# Run correlation calculation and plot result
>>> pack.find_pack_correlation(plot_av=True)
Here we can see the overall pack correlation as a function of time. The `find_pack_correlation` method will create a dataframe, `pack.corr_date`, containing the average, median and standard deviation of the entire pack correlation, the directional pack correlation (average pack correlation modified by alpha gain or loss) plus the names and correlations of the pack members (Beta, Epsilon, Sigma, Omega) for each day under consideration.
We can also easily pull out a single day and look at it in more depth.
>>> pack.corr_date.iloc[10]
Day [4, 22, 2021]
Av Corr 0.626548
Dir Corr -0.626548
Median Corr 0.749294
Stdev Corr 0.335698
Alpha Gain 0.991702
Beta NXPI
Beta Corr 0.979383
Epsilon TGT
Epsilon Corr 0.749294
Sigma CVS
Sigma Corr -0.000276
Omega UVXY
Omega Corr -0.976964
Name: 10, dtype: object
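The row above suggests how the directional correlation relates to the average: the two have the same magnitude, and the sign is negative on a day when the Alpha Gain is below 1. The sketch below encodes that reading. The function name `directional_corr` is hypothetical, and both the sign rule and the interpretation of Alpha Gain as a ratio (below 1 meaning a loss) are assumptions inferred from this single row rather than confirmed behaviour of the class.

```python
def directional_corr(av_corr: float, alpha_gain: float) -> float:
    """Flip the sign of the average correlation on days the alpha loses.

    `alpha_gain` is read here as a ratio, so values below 1 are a loss.
    """
    return av_corr if alpha_gain >= 1.0 else -av_corr


# Values taken from the example row for [4, 22, 2021]
dir_corr = directional_corr(0.626548, 0.991702)
print(dir_corr)  # -0.626548
```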
# Plot each of the pack members for a given date, with prices normalized so they can be easily compared
>>> pack.plot_day_corr(date="2022-02-25", plot_alpha=True, plot_beta=True, plot_omega=True)
Selected day: [2, 25, 2022]
Average day correlation: 0.77
Alpha: SPY
Beta: DHR (0.98)
Omega: UVXY (-0.96)
This plot shows that the Beta follows the Alpha closely and the Omega is anti-correlated with both, as expected by definition.
# plot a histogram to show the distribution of correlations for a given day
>>> pack.plot_hist_corr("2022-04-01", bins=50)
Selected day: (2022, 4, 1)
Mean: 0.41
Median: 0.47
Mode: 0.56
# plot high and low correlated days together
>>> pack.plot_hist_corr("2022-02-25", bins=50, alpha=0.5)
>>> pack.plot_hist_corr("2021-11-02", bins=50, alpha=0.5)
Selected day: (2022, 2, 25)
Mean: 0.77
Median: 0.88
Mode: 0.91
Selected day: (2021, 11, 2)
Mean: 0.07
Median: 0.08
Mode: 0.46
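For continuous values such as correlations, a mode only makes sense relative to a binning, which is why it can sit far from the mean and median on a weakly correlated day. The sketch below shows one way such summary values could be computed with NumPy, taking the mode as the centre of the most populated histogram bin. The function `histogram_stats` and the fixed range of -1 to 1 are assumptions for illustration; `plot_hist_corr` may compute its statistics differently. The input correlations are made up.

```python
import numpy as np


def histogram_stats(corrs, bins=50):
    """Return (mean, median, mode) for a set of correlations.

    The mode is estimated as the centre of the fullest histogram bin.
    """
    corrs = np.asarray(corrs, dtype=float)
    counts, edges = np.histogram(corrs, bins=bins, range=(-1.0, 1.0))
    top = int(np.argmax(counts))
    mode = 0.5 * (edges[top] + edges[top + 1])
    return float(corrs.mean()), float(np.median(corrs)), float(mode)


# Made-up correlations: a cluster near 0.9 plus two outliers
mean, median, mode = histogram_stats([0.9, 0.91, 0.905, 0.2, -0.5], bins=50)
print(round(mean, 3), round(median, 3), round(mode, 3))
```

Because the estimate depends on the bin width, changing `bins` can shift the reported mode even when the data are unchanged.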
# create and plot a heatmap to show all correlation distributions as a function of time
>>> pack.plot_heatmap(bins=100)
# create a new dataframe for a ticker for the given date range
>>> new_data = pack.slice_data("DHR", "2022-02-22", "2022-02-28")
Data slice for DHR
# similarly, plot the “Close” data for a ticker for a given date range
>>> pack.plot_data("TSLA", start_time="2022-03-01", end_time="2022-03-31", plot_series="Close")
Data slice for TSLA
Close-data plotted for TSLA
It is reasonable to ask why we would use this pack correlation method rather than simply finding the correlations between all pairs of stocks of interest. The all-pairs approach would give a more complete picture but would also include many spurious correlations in which we may not be interested; we want to know how the pack relates specifically to the alpha. Furthermore, a simple calculation shows that the all-pairs approach may take significantly longer.
Take the above example: 560 stocks over the previous year of 252 trading days. For any given day there are n(n-1)/2 pairwise correlation calculations between the stocks, which grows as n² for large n, i.e. O(n²) in big-O notation. The pack correlation method requires only n-1 = 559 correlation calculations of stocks to the alpha, i.e. O(n) in time complexity. In the above example this means 280 times less work (156,520 calculations versus 559 per day). This is one reason why the pack correlation method is advantageous.
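The arithmetic behind that comparison can be checked in a few lines; the variable names are only illustrative.

```python
n = 560                              # stocks in the example, including the alpha

all_pairs = n * (n - 1) // 2         # every pair of stocks, per day
to_alpha = n - 1                     # every other stock against the alpha, per day
ratio = all_pairs / to_alpha         # equals n / 2

print(all_pairs, to_alpha, ratio)    # 156520 559 280.0
```

The ratio is exactly n/2, so the saving grows linearly with the size of the basket.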
- For a given date, find correlations as a function of time
- Allow selection of different types of correlation methods
- Add weighting to stock correlations in pack when computing average correlation for a given day e.g. give AAPL a bigger contribution than A. Currently all stocks have equal weight
- Allow for pack correlation calculation for different granularity of data e.g. 5min, 1 hour, 1 day candles
- Add error estimation
- Provide features for cryptocurrencies
- Add automatic ticker_dates calculator for given data
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.