
This repository demonstrates an application of unsupervised learning to the financial markets. K-Means clustering is employed to create a diversified portfolio of stocks, and the resulting portfolio is backtested against the S&P 500 index.

Primary language: R

Creating a Diversified Stock Portfolio Using Clustering Analysis


About

The aim of the project is to create a diversified portfolio of stocks using clustering analysis and to backtest its performance against the historical data of a stock index. For this we use the S&P 500 index, which is widely regarded as one of the most accurate gauges of the US economy and serves as the benchmark for many funds in the marketplace.

The approach uses K-Means clustering based on Euclidean distances to understand how different parameters affect stock performance. Dividing the stocks into clusters with similar performance makes that performance easier to interpret, and the resulting clusters provide valuable information for building stock portfolios.
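
For reference, the standard K-Means objective with Euclidean distance can be written as below. The symbols are introduced here for illustration and do not come from the repository: x_i is the feature vector of stock i, C_k is the set of stocks assigned to cluster k, and mu_k is the centroid of that cluster.

```latex
\min_{C_1,\dots,C_K} \; \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert_2^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

Because the distance is Euclidean, features on larger scales dominate the objective unless the features are standardized first.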

Link to the dataset

https://www.kaggle.com/datasets/andrewmvd/sp-500-stocks?select=sp500_companies.csv

Exploring the dataset


Approach

The following features were calculated from 10 years of daily historical data for all the stocks in the S&P 500 index:

  • Correlation with the S&P 500 index value
  • Beta with respect to the S&P 500 index value
  • Annualized Return on equity (daily returns)
  • Annualized Volatility on equity (daily returns)
  • Sharpe Ratio
  • Daily Change in price
  • Daily Variation in price
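
A minimal sketch of how several of these features might be computed for one stock, using base R only. The synthetic returns, the 252-trading-day convention, the zero risk-free rate, the arithmetic annualization, and the helper name `stock_features` are all assumptions made for illustration; the repository may compute them differently. The two daily price features are left out because their exact definitions are not given here.

```r
# Synthetic daily returns stand in for the real price data (assumption).
set.seed(42)
n_days    <- 252 * 2
index_ret <- rnorm(n_days, mean = 0.0004, sd = 0.01)
stock_ret <- 0.0002 + 1.2 * index_ret + rnorm(n_days, sd = 0.008)

# Hypothetical helper: per-stock features from daily returns.
# days = assumed trading days per year, rf = assumed annual risk-free rate.
stock_features <- function(stock, index, days = 252, rf = 0) {
  ann_return <- mean(stock) * days        # arithmetic annualization (assumption)
  ann_vol    <- sd(stock) * sqrt(days)    # annualized volatility
  c(
    correlation = cor(stock, index),
    beta        = cov(stock, index) / var(index),
    ann_return  = ann_return,
    ann_vol     = ann_vol,
    sharpe      = (ann_return - rf) / ann_vol
  )
}

features <- stock_features(stock_ret, index_ret)
print(round(features, 3))
```

Applying the helper to every stock and row-binding the results would give a feature matrix with one row per stock.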

Exploring the inter-feature correlation matrix using a correlogram

The following plots show the correlogram plotted from the correlation matrix of the features.
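
As a rough illustration, the correlation matrix behind such a correlogram can be computed with base R's `cor()`. The feature table below is synthetic and its column names are assumptions.

```r
# Synthetic per-stock feature table (assumption); one row per stock.
set.seed(1)
n_stocks <- 50
beta <- rnorm(n_stocks, mean = 1, sd = 0.3)
features <- data.frame(
  beta       = beta,
  ann_return = 0.05 + 0.04 * beta + rnorm(n_stocks, sd = 0.05),
  ann_vol    = 0.15 + 0.10 * beta + rnorm(n_stocks, sd = 0.03)
)
features$sharpe <- features$ann_return / features$ann_vol

# Pairwise Pearson correlations between the feature columns.
cor_mat <- cor(features)
print(round(cor_mat, 2))

# A correlogram could then be drawn from cor_mat, for example with the
# corrplot package if it is installed: corrplot::corrplot(cor_mat)
```

Strongly correlated features carry overlapping information, which is worth knowing before they are fed into a distance-based method such as K-Means.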


K-Means Clustering

The following results show a scree plot used to choose the optimal value of K and the convex hulls of the clusters formed with K = 4. Each stock's symbol marks its relative position within its cluster.
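
The sketch below shows one common way to produce the numbers behind a scree (elbow) plot and then fit K = 4 with base R's `kmeans()`. The two-feature synthetic data, the standardization step, and `nstart = 20` are assumptions for illustration, not necessarily what the repository does.

```r
# Synthetic feature matrix with four loose groups of stocks (assumption).
set.seed(123)
centers <- rbind(c(0.05, 0.15), c(0.10, 0.25), c(0.20, 0.20), c(0.15, 0.40))
features <- do.call(rbind, lapply(1:4, function(i) {
  cbind(ann_return = rnorm(30, centers[i, 1], 0.01),
        ann_vol    = rnorm(30, centers[i, 2], 0.01))
}))

# Standardize so that no single feature dominates the Euclidean distance.
scaled <- scale(features)

# Total within-cluster sum of squares for K = 1..8.
wss <- sapply(1:8, function(k) {
  kmeans(scaled, centers = k, nstart = 20)$tot.withinss
})
print(round(wss, 2))
# plot(1:8, wss, type = "b", xlab = "K", ylab = "Total within-cluster SS")

# Final fit with K = 4, the value used in this project.
fit <- kmeans(scaled, centers = 4, nstart = 20)
print(table(fit$cluster))
```

The "elbow" is the value of K after which the within-cluster sum of squares stops dropping sharply.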


Correlation Analysis

After K-Means clustering, the cluster-wise distributions of annualized return, annualized volatility, Sharpe ratio and beta were plotted. At least two clusters differ significantly from each other in both mean value and standard deviation.
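
A small sketch of how the per-cluster mean and standard deviation could be tabulated with base R's `aggregate()`. The data frame and its column names are synthetic assumptions.

```r
# Synthetic stocks with cluster labels already assigned (assumption).
set.seed(7)
stocks <- data.frame(
  cluster    = rep(1:4, each = 25),
  ann_return = rnorm(100, mean = rep(c(0.05, 0.10, 0.15, 0.20), each = 25), sd = 0.03),
  ann_vol    = rnorm(100, mean = rep(c(0.15, 0.20, 0.25, 0.35), each = 25), sd = 0.02)
)

# Mean and standard deviation of each feature within each cluster.
cluster_mean <- aggregate(cbind(ann_return, ann_vol) ~ cluster, data = stocks, FUN = mean)
cluster_sd   <- aggregate(cbind(ann_return, ann_vol) ~ cluster, data = stocks, FUN = sd)
print(cluster_mean)
print(cluster_sd)

# The distributions themselves could be compared visually, for example with
# boxplot(ann_return ~ cluster, data = stocks)
```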


Backtesting results: K-Means portfolio vs. S&P 500 index cumulative returns

To validate the use of clustering for creating a diversified portfolio, we backtested its performance on the test/validation data. The clustering was performed on the first 7 years of data, and the remaining 3 years were used to validate the results of our portfolio. For this, two portfolios containing 20 stocks each were created:

  1. Portfolio created using top five stocks (as per Sharpe ratio) from each cluster - [RED]
  2. Portfolio created using top 20 stocks out of all 500 as per Sharpe ratio from the 7-year historical performance - [ORANGE]
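
The two selection rules above, and a simple cumulative-return curve for each portfolio, can be sketched as follows. Everything here is illustrative: the data are synthetic, the helper names `top_per_cluster` and `cum_return` are hypothetical, and equal weighting with daily rebalancing is an assumption, since the weighting scheme is not stated.

```r
# Synthetic training-window summary: one row per stock (assumption).
set.seed(2024)
n_stocks <- 40
stocks <- data.frame(
  symbol  = sprintf("S%02d", 1:n_stocks),
  cluster = rep(1:4, each = 10),
  sharpe  = rnorm(n_stocks, mean = 0.8, sd = 0.4),
  stringsAsFactors = FALSE
)

# Portfolio 1: top n stocks by Sharpe ratio within each cluster.
top_per_cluster <- function(df, n = 5) {
  picks <- lapply(split(df, df$cluster), function(g) {
    head(g[order(-g$sharpe), ], n)
  })
  do.call(rbind, picks)
}
cluster_pf <- top_per_cluster(stocks, n = 5)

# Portfolio 2: top 20 stocks by Sharpe ratio overall.
overall_pf <- head(stocks[order(-stocks$sharpe), ], 20)

# Synthetic test-window daily returns, days x stocks (assumption).
test_ret <- matrix(rnorm(250 * n_stocks, mean = 0.0004, sd = 0.01),
                   ncol = n_stocks, dimnames = list(NULL, stocks$symbol))

# Equal-weight, daily-rebalanced cumulative return (assumption).
cum_return <- function(ret, symbols) {
  daily <- rowMeans(ret[, symbols, drop = FALSE])
  cumprod(1 + daily) - 1
}
cluster_curve <- cum_return(test_ret, cluster_pf$symbol)
overall_curve <- cum_return(test_ret, overall_pf$symbol)
print(round(c(cluster = tail(cluster_curve, 1), overall = tail(overall_curve, 1)), 4))
```

Selecting on the training window and measuring on a later window keeps the comparison out-of-sample, which matches the 7-year/3-year split described above.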

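[Figure: cumulative returns of the K-Means portfolio (red), the top-20 Sharpe ratio portfolio (orange), and the S&P 500 index]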

Conclusion

  • The orange portfolio, a collection of the stocks with the highest Sharpe ratios, outperformed the S&P 500 index. The portfolio formed using K-Means clustering (red line) performed better still.
  • This indicates that K-Means clustering successfully created a portfolio that is diversified across all the features used for clustering, and that it outperformed not only the S&P 500 index but also a collection of the stocks with the best historical performance.
  • The backtesting results indicate that the K-Means portfolio was correlated with the index during COVID-19 and recovered more slowly than the orange portfolio. However, because it was highly diversified, the K-Means portfolio had far better long-term performance than the orange portfolio.

Contact

Karthik Ram - LinkedIn