/TS_MF_cluster_analysis

Clusters mutual funds based on time-series trend analysis. The tool is useful for studying diversification in mutual funds and for testing for trends. It uses numpy and pandas to obtain processable data for NSE tickers from Yahoo Finance.

Mutual Funds Time Series Analysis and Clustering

Requirements

Dependencies

  1. A running Python installation. Python 3.x is preferred; Python 2.x should work as well.
  2. saxpy by Paul Senin, installed using pip. For Python 3.x:

    pip3 install saxpy

     For Python 2.x:

    pip install saxpy

  3. The numpy and pandas Python libraries (the csv module is part of the Python standard library and needs no installation). If not already installed, they can be installed using pip3 (or pip).

Setting up the data

Before running the scripts, make sure the data files are in the /DATA/ folder in the repository. The data should be a sequence of time series. Refer to one of the sample files in DATA/ in the results branch.
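As a rough illustration of what "a sequence of time series" can look like on disk, the sketch below reads one series per line. It assumes a UCR-style layout (a label first, then the values, separated by commas or whitespace); that layout and the function name load_series are assumptions for illustration, so check the sample files in DATA/ before relying on it.

```python
import re


def load_series(path):
    """Read one time series per line; the first field is treated as a label.

    Assumption: UCR-style layout (label, then values), separated by commas
    or whitespace. Adjust this if the sample files in DATA/ differ.
    """
    labels, series = [], []
    with open(path) as handle:
        for line in handle:
            fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
            if not fields:
                continue  # skip blank lines
            labels.append(fields[0])
            series.append([float(value) for value in fields[1:]])
    return labels, series
```

Calling load_series on a file in this layout returns the labels and the numeric series as two parallel lists.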

Instructions

  1. Run sax_open_prices.py to get a file sax_words_ri_norot_w=$_a=#.txt. Pass the name of the dataset as a command-line argument to the script.
  2. Execute show_CM.py to obtain a file CM_w=$_a=#_lsh_limit=@.txt.
  3. Execute Hcluster_my_idea.py to obtain the clusters in ./CLusters/nested_clusters_15_average_w=$_a=#_lsh_limit=@.txt. A dendrogram is also generated and saved in the /Plots directory.
  4. From nested_clusters_15_average_w=$_a=#_lsh_limit=@.txt, obtain the indices to plot and create a new file in the working directory named to_plot_15_average_w=$_a=#_lsh_limit=@. To obtain these indices, copy the contents of the cluster file into the Excel sheet (column A) and then copy the results from column B.
  5. Finally, generate a comparison plot by running gen_subplots.py.
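Step 1 produces SAX words. To show what the two numbers in the file name control, here is a minimal sketch of the SAX transform written with the standard library only. It is not the code in sax_open_prices.py and does not use the saxpy API; the function name sax_word is hypothetical, and it assumes the first parameter is the word length (number of segments) and the second is the alphabet size. It needs Python 3.8 or later for statistics.NormalDist.

```python
import string
from statistics import NormalDist, mean, pstdev


def sax_word(series, w, a):
    """Convert a numeric series to a SAX word of length w over an alphabet of size a."""
    n = len(series)
    if not 1 <= w <= n:
        raise ValueError("w must be between 1 and the series length")
    if not 2 <= a <= 26:
        raise ValueError("a must be between 2 and 26")

    # 1. Z-normalise so that series on different price scales are comparable.
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        normalised = [0.0] * n
    else:
        normalised = [(x - mu) / sigma for x in series]

    # 2. Piecewise Aggregate Approximation: average over w roughly equal segments.
    paa = []
    for i in range(w):
        start, end = i * n // w, (i + 1) * n // w
        paa.append(mean(normalised[start:end]))

    # 3. Breakpoints that cut the standard normal curve into a equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(j / a) for j in range(1, a)]

    # 4. Map each segment mean to a letter: 'a' for the lowest region, and so on.
    letters = string.ascii_lowercase
    return "".join(letters[sum(value > b for b in breakpoints)] for value in paa)
```

A larger w keeps more of the shape of the series, while a larger alphabet distinguishes more price levels within each segment.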

NOTE: The parameters (w, k, lsh_limit) can be varied by changing their values in the code files. In this guide, $, # and @ are used as placeholders for w, k and lsh_limit respectively (in the file names, the value of k appears after a=).
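This README does not say how the CM file is computed. One plausible reading, offered only as an assumption, is that CM is a collision matrix built by repeatedly hiding a few random positions of each SAX word and counting which words then look identical. The sketch below illustrates that general technique; the function name collision_matrix and the parameters mask_size, n_rounds and seed are hypothetical and are not taken from show_CM.py.

```python
import random
from collections import defaultdict


def collision_matrix(words, mask_size, n_rounds, seed=0):
    """Count how often each pair of equal-length words collides under random masking."""
    rng = random.Random(seed)
    length = len(words[0])
    counts = [[0] * len(words) for _ in words]
    for _ in range(n_rounds):
        # Hide mask_size random positions; words that agree elsewhere share a bucket.
        masked = set(rng.sample(range(length), mask_size))
        buckets = defaultdict(list)
        for index, word in enumerate(words):
            key = "".join(ch for pos, ch in enumerate(word) if pos not in masked)
            buckets[key].append(index)
        # Every pair of distinct words in the same bucket counts as one collision.
        for members in buckets.values():
            for i in members:
                for j in members:
                    if i != j:
                        counts[i][j] += 1
    return counts
```

Pairs with high counts are similar in most positions, so a matrix like this can serve as a similarity measure for a later hierarchical clustering step.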

Plotting and getting analysis

Generate comparative subplots for algorithm results

To generate a subplot comparing the results obtained from the algorithm, use the gen_subplot.py script. A UI based tool has also

For Specific Stocks

To view trends for specific stock tickers (by viewing their time-series) use the csv_price_plot.py script.
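For readers who want to see the idea without the script, the following sketch reads a price column from a CSV file and plots it. It is not the code in csv_price_plot.py. The column names Date and Open follow the usual Yahoo Finance CSV export and are an assumption here, as are the function names; matplotlib is also an assumption, since it is not among the dependencies listed above, and is imported only when plotting.

```python
import csv


def read_open_prices(path, date_col="Date", price_col="Open"):
    """Return (dates, prices) from a CSV file, skipping rows without a numeric price."""
    dates, prices = [], []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            try:
                date, price = row[date_col], float(row[price_col])
            except (KeyError, TypeError, ValueError):
                continue  # e.g. empty or non-numeric entries
            dates.append(date)
            prices.append(price)
    return dates, prices


def plot_prices(prices, title):
    """Plot a price series against its row index (requires matplotlib)."""
    import matplotlib.pyplot as plt  # assumed to be installed separately

    plt.plot(range(len(prices)), prices)
    plt.title(title)
    plt.xlabel("Trading day")
    plt.ylabel("Price")
    plt.show()
```

For example, dates, prices = read_open_prices("some_ticker.csv") followed by plot_prices(prices, "some_ticker") shows the series for one ticker; the file name here is a placeholder.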

Results and Example Plots

Clusters obtained with w=8 and k=3, named a, b, c, ... m from left to right, top to bottom.

Clusters obtained with w=4 and k=3, named a, b, c, ... m from left to right, top to bottom.

Clusters obtained with w=4 and k=6, named a, b, c, ... m from left to right, top to bottom.

Clusters obtained with w=8 and k=6, named a, b, c, ... m from left to right, top to bottom.

These clusters, when compared with the ones used by the Mutual Fund houses whose lists were taken here, offer around 30 percent better diversity.

Credits and References

  1. Li Wei, Eamonn Keogh and Xiaopeng Xi (2006). SAXually Explicit Images: Finding Unusual Shapes. ICDM 2006.
  2. UCR Data Archive. Note: The data used to run the scripts can be obtained from the UCR Data Archive. Alternatively, you can download a few data files (to test and run the scripts) from this link.
  3. The saxpy library by Paul Senin.