- A working Python installation. Python 3.x is preferred; the scripts should also work on Python 2.x.
- The saxpy library by Paul Senin. Install it using pip:
  - For Python 3.x: pip3 install saxpy
  - For Python 2.x: pip install saxpy
- The numpy and pandas libraries, plus the csv module (part of Python's standard library, so it needs no installation). If numpy or pandas is missing, install it using pip3 (or pip).
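Before running anything, a short check can confirm that the prerequisites above are importable. This is a convenience sketch, not part of the repository; the helper name missing_modules is hypothetical, and the check itself needs Python 3.4 or later because it relies on importlib.util.find_spec.

```python
import importlib.util

# Modules named in the prerequisites: csv ships with Python,
# while numpy, pandas and saxpy must be installed with pip3 (or pip).
REQUIRED_MODULES = ["numpy", "pandas", "csv", "saxpy"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

find_spec only locates a module without importing it, so the check is fast and has no side effects.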
Before running the scripts, make sure the data files are in the DATA/ folder of the repository. The data should be a sequence of time series; for the expected layout, refer to one of the sample files in DATA/ in the results branch.
- Run sax_open_prices.py, passing the name of the dataset as a command-line parameter, to get a file sax_words_ri_norot_w=$_a=#.txt.
- Execute show_CM.py to obtain a file CM_w=$_a=#_lsh_limit=@.txt.
- Execute Hcluster_my_idea.py to obtain the clusters in ./CLusters/nested_clusters_15_average_w=$_a=#_lsh_limit=@.txt. A dendrogram is also generated and saved in the /Plots directory.
- From nested_clusters_15_average_w=$_a=#_lsh_limit=@.txt, obtain the indices to plot and save them in a new file in the working directory named to_plot_15_average_w=$_a=#_lsh_limit=@. To get these indices, paste the contents of the cluster file into column A of the Excel sheet and copy the results from column B.
- Finally, generate a comparison plot by running gen_subplots.py.
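The computational core of the first three steps (SAX words, collision matrix, hierarchical clustering) can be sketched in plain numpy. This is a hedged sketch, not the repository's code, and it implements SAX directly instead of calling saxpy. It assumes that CM stands for a collision matrix built by random projection over SAX words, in the spirit of the Wei, Keogh and Xi paper cited below; the parameters n_positions and iterations are hypothetical stand-ins for whatever lsh_limit controls in show_CM.py. Average-linkage agglomerative clustering is used to mirror the "average" in the cluster file name.

```python
import random
import string
from collections import defaultdict
from itertools import combinations
from statistics import NormalDist  # Python 3.8+

import numpy as np


def sax_word(series, w, a):
    """Convert one time series into a SAX word of length w over an alphabet of size a."""
    if not 2 <= a <= 26:
        raise ValueError("alphabet size a must be between 2 and 26")
    x = np.asarray(series, dtype=float)
    if len(x) < w:
        raise ValueError("series must have at least w points")
    # Z-normalise; a flat series maps to all zeros.
    sd = x.std()
    z = (x - x.mean()) / sd if sd > 1e-8 else np.zeros_like(x)
    # PAA: mean of w nearly equal-length segments (exact PAA when len(x) % w == 0).
    means = [segment.mean() for segment in np.array_split(z, w)]
    # Breakpoints split N(0, 1) into a equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    return "".join(
        string.ascii_lowercase[int(np.searchsorted(breakpoints, m, side="right"))]
        for m in means
    )


def collision_matrix(words, n_positions, iterations, seed=0):
    """Count how often each pair of words agrees on a random subset of positions."""
    rng = random.Random(seed)
    n, word_len = len(words), len(words[0])
    cm = np.zeros((n, n), dtype=int)
    for _ in range(iterations):
        positions = sorted(rng.sample(range(word_len), n_positions))
        buckets = defaultdict(list)
        for idx, word in enumerate(words):
            buckets["".join(word[p] for p in positions)].append(idx)
        # Every pair of series sharing a bucket counts as one collision.
        for members in buckets.values():
            for i, j in combinations(members, 2):
                cm[i, j] += 1
                cm[j, i] += 1
    return cm


def average_linkage(dist, n_clusters):
    """Merge the two closest clusters (mean pairwise distance) until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a_idx, b_idx in combinations(range(len(clusters)), 2):
            d = dist[np.ix_(clusters[a_idx], clusters[b_idx])].mean()
            if best is None or d < best[0]:
                best = (d, a_idx, b_idx)
        _, a_idx, b_idx = best
        clusters[a_idx] = clusters[a_idx] + clusters[b_idx]
        del clusters[b_idx]  # b_idx > a_idx, so a_idx stays valid
    return clusters


def cluster_series(series_list, w, a, n_positions, iterations, n_clusters, seed=0):
    words = [sax_word(s, w, a) for s in series_list]
    cm = collision_matrix(words, n_positions, iterations, seed)
    # Frequent collisions mean similar shapes, so map counts to distances in [0, 1].
    dist = 1.0 - cm / iterations
    np.fill_diagonal(dist, 0.0)
    return words, average_linkage(dist, n_clusters)


if __name__ == "__main__":
    demo = [
        [1, 2, 3, 4, 5, 6, 7, 8],
        [2, 3, 4, 5, 6, 7, 8, 9],
        [8, 7, 6, 5, 4, 3, 2, 1],
        [9, 8, 7, 6, 5, 4, 3, 2],
    ]
    words, clusters = cluster_series(demo, w=4, a=3, n_positions=2, iterations=10, n_clusters=2)
    print(words)     # ['aacc', 'aacc', 'ccaa', 'ccaa']
    print(clusters)  # [[0, 1], [2, 3]]
```

Comparing words on random subsets of positions, rather than requiring whole-word equality, lets series whose words differ in only a few symbols still collide in some iterations, so the counts act as a graded similarity rather than a yes/no match.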
NOTE: The parameters w, k and lsh_limit can be varied by changing their values in the code files. In this guide, $, # and @ are placeholders for w, k and lsh_limit respectively (k appears as a= in the file names).
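Because every output file name encodes the parameters, a small helper can fill in the placeholders consistently whenever w, k or lsh_limit changes. The function output_names is hypothetical (not in the repository); it simply reproduces the naming patterns listed in the steps above, keeping the 15 fixed as this guide does, and requires Python 3.6+ for f-strings.

```python
def output_names(w, k, lsh_limit):
    """Fill the $ (w), # (k) and @ (lsh_limit) placeholders used in this guide."""
    return {
        "sax_words": f"sax_words_ri_norot_w={w}_a={k}.txt",
        "collision_matrix": f"CM_w={w}_a={k}_lsh_limit={lsh_limit}.txt",
        "clusters": f"./CLusters/nested_clusters_15_average_w={w}_a={k}_lsh_limit={lsh_limit}.txt",
        "to_plot": f"to_plot_15_average_w={w}_a={k}_lsh_limit={lsh_limit}",
    }


if __name__ == "__main__":
    for role, name in output_names(8, 3, "@").items():
        print(role, "->", name)
```

Passing "@" as lsh_limit, as in the usage above, prints the patterns for the w=8, k=3 run with the lsh_limit placeholder left visible.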
To generate a subplot comparing the results obtained from the algorithm, use the gen_subplot.py script.
A UI-based tool has also
To view trends for specific stock tickers (by viewing their time series), use the csv_price_plot.py script.
Clusters obtained with w=8 and k=3, labeled a, b, c, ..., m from left to right, top to bottom:
Clusters obtained with w=4 and k=3, labeled a, b, c, ..., m from left to right, top to bottom:
Clusters obtained with w=4 and k=6, labeled a, b, c, ..., m from left to right, top to bottom:
Clusters obtained with w=8 and k=6, labeled a, b, c, ..., m from left to right, top to bottom:
Compared with the groupings used by the mutual fund houses whose lists were taken here, these clusters offer around 30 percent better diversity.
- Li Wei, Eamonn Keogh and Xiaopeng Xi (2006). SAXually Explicit Images: Finding Unusual Shapes. ICDM 2006.
- UCR Data Archive. Note: the data used to run the scripts can be obtained from the UCR Data Archive. Alternatively, you can download a few data files (to test and run the scripts) from this link.
- The saxpy library by Paul Senin.