Solver for MASA: Motif-Aware State Assignment (previously called CASC)
From the main directory, import the solver class from the file CASC_solver:
from CASC_solver import CASCSolver
Then create a solver object. The solver has the following options:
solver = CASCSolver(
window_size,
number_of_clusters,
lambda_parameter,
beta,
threshold, # convergence threshold
gamma,
input_file, # input data file
num_proc, # number of processes running in parallel as workers
maxMotifs, # cap number of motifs
motifReq, # minimum number of motifs
maxIters, # number of iterations to run (None if until convergence)
)
Then use the solver to run CASC:
(cluster_assignment, cluster_MRFs, motifs, motifRanked, bic, runtime) = solver.PerformFullCASC(
initialClusteredPoints, # the initial clustered points, if you want to start from a pre-assignment
useMotif, # whether to use motifs (if False, just performs TICC until convergence)
)
The input data file should be a CSV with one line per time step, each line holding the sensor values for that step. The data can be reduced with PCA beforehand if necessary. The outputs are:
cluster_assignment -> a list of primary cluster labels, one per time step
cluster_MRFs -> the inverse covariance matrices learned
motifs -> the motifs found, together with their identified instances
motifRanked -> the scores for each motif
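The input format described above can be produced and sanity-checked with a few lines of standard-library Python. This is only a sketch: the helper names write_sensor_csv and check_sensor_csv are hypothetical and not part of this repository, and it assumes a comma-delimited file with no header row, which the description above does not state explicitly.

```python
import csv
import io


def write_sensor_csv(rows, handle):
    """Write one line per time step; each line holds that step's sensor values."""
    writer = csv.writer(handle, lineterminator="\n")
    for step_values in rows:
        writer.writerow(step_values)


def check_sensor_csv(handle):
    """Return (num_time_steps, num_sensors); raise ValueError on ragged or non-numeric rows."""
    num_sensors = None
    num_steps = 0
    for line_number, row in enumerate(csv.reader(handle), start=1):
        if not row:
            continue  # ignore blank lines
        values = [float(v) for v in row]  # float() raises ValueError on non-numeric cells
        if num_sensors is None:
            num_sensors = len(values)
        elif len(values) != num_sensors:
            raise ValueError(
                f"line {line_number}: expected {num_sensors} values, got {len(values)}"
            )
        num_steps += 1
    return num_steps, num_sensors or 0


# Four time steps of three (made-up) sensor readings each.
readings = [
    [0.1, 1.2, -0.3],
    [0.2, 1.1, -0.4],
    [0.3, 0.9, -0.2],
    [0.4, 1.0, -0.1],
]
buffer = io.StringIO()  # stands in for a real file opened with open(path, "w", newline="")
write_sensor_csv(readings, buffer)
buffer.seek(0)
shape = check_sensor_csv(buffer)
print(shape)
```

Running a check like this before passing the file to the solver catches rows with a missing sensor value early, rather than partway through a long run.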
The code from the paper is in the directory paper_code. To run a script, put the script in the main directory.
The scripts for the synthetic experiments are in paper_code/scripts/synthetic. baseline.py contains the script for running the baselines, while synthetic.py contains the script for running MASA.
The synthetic data can be found in ordered_synthetic.zip. You need to unzip that file and put its contents in the main directory. The script that was used to generate that data is paper_code/generateDatasets/generate_synthetic.py.
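The unzip step can also be done from Python rather than a shell. The following is a minimal sketch using only the standard library; the helper name extract_archive is hypothetical, and it assumes the archive sits in the main directory and should be extracted in place.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir="."):
    """Extract a zip archive into target_dir and return the sorted member names."""
    archive_path = Path(archive_path)
    target_dir = Path(target_dir)
    if not archive_path.is_file():
        raise FileNotFoundError(f"archive not found: {archive_path}")
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        return sorted(archive.namelist())


# Only runs when the archive is actually present in the current directory.
if Path("ordered_synthetic.zip").is_file():
    for name in extract_archive("ordered_synthetic.zip", "."):
        print(name)
```

The same helper would work for cycling.zip, since it takes the archive path as an argument.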
The cycling data can be found in cycling.zip. The script to create the cycling data is scripts/cycling/create_cycling_dataset.py, and the script to run MASA on the cycling data is scripts/cycling.py.
Unfortunately, we cannot release the datasets for the automobile and airplane case studies. The script that was used to run MASA on this data is paper_code/scripts/runCaseStudy.py.
Aggregation and plotting scripts can be found in paper_code/scripts/aggregation_and_plotting.