/ethz-clustering

Clustering with Iteratively Reweighted Descent

Primary Language: C++

Folder Structure

The project source code is in the src folder, which contains all relevant .cpp and .h files.
Before running the algorithm, put the csv configuration files in the config folder. Use the given template to fill in these files.
If you want to use data you have created yourself, put it in the data folder and follow the instructions in the Run the Code section.
The data presented in the report can be found in the cases folder.

Run the Code

  • Build Project:
    Use the following commands to build the project:

    mkdir build
    cd build
    cmake ..
    cmake --build <path-to-build-folder> --target all

  • Create Data: Edit the variances.csv and centers.csv files to match your chosen settings, then run:
    ./build/clustering 0
    The data will be saved in the data folder as csv files.

  • Run Algorithm: To run the algorithm, first specify the configuration parameters located in the config folder. After choosing the parameters, run:
    ./build/clustering 1
    This command will generate several csv files containing the output of the algorithm.

  • Run Algorithm for Different Parameters: You can also run the algorithm several times with different rZ parameters. Specify the set of rZ parameters in the paramRz array in run_hyp.sh, and set ARG_NR_EXPERIMENT to the number of times each configuration should be run. To start the loop, run:
    ./run_hyp.sh
    This command creates an experiments folder containing several experiment(number) folders, each of which holds the outputs of repeated runs with the same configuration.

  • Computing Scores: To compute the scores of all runs, run:
    python compute_scores.py --path <path-to-experiments-folder>
    This will generate a result.csv file containing the L2 scores of the outputs of every run.
    To also see the mean score of every experiment folder, run:
    python plot_scores.py --path <path-to-result.csv-file>

  • Plotting Variables: To inspect the sparsity of the variables z and s_z, run:
    python plot_experiment.py --path <path-to-run-folder>
    This will generate several .png files showing the variables as bar plots.
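
The clustering executable above takes a single argument, 0 to create data and 1 to run the algorithm. The following is a minimal sketch of how such an argument could be mapped to a mode. The names Mode and parse_mode are hypothetical and are not taken from the repository, whose entry point may be organised differently.

```cpp
#include <string>

// Hypothetical names for illustration only; the real entry point in src
// may handle its argument differently.
enum class Mode { CreateData, RunAlgorithm, Invalid };

// Maps the single command-line argument to a mode.
// Anything other than "0" or "1" yields Mode::Invalid so that the caller
// can print a usage message instead of guessing.
Mode parse_mode(const std::string& arg) {
    if (arg == "0") return Mode::CreateData;    // ./build/clustering 0
    if (arg == "1") return Mode::RunAlgorithm;  // ./build/clustering 1
    return Mode::Invalid;
}
```

A main function would typically call parse_mode(argv[1]) after checking that argc is at least 2, and exit with a usage message on Mode::Invalid.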

Code Structure

The code consists mainly of three objects: Observation, Data and Trainer. The Observation object samples the y observations from the normal distributions specified in the config folder and saves them in the data folder. The Data object allocates and initializes the variables used in the training procedure. The Trainer object implements the training loop, which contains the variance update and x estimation steps.
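
To illustrate the sampling step performed by the Observation object, here is a minimal sketch that draws observations around a set of centers using the C++ standard library. It assumes scalar observations, one variance per center, and the same number of samples per center. The function name sample_observations and its parameters are hypothetical and do not reflect the actual interface in src.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper, not the repository's Observation class.
// Draws n_per_center scalar observations y around each center, using the
// variance at the same index. Variances are assumed to be strictly positive.
std::vector<double> sample_observations(const std::vector<double>& centers,
                                        const std::vector<double>& variances,
                                        std::size_t n_per_center,
                                        unsigned seed) {
    assert(centers.size() == variances.size());
    std::mt19937 rng(seed);  // fixed seed keeps runs reproducible
    std::vector<double> y;
    y.reserve(centers.size() * n_per_center);
    for (std::size_t k = 0; k < centers.size(); ++k) {
        // std::normal_distribution expects the standard deviation,
        // so the variance is converted with a square root.
        std::normal_distribution<double> dist(centers[k], std::sqrt(variances[k]));
        for (std::size_t i = 0; i < n_per_center; ++i) {
            y.push_back(dist(rng));
        }
    }
    return y;
}
```

The samples are returned grouped by center, in the order the centers were given. Writing them to a csv file in the data folder would be a separate step.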