SC2020


Supporting code for our paper submitted to SC2020

Setup

We use Python 3.6 for all our experiments; the scripts will not work with earlier versions.

The dependencies for our scripts are packaged in requirements.txt. To install them you will need pip: run pip install -r requirements.txt. We recommend doing this inside a virtual environment such as virtualenv.
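The steps above can be sketched as the following shell session (a minimal example assuming a fresh checkout of the repo; the environment directory name `env` is our own choice, not part of the repo):

```shell
# Create an isolated environment so the pinned dependencies
# do not clash with system-wide packages.
python3 -m venv env

# Activate it (bash/zsh syntax; use env\Scripts\activate on Windows).
. env/bin/activate

# Install the pinned dependencies from the repo root.
pip install -r requirements.txt
```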

Experiments

For each figure in the paper, we include the script that generated it. The scripts have been simplified to remove some of the command line parameters and options, but are otherwise very similar to what we used.

Here is the mapping between figures and scripts:

  • Figure 1: run python throughput_vs_size_hexplot.py
  • Figure 2: run python distance_matrices.py
  • Figure 3: run python tree_breakdown.py
  • Figure 4: run python local_vs_global_models.py
  • Figure 5: run python permutation_feature_importance.py
  • Figure 6: run python dashboard.py

Aside from that, the root directory contains only two scripts:

  • dataset.py, which loads the (anonymized) dataset and contains the preprocessing pipeline. Note that the pipeline itself is not used: since we had to anonymize the data and keep files under 100MB, we preprocessed the data ahead of time and stored it in data/anonimized_io.csv
  • feature_name_mapping.py, which is just a map from raw feature names to human-friendly names.
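To illustrate how these two pieces fit together, here is a minimal, self-contained sketch. The column names, sample rows, and mapping entries below are invented for illustration; they are not the repo's actual schema, and the real code reads data/anonimized_io.csv rather than an inline string:

```python
import csv
import io

# Stand-in for data/anonimized_io.csv (hypothetical columns and values).
SAMPLE_CSV = """job_id,io_bytes,runtime_s
a1,1048576,12.5
b2,524288,7.1
"""

# Stand-in for feature_name_mapping.py: raw feature name -> human-friendly label.
FEATURE_NAME_MAPPING = {
    "io_bytes": "I/O volume (bytes)",
    "runtime_s": "Runtime (seconds)",
}

def load_rows(text):
    """Parse the preprocessed CSV into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))

def friendly(name):
    """Translate a raw feature name for plot labels, falling back to the raw name."""
    return FEATURE_NAME_MAPPING.get(name, name)

rows = load_rows(SAMPLE_CSV)
print(len(rows))             # 2
print(friendly("io_bytes"))  # I/O volume (bytes)
```

The fallback in friendly() mirrors a common pattern for label maps: any feature without a curated name still renders, just less prettily.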

Good luck!