Code for training and evaluating mass-decorrelated Hbb vs. Dijets and Hbb vs. Top taggers.
- Clone the package:
$ git clone https://github.com/allenshihlung/mass_decorrelated_hbb_tagger.git
$ cd adversarial-master
- Install miniconda and other dependencies to create the appropriate software environments:
$ source install.sh
- Add the miniconda bin directory to your PATH by appending it to ~/.bashrc:
$ echo 'export PATH=<path to the bin directory of your miniconda installation>:$PATH' >> ~/.bashrc
- Activate the conda environment
$ cd <path to your adversarial-master directory>
$ source activate.sh
- Gather all .h5 datasets into hbbDijetsDatasets/ or hbbTopDatasets/. On PDSF, this can be done with:
$ python dataprocessing/getDatasets.py
Note that the name of each dataset must follow a fixed format ending in .h5. For example:
user.dguest.15830754._000001.output_301498_H.h5
and
user.dguest.15830705._000040.output_361027_N.h5
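A quick way to catch misnamed files before labelling is to check each name against a pattern. The sketch below is inferred only from the two example names above, so the regular expression and the field names (owner, job_id, file_index, sample_id, suffix) are assumptions for illustration, not definitions taken from the package.

```python
import re

# Pattern inferred from the two example names only. The group names are
# hypothetical labels chosen for this sketch, not terminology from the package.
DATASET_NAME = re.compile(
    r"^user\.(?P<owner>[^.]+)"
    r"\.(?P<job_id>\d+)"
    r"\._(?P<file_index>\d+)"
    r"\.output_(?P<sample_id>\d+)"
    r"_(?P<suffix>[A-Za-z]+)\.h5$"
)


def parse_dataset_name(name):
    """Return the fields of a dataset name as a dict, or None if it does not match."""
    match = DATASET_NAME.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name in (
        "user.dguest.15830754._000001.output_301498_H.h5",
        "user.dguest.15830705._000040.output_361027_N.h5",
    ):
        print(name, "->", parse_dataset_name(name))
```

A name that returns None here does not follow the pattern of the examples and is worth checking by hand before running the labelling step.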
- Label the datasets
$ python dataprocessing/labelHbbDatasets.py
This will place the labelled datasets into labelledHbbDijetsDatasets/ or labelledHbbTopDatasets/
- Extract necessary columns for reweighting (we will add the other columns back after reweighting)
$ python dataprocessing/extractedPt.py
This will place the extracted datasets into extractedHbbDijetsDatasets/ or extractedHbbTopDatasets/
- Reweight and subsample
$ python -m prepro.reweightData --train <train events to subsample in millions> --test <test events to subsample in millions> --max-processes <max concurrent processes to be spun off>
This will place the processed extractedData.h5 file in reweightDatasets/
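For readers unfamiliar with reweighting, the sketch below shows one common approach: per-event weights are chosen so that the histogram of one sample in some variable matches the histogram of another. It is a generic illustration in plain Python and is not necessarily what prepro.reweightData does. The function names, binning, and input values are all assumptions.

```python
from bisect import bisect_right


def bin_index(value, edges):
    """Return the index of the half-open bin [lo, hi) containing value, or None."""
    i = bisect_right(edges, value) - 1
    return i if 0 <= i < len(edges) - 1 else None


def histogram(values, edges):
    """Count how many values fall into each bin; out-of-range values are ignored."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        i = bin_index(value, edges)
        if i is not None:
            counts[i] += 1
    return counts


def reweight_to_target(source, target, edges):
    """Return one weight per source value so the weighted source histogram
    equals the target histogram. Values outside the bins get weight 0."""
    source_counts = histogram(source, edges)
    target_counts = histogram(target, edges)
    weights = []
    for value in source:
        i = bin_index(value, edges)
        if i is None or source_counts[i] == 0:
            weights.append(0.0)
        else:
            weights.append(target_counts[i] / source_counts[i])
    return weights


if __name__ == "__main__":
    # Illustrative numbers only.
    print(reweight_to_target([1, 1, 1, 5], [1, 5, 5], [0, 2, 10]))
```

Within each bin the weights sum to the target count, so a variable such as jet pT no longer distinguishes the two samples after reweighting.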
- Append all other columns back
$ python dataprocessing/appendHbbTop.py
This will place the processed data.h5 file in input/
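The preprocessing steps above, from labelling through appending columns, can be chained in a small driver script. This is a minimal sketch assuming it is run from the top of the repository with the conda environment active. The helper names build_pipeline and run_pipeline are hypothetical, and the dataset-gathering step is left out because it is specific to PDSF.

```python
import subprocess
import sys


def build_pipeline(train_millions, test_millions, max_processes):
    """Return the preprocessing commands, in order, as argument lists."""
    python = sys.executable  # the interpreter of the active environment
    return [
        [python, "dataprocessing/labelHbbDatasets.py"],
        [python, "dataprocessing/extractedPt.py"],
        [python, "-m", "prepro.reweightData",
         "--train", str(train_millions),
         "--test", str(test_millions),
         "--max-processes", str(max_processes)],
        [python, "dataprocessing/appendHbbTop.py"],
    ]


def run_pipeline(commands):
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)


# Example call (the argument values are illustrative, not recommendations):
# run_pipeline(build_pipeline(1, 1, 4))
```

Because check=True raises on a non-zero exit code, a failed step stops the chain and later steps do not run on incomplete output.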
Ready to go for training!
For training with the default configuration:
$ python -m run.adversarial.train --train
To change the configuration, edit the default.json file in configs/.
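When trying out several configurations, it can help to describe each variant as a small set of overrides on top of the defaults instead of editing values by hand every time. The sketch below merges nested overrides into a configuration dictionary. The keys shown are hypothetical and are not taken from default.json, and how the training script selects a configuration file is not covered here.

```python
import json


def apply_overrides(config, overrides):
    """Return a copy of config with overrides merged in, recursing into nested dicts."""
    merged = dict(config)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


if __name__ == "__main__":
    # Hypothetical keys for illustration; check default.json for the real ones.
    base = json.loads('{"classifier": {"fit": {"epochs": 10, "batch_size": 1024}}}')
    variant = apply_overrides(base, {"classifier": {"fit": {"epochs": 20}}})
    print(json.dumps(variant, indent=2))
```

The merged dictionary can be written back with json.dump once the variant looks right. Keeping a copy of the original default.json makes it easy to return to the default setup.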
To choose which studies to run, locate the perform_studies function in tests/comparison.py and comment out or uncomment individual studies as needed. Then run
$ python -m tests.comparison
The plots will be saved in figures/