Mass Agnostic Jet Taggers

Layne Bradshaw, Rashmish K. Mishra, Andrea Mitridate, and Bryan Ostdiek

This project explores the benefits of different methods for decorrelating the jet mass from a machine learning jet tagger. The accompanying paper is available at [https://arxiv.org/abs/1908.08959]. If you use any of the results from this study, please cite:

Getting started

For reproducibility, we have included the conda environment we used (`environment.yml`). To create it (conda is required), run:


make create_environment
conda activate massagnosticjettaggers


All of the data preprocessing can be run with `make data`.
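The project tree lists `src/data/get_weights_1d.py` as the planing step of the preprocessing. Planing gives each event a weight inversely proportional to the population of its jet-mass bin, so the weighted mass distribution is flat. Applied separately to signal and background, this makes the two weighted mass distributions identical, so the mass alone no longer separates the classes. The sketch below is a minimal NumPy illustration under that assumption; the function name `planing_weights` and the normalization to unit mean weight are our own choices, not necessarily what `get_weights_1d.py` does.

```python
import numpy as np


def planing_weights(mass, bins):
    """Weights that make the weighted histogram of `mass` flat over `bins`.

    Hypothetical helper (not from the repository). Each in-range event gets
    1 / (count in its bin), rescaled so that the mean weight over in-range
    events is 1. Events outside the bin range get weight 0.
    """
    mass = np.asarray(mass, dtype=float)
    counts, edges = np.histogram(mass, bins=bins)
    # np.digitize gives 1..len(edges)-1 for in-range values; shift to 0-based
    idx = np.digitize(mass, edges) - 1
    # np.histogram includes the right edge in the last bin, so do the same here
    idx[mass == edges[-1]] = len(counts) - 1
    in_range = (idx >= 0) & (idx < len(counts))
    weights = np.zeros_like(mass)
    if not in_range.any():
        return weights
    # every in-range event lies in a bin with count >= 1, so this is safe
    weights[in_range] = 1.0 / counts[idx[in_range]]
    weights *= in_range.sum() / weights.sum()
    return weights
```

The returned array would typically be passed as per-sample weights during training, for example through the `sample_weight` argument accepted by Keras `Model.fit` and by scikit-learn classifiers.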


The models have already been trained, and training them takes considerable time. They can, however, be retrained with:

make base_nn
make uBoost
make BDT
make planed_nn
make planed_bdt
make pca_nn
make pca_bdt
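The `pca_nn` and `pca_bdt` targets presumably train on inputs transformed by `src/data/PCA_scaler.py`, which the project tree describes as the PCA rotation. One way such a rotation can reduce mass dependence, assumed here for illustration only, is to whiten the input features separately in each mass bin: subtract the bin mean, rotate onto the principal axes of the bin covariance, and rescale each axis to unit variance. Afterwards every bin has zero mean and identity covariance, so the first and second moments of the features no longer depend on the mass (higher moments still can). The functions below are hypothetical and not taken from the repository.

```python
import numpy as np


def _bin_index(mass, edges):
    # 0-based mass-bin index; masses outside the edges are clipped into the end bins
    return np.clip(np.digitize(mass, edges) - 1, 0, len(edges) - 2)


def fit_binned_whitening(features, mass, edges, eps=1e-8):
    """Fit one (mean, W) pair per mass bin (hypothetical helper).

    `features` has shape (n_events, n_features); each bin needs at least two events.
    """
    features = np.asarray(features, dtype=float)
    idx = _bin_index(mass, edges)
    transforms = []
    for b in range(len(edges) - 1):
        x = features[idx == b]
        mean = x.mean(axis=0)
        cov = np.atleast_2d(np.cov(x, rowvar=False))
        eigvals, eigvecs = np.linalg.eigh(cov)
        # columns of W: principal axes divided by their standard deviations
        W = eigvecs / np.sqrt(eigvals + eps)
        transforms.append((mean, W))
    return transforms


def apply_binned_whitening(features, mass, edges, transforms):
    """Center, rotate and rescale each event with the transform of its mass bin."""
    features = np.asarray(features, dtype=float)
    idx = _bin_index(mass, edges)
    out = np.empty_like(features)
    for b, (mean, W) in enumerate(transforms):
        sel = idx == b
        out[sel] = (features[sel] - mean) @ W
    return out
```

In practice one would fit the transforms on one sample (for example, background training events) and apply the same transforms to every sample, so that the test data are not used to define the rotation.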

Predictions and Metrics

The predictions for the test data can be made with `make predictions`. Once the predictions have been made, compute the metrics with `make metrics`.
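The project tree describes `src/test_metrics/Distances.py` as computing metrics from histograms. A common way to quantify mass sculpting, shown here only as an illustration and not necessarily the distance implemented there, is the Jensen-Shannon divergence (JSD) between the background mass histograms that pass and fail a tagger cut: a value near 0 means the cut leaves the mass shape essentially unchanged. The function name is hypothetical, and both histograms must share the same binning.

```python
import numpy as np


def jensen_shannon_divergence(p_counts, q_counts):
    """Base-2 JSD (bounded in [0, 1]) between two histograms with identical binning.

    Hypothetical helper, not taken from Distances.py.
    """
    p = np.asarray(p_counts, dtype=float)
    q = np.asarray(q_counts, dtype=float)
    # normalize raw counts to probability distributions
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # empty bins of `a` contribute 0; b = m is positive wherever a is
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Because the base-2 JSD lies in [0, 1], its inverse, 1/JSD, is sometimes quoted instead so that larger numbers mean better decorrelation.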

Project Organization

├── Makefile             <- Makefile with commands like `make data` or `make train`
├── README.md            <- The top-level README for developers using this project.
├── data
│   ├── interim          <- Intermediate data that has been transformed.
│   ├── modelprediction  <- The final, canonical data sets
│   └── raw              <- The original, immutable data dump.
├── models               <- Trained and serialized models and histories
│   ├── adv              <- Adversarial trained networks
│   └── histories        <- Adversarial training histories
├── notebooks            <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `01-bo-CheckScaling.ipynb`.
├── reports              <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures          <- Generated graphics and figures to be used in reporting
├── environment.yml      <- The requirements file for reproducing the analysis environment
├── setup.py             <- Makes project pip installable (pip install -e .) so src can be imported
└── src                  <- Source code for use in this project.
    ├── __init__.py      <- Makes src a Python module
    ├── data             <- Scripts to download or generate data
    │   ├── make_dataset.py   
    │   ├── get_weights_1d.py  <- Planing (mass-flattening event weights)
    │   ├── PCA_scaler.py      <- PCA rotation
    │   └── process_data.py    <- Runs the preprocessing
    ├── models           <- Scripts to train models and then use trained models to make
    │   │                 predictions
    │   ├── HelperFunctions.py
    │   ├── predict_model.py
    │   ├── train_Adversarial.py
    │   ├── train_base_nn.py
    │   ├── train_BDT.py
    │   ├── train_PCA_BDT.py
    │   ├── train_PCA_nn.py
    │   ├── train_planed_BDT.py
    │   ├── train_planed_nn.py
    │   └── train_uBoost.py
    ├── test_metrics     <- Scripts to take histograms and compute metrics
    │   ├── Distances.py
    │   └── run_metrics.py
    └── visualization    <- Scripts to create exploratory and results oriented visualizations

