This repository contains all the scripts used to derive the results presented in the paper "High-dimensional and Permutation Invariant Anomaly Detection".
We use the Top Quark Tagging Reference Dataset and a Z' dataset that will be available soon.
To preprocess the top tagging data you can run:
python preprocess.py --folder path/to/saved/file
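For example, with path/to/saved/file replaced by a hypothetical location such as /data/top_tagging:
python preprocess.py --folder /data/top_tagging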
To train the diffusion model you can run:
python train.py --dataset [top_tagging/gluon_tagging] [--ll] [--sup] --data_path path/to/preprocessed/files
where the flag --ll trains the model using MLE and the flag --sup instead trains the supervised classifier.
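For instance, two hypothetical invocations, one for MLE training and one for the supervised classifier, with /data/preprocessed standing in for the actual preprocessed folder, would be:
python train.py --dataset top_tagging --ll --data_path /data/preprocessed
python train.py --dataset top_tagging --sup --data_path /data/preprocessed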
To test the permutation invariance of the likelihood estimation you can run:
python plot_perm.py --data_folder path/to/preprocessed/files --nshuffle N
for N different permutations of the input particles.
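As an example, evaluating the likelihood for, say, 50 random orderings of the particles (the data folder is again hypothetical):
python plot_perm.py --data_folder /data/preprocessed --nshuffle 50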
To generate samples containing the likelihood information you can run:
python plot_ll.py --sample [--ll] --nidx X
to sample from the model trained with MLE (--ll) for a given partition X of the data. By default the sample is broken down into 200 slices, so X should be a number between 0 and 199. The partition index is used to generate samples in parallel, where different jobs can run at the same time for different values of X (a sketch of this workflow is given after the merge command below). After generating a few samples, you can merge the generated files using:
python merge.py [--ll] --maxidx M
assuming you ran M slices with partition indices from 0 to M-1.
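As a sketch, all 200 default partitions could be generated in a simple shell loop and then merged; in practice each value of --nidx would typically be submitted as a separate job, and the merge script name merge.py is an assumption:
for X in $(seq 0 199); do
    python plot_ll.py --sample --ll --nidx $X
done
python merge.py --ll --maxidx 200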
Plots can then be generated from the samples using:
python plot_ll.py [--sup]
where the --sup flag loads the supervised classifier and compares it with the direct density estimation methods.
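For example, one could run the script with and without the flag; without --sup, presumably only the direct density estimation results are plotted:
python plot_ll.py
python plot_ll.py --sup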