This repository is focused on applying graph neural networks (GNNs) to the task of charged particle track reconstruction using the high-pileup TrackML dataset:
- TrackML @ Kaggle: https://www.kaggle.com/c/trackml-particle-identification
- TrackML @ Codalab: https://competitions.codalab.org/competitions/20112

TrackML data is a 3D point cloud of tracker hits with associated truth information about the particles that generate them. The goal of GNN-based tracking workflows is to embed track hits as graph nodes and apply GNNs to cluster hits belonging to the same particle. This repo focuses on two complementary strategies: edge classification to predict hit associations and object condensation to cluster hits and predict track properties.
Base directory: graph_construction/

TrackML provides several truth quantities about the particles producing track hits in each event, for example transverse momentum, vertex, and charge. Each track hit is uniquely associated with a particle ID, so additional information about each track can be calculated at truth level.
measure_particle_properties.py produces a dataframe of truth information corresponding to each particle, including transverse momentum, charge, transverse impact parameter, number of hits, number of layers hit, and whether the particle skips a layer. Particles that produce hits in three or more layers, do not skip a layer, and follow a physical trajectory are labeled as reconstructable.
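The reconstructability criteria above can be illustrated with a minimal sketch. It is not the code in measure_particle_properties.py: the function name `is_reconstructable` and the convention that layers are integers where adjacent layers differ by 1 are assumptions made for illustration, and the "physical trajectory" requirement is left out because the text does not define it.

```python
def is_reconstructable(layers_hit, min_layers=3):
    """Return True if the particle hits at least `min_layers` distinct layers
    and does not skip a layer.

    `layers_hit` is an iterable of integer layer indices. The indexing scheme
    (adjacent layers differ by exactly 1) is a hypothetical convention.
    """
    layers = sorted(set(layers_hit))  # several hits in one layer count once
    if len(layers) < min_layers:
        return False
    # A skipped layer appears as a gap larger than 1 between neighbours.
    return all(b - a == 1 for a, b in zip(layers, layers[1:]))


if __name__ == "__main__":
    print(is_reconstructable([0, 1, 2]))  # three consecutive layers
    print(is_reconstructable([0, 2, 3]))  # layer 1 is skipped
```

A real detector mixes barrel and endcap layers, so the actual layer-skipping check is likely more involved than a simple integer gap.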
Example usage:
python measure_particle_properties.py -i /trackml_data/train_1 -o particle_properties --n-workers=3
slurm/measure_particle_properties.{py,slurm} are provided to produce particle dataframes as a set of batch jobs via Slurm.
The following scripts build tracker hit graphs from a set of TrackML event files and corresponding particle property dataframes.
build_graphs.py produces graphs containing track hits embedded as nodes with features (r, phi, z, u, v), where u and v are coordinates in conformal space, and edge features (dr, dphi, dz, dR), where dR is the hit-hit distance in eta-phi space. Hits are assigned to particle IDs and the track parameters belonging to that particle ID at truth level. Edges are drawn via a set of geometric selections specified in configs/build_graphs.yaml.
Example usage:
python build_graphs.py /configs/build_graphs.yaml --n-workers=3
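To make the node and edge features produced by build_graphs.py concrete, here is a small sketch of how they might be computed from Cartesian hit positions. The function names are hypothetical. The conformal mapping u = x/(x^2 + y^2), v = y/(x^2 + y^2) is the common textbook form and is assumed here; the repository may use a different scaling. Wrapping dphi into [-pi, pi) is also an assumption.

```python
import math


def node_features(x, y, z):
    """Return (r, phi, z, u, v) for a hit at Cartesian position (x, y, z)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    phi = math.atan2(y, x)
    # Conformal coordinates (assumed form): circles through the origin in
    # the x-y plane map to straight lines in the u-v plane.
    u = x / r2
    v = y / r2
    return r, phi, z, u, v


def pseudorapidity(r, z):
    """eta = -ln(tan(theta / 2)), with theta the polar angle; requires r > 0."""
    theta = math.atan2(r, z)
    return -math.log(math.tan(theta / 2.0))


def edge_features(hit_a, hit_b):
    """Return (dr, dphi, dz, dR) for two hits given as (r, phi, z) tuples."""
    r_a, phi_a, z_a = hit_a
    r_b, phi_b, z_b = hit_b
    dr = r_b - r_a
    # Wrap the azimuthal difference into [-pi, pi).
    dphi = (phi_b - phi_a + math.pi) % (2.0 * math.pi) - math.pi
    dz = z_b - z_a
    deta = pseudorapidity(r_b, z_b) - pseudorapidity(r_a, z_a)
    dR = math.hypot(deta, dphi)  # hit-hit distance in eta-phi space
    return dr, dphi, dz, dR
```

Hits with r = 0 are not handled, since pseudorapidity is undefined along the beam axis.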
slurm/build_graphs_job.{py,slurm} are provided to produce graphs as a set of batch jobs via Slurm. slurm/optimize_graph_construction_params.{py,slurm} are provided to submit batch jobs corresponding to different geometric selections and report the corresponding graph construction efficiency (n_true/n_true_possible edges) and purity (n_true/n_total edges).
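The efficiency and purity definitions above can be sketched as follows. This is an illustrative helper, not the repository's implementation; the function name and the representation of edges as sets of (hit_i, hit_j) tuples are assumptions.

```python
def graph_construction_metrics(selected_edges, true_edges):
    """Return (efficiency, purity) of a graph construction selection.

    selected_edges: set of (i, j) hit-index pairs kept by the geometric cuts.
    true_edges:     set of (i, j) pairs that connect hits of the same particle
                    (all true edges that could possibly be built).
    """
    n_true = len(selected_edges & true_edges)  # true edges that survived
    n_true_possible = len(true_edges)
    n_total = len(selected_edges)
    efficiency = n_true / n_true_possible if n_true_possible else 0.0
    purity = n_true / n_total if n_total else 0.0
    return efficiency, purity
```

Looser geometric selections typically raise efficiency at the cost of purity, which is the trade-off the parameter scan is meant to expose.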
The graph construction routine employs several functions available in utils/graph_building_utils.py and utils/hit_processing_utils.py (see below).
Two training scripts, train_IN.py and train_TCN.py, are located at the head of the directory. train_IN.py focuses on training IN-based edge classification architectures with no learned track finding step. train_TCN.py focuses on the object condensation approach, optionally including track parameter predictions. Each script may be run stand-alone or through a batch job.
Example usage:
python train_TCN.py -i graphs/train1_ptmin0p8/ --n-train=10000 --n-test=2000 --learning-rate=0.0001
Additionally, hyperparameter job arrays may be submitted using the scripts in hyperparameter_scans.
- hit_processing_utils.py contains a set of helper functions for opening TrackML events.
- graph_building_utils.py contains functions that select hits given a set of truth cuts, select edges given a set of geometric cuts, split the detector into multiple sectors, and correct edge truth levels in the case that multiple barrel-endcap layer connections are possible.
- data_utils.py contains a set of functions for loading in graphs, building train/test/validation partitions, organizing graph datasets and dataloaders, and a custom GraphDataset extension of the PyTorch Geometric Dataset class.
- inference_utils.py contains functions relevant to training and testing various GNN algorithms.
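As an illustration of the train/test/validation partitioning mentioned for data_utils.py, the sketch below splits a list of graph file names into disjoint subsets. It is not the repository's code: the function name `partition_graphs`, its arguments, and the seeded shuffle are assumptions chosen for the example.

```python
import random


def partition_graphs(graph_files, n_train, n_test, n_val=0, seed=0):
    """Split graph file names into disjoint train/test/val lists.

    The files are sorted first so that the result depends only on the seed,
    not on the order in which the file system listed them.
    """
    files = sorted(graph_files)
    n_requested = n_train + n_test + n_val
    if n_requested > len(files):
        raise ValueError(
            f"requested {n_requested} graphs but only {len(files)} are available"
        )
    random.Random(seed).shuffle(files)  # reproducible shuffle
    return {
        "train": files[:n_train],
        "test": files[n_train:n_train + n_test],
        "val": files[n_train + n_test:n_requested],
    }
```

Each list could then be wrapped in a dataset and dataloader; fixing the seed keeps the partitions identical between runs, which helps when comparing hyperparameter scans.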