gnn_track_challenge: A Jupyter Notebook repository from savvy379

GNN Algorithms for Tracklet Finding in the CMS/ATLAS Pixel Detectors

Introduction

This code uses data from the CERN TrackML Challenge, which can be downloaded from Kaggle. The trackml-library folder of this repository contains the Python library used to read the data into dataframes. After first downloading this repository, install the library by running

pip install /path/to/repository

Data

TrackML events are separated into four files: cells, hits, particles, and truth. The data folder contains event0000010000 from the TrackML dataset.

Visualization and Kinematic Studies

Several visualization scripts and kinematic studies are available in the visualization_scripts folder.

skim_data.py: produces skimmed event files (keeps only hits/truth in the pixel detector, removes all cell information, and adds detector/cylindrical coordinates to the remaining truth hits)
plot_functions.py: contains generic functions for plotting histograms (block, binned), scatterplots (error bars, overlaid plots), heat maps, tracks in 3D space, tracks overlapped with the modules they hit, and entire regions of the detector
segment_detector.py: returns the detector dataframe in which each hit has been sorted into one of N phi bins
analyze_tracks.py: analyzes quality vs. non-quality tracks and the per-track distributions of pt, number of hits, number of layers hit, track length, and dEta/dPhi/dR
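
As an illustration of the coordinate and binning steps described above, here is a minimal sketch of converting Cartesian hit positions to cylindrical coordinates and assigning each hit to one of N phi bins. The function names to_cylindrical and phi_bin are hypothetical and are not taken from skim_data.py or segment_detector.py; the scripts in this repository may use a different binning convention.

```python
import math


def to_cylindrical(x, y, z):
    """Convert Cartesian (x, y, z) to cylindrical (r, phi, z), with phi in [-pi, pi]."""
    r = math.hypot(x, y)
    phi = math.atan2(y, x)
    return r, phi, z


def phi_bin(phi, n_bins):
    """Map phi in [-pi, pi] to an integer bin index in [0, n_bins - 1].

    Assumption: bins have equal width and start at phi = -pi.
    """
    width = 2.0 * math.pi / n_bins
    idx = int((phi + math.pi) // width)
    # phi == +pi would land one past the end, so clamp it into the last bin.
    return min(idx, n_bins - 1)


# Toy hits (x, y, z); these values are made up for illustration.
toy_hits = [(1.0, 0.0, 5.0), (0.0, 2.0, -3.0), (-1.0, -0.1, 0.5)]
for x, y, z in toy_hits:
    r, phi, z_cyl = to_cylindrical(x, y, z)
    print(f"r={r:.3f} phi={phi:.3f} z={z_cyl:.3f} bin={phi_bin(phi, 4)}")
```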

Data Structures

The goal of pre-processing the data is to create graphs, which are namedtuples of the matrices X, Ri, and Ro and the vector y. X is the feature matrix, with one row per hit containing that hit's cylindrical position (r, phi, z). Ri and Ro are segment matrices, each of which has nHits rows and nSegments columns. Element Ri_{hs} of Ri is 1 if segment s is incoming to hit h, and 0 otherwise. Likewise, element Ro_{hs} of Ro is 1 if segment s is outgoing from hit h, and 0 otherwise. y is the segment truth vector, of length nSegments, with entries of 1 for true segments and 0 for false segments. The graph.py file defines graphs and some corresponding loading/saving functions.
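
To make the layout of X, Ri, Ro, and y concrete, here is a minimal sketch that builds a toy graph from three hits and three candidate segments. It is not the contents of graph.py: the helper build_graph and the toy values are assumptions made for illustration, and plain nested lists stand in for whatever array type the repository actually uses.

```python
from collections import namedtuple

# Field names follow the description above; the real definition lives in graph.py.
Graph = namedtuple("Graph", ["X", "Ri", "Ro", "y"])


def build_graph(hits, segments, labels):
    """Build a toy graph (hypothetical helper).

    hits:     list of (r, phi, z) tuples, one per hit
    segments: list of (out_hit, in_hit) index pairs, one per segment
    labels:   list of 0/1 truth flags, one per segment
    """
    n_hits, n_segs = len(hits), len(segments)
    Ri = [[0] * n_segs for _ in range(n_hits)]
    Ro = [[0] * n_segs for _ in range(n_hits)]
    for s, (h_out, h_in) in enumerate(segments):
        Ro[h_out][s] = 1  # segment s is outgoing from hit h_out
        Ri[h_in][s] = 1   # segment s is incoming to hit h_in
    X = [list(h) for h in hits]
    return Graph(X, Ri, Ro, list(labels))


# Three hits on successive layers and three candidate segments.
toy_hits = [(30.0, 0.10, -5.0), (70.0, 0.12, -11.0), (115.0, 0.15, -18.0)]
toy_segments = [(0, 1), (0, 2), (1, 2)]
toy_labels = [1, 0, 1]  # segment 0 -> 2 skips a hit, so it is labelled false here

g = build_graph(toy_hits, toy_segments, toy_labels)
print("Ro =", g.Ro)
print("Ri =", g.Ri)
print("y  =", g.y)
```

Each column of Ro and of Ri contains exactly one 1, since every segment leaves one hit and enters one hit.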

Measurements

We have defined several graph construction performance metrics, which are implemented in this folder. Among them are the segment efficiency, which is defined as sum(y)/len(y), and the truth efficiency, which is the number of true segments selected divided by the total number of true segments contained in the dataset. In the latter case, it is necessary to calculate the correct number of true segments for each pre-processing strategy. For example, in the layer pairs pre-processing scheme (in which only one hit per layer per particle is kept), the true number of segments per particle is simply nLayersHit-1.
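
The two metrics above can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, y is taken to be a list of 0/1 labels for the segments selected during graph construction, and the true-segment count uses the layer pairs rule of nLayersHit-1 segments per particle.

```python
def segment_efficiency(y):
    """Fraction of selected segments that are true: sum(y) / len(y)."""
    return sum(y) / len(y)


def true_segments_layer_pairs(n_layers_hit_per_particle):
    """Total true segments under the layer pairs scheme.

    With one hit kept per layer, a particle hitting n layers contributes
    n - 1 true segments (none if it hits fewer than two layers).
    """
    return sum(max(n - 1, 0) for n in n_layers_hit_per_particle)


def truth_efficiency(y, n_true_total):
    """True segments selected divided by all true segments in the dataset."""
    return sum(y) / n_true_total


# Toy numbers, made up for illustration.
y = [1, 0, 1, 1, 0, 0]      # six selected segments, three of them true
layers_hit = [4, 3, 1]      # nLayersHit for three particles

n_true = true_segments_layer_pairs(layers_hit)
print("segment efficiency:", segment_efficiency(y))
print("true segments in dataset:", n_true)
print("truth efficiency:", truth_efficiency(y, n_true))
```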