This repository is organized as follows:

```
GCNN/
|-- main.sh                    # stores model arguments and calls main.py
|-- models[dataset]/           # trained networks for {NYU, NERSC, ICECUBE} data are saved here
|-- README.md
|-- summarize.sh               # summarizes trained batch-array models
|-- script/                    # Python code
|   |-- main.py                # reads arguments, loads data from the specified dataset, begins the experiment
|   |-- experiment_handler.py  # trains and tests over each epoch; produces training plots and saves scores
|   |-- train_model.py         # trains and tests the model over one epoch
|   |-- data_ops/              # currently used only for generating and zero-padding minibatches
|   |-- graphics/              # code for ROC AUC and 1/FPR scoring, and for all plots
|   |-- loading/
|   |   |-- data/              # loading for all datasets
|   |   |-- model/             # argument parsing, model save/restore, and global model-argument handling
|   |-- model/                 # model architectures and kernels
|   |-- utils/                 # small helper functions used across files
```
`main.sh` contains commented-out command lines that launch training on models with different architectures. You can change the parameters used to initialize networks, and select the dataset with `--data {NYU, NERSC, ICECUBE}`. The `--cuda` flag runs the network on a GPU.

To train a network, launch `main.sh` with the arguments defined as desired. Run `python3 script/main.py --help` to see a full list of arguments.
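A typical workflow might look like the following (assuming the commands are run from the repository root):

```bash
# Edit the argument lines in main.sh, then launch training
bash main.sh

# Or call the entry point directly to list every available option
python3 script/main.py --help
```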
The main arguments are:

- `--name str`: network reference name
- `--data {NYU, ICECUBE, NERSC}`: dataset to use
- `--cuda`: flag to train on a GPU
- `--nbtrain int`: number of training samples to use
- `--nbtest int`: number of testing samples to use
- `--nb_batch int`: minibatch size
- `--nbprint int`: print frequency, in batches (note that, at present, changing the minibatch size changes the print frequency)
- `--fm int`: number of node feature maps at each layer
- `--depth int`: number of layers in the network
- `--lr float`: initial learning rate
- `--lrdecay [0,1]`: exponential learning-rate decay factor
- `--nb_extra_nodes int`: number of zero-initialized nodes to add to the experiment (see below)
- `--readout {DTNN_, ''}+{Sum, Mean, Max}`: operation applied after all graph convolution layers to transform points into a prediction
- `--node_type {Identity, GRU}`: method for updating points
- `--conv_type {Simple, ResGNN}`: graph convolution method
- `--kernels str list`: types of kernels to use (see below)
- `--combine_kernels str`: method for combining multiple kernels at each layer (see below)
- `--nb_MLPadj_hidden int`: number of hidden units to use; only for use with MLP kernels
- `--save_best_model`: flag to save the best model based on test 1/FPR
- `--sorted_training`: flag to group similar-sized training samples (the test set does this by default). Minibatches of different sizes are padded with zeros, so setting this flag significantly speeds up training; however, scores are not quite as high
- `--quiet`: flag to reduce printing
- `--no_shuffle`: flag to load and run samples in the same order; useful for plotting
- `--plot {spectral, spectral3d, eig, ker}`: type of plotting to perform
- `--tpr_target [0,1]`: the TPR at which 1/FPR is measured (default: 0.5)
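For example, a run combining several of these options might look like the following (all flags are documented above; the values are hypothetical placeholders):

```bash
# Hypothetical values chosen for illustration
python3 script/main.py --name gcnn_nyu --data NYU --cuda \
    --nbtrain 100000 --nbtest 50000 --nb_batch 64 \
    --fm 64 --depth 6 --lr 0.005 --lrdecay 0.9 \
    --conv_type ResGNN --node_type GRU --readout DTNN_Sum \
    --save_best_model
```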
Statistics are saved after every epoch. Plots are updated whenever the network improves on its best test 1/FPR score; if the current 1/FPR score matches the previous best, plots are updated only if the test AUC improves.
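The update rule amounts to the following (a minimal sketch with hypothetical names, not the repository's actual code):

```python
def should_update_plots(inv_fpr, auc, best_inv_fpr, best_auc):
    """Decide whether to refresh plots after an epoch.

    inv_fpr / auc are the current test scores; best_* are the best seen so far.
    """
    if inv_fpr > best_inv_fpr:
        return True  # new best 1/FPR: always update
    # tie on 1/FPR: update only if test AUC improved
    return inv_fpr == best_inv_fpr and auc > best_auc
```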
By default, kernels are computed at the first layer only and saved for reuse in later layers. Optional tags may be appended to a kernel name to change this behavior; example kernels include `QCDAwareMeanNorm-first_only` and `MLPdirected-layerwise-no_first`.

- `-layerwise`: instantiates a kernel of the specified type at every layer; each instantiation is applied to one layer only
- `-no_first`: the kernel is instantiated at every layer except the first; must be used with the `-layerwise` tag
- `-first_only`: the kernel is used at the first layer only; may not be used with the `-layerwise` tag, and at least one additional kernel must be supplied for the remaining layers
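Putting the tags together, the two example kernels above could be requested in a single run (assuming the `str list` flag takes space-separated values):

```bash
# A fixed physics-inspired kernel at the first layer, plus a learned
# MLP kernel instantiated at every later layer
python3 script/main.py --data NYU \
    --kernels QCDAwareMeanNorm-first_only MLPdirected-layerwise-no_first
```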
Multiple kernels may be used at each layer and combined together. The methods for combining kernels are listed here (see the sketch that follows):

- `Fixed_Balanced`: default option; outputs the average of all kernels
- `Affine`: parameterized affine combination of kernels
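Conceptually, the two methods differ only in how the per-kernel weights are chosen. A minimal sketch (hypothetical implementation; the repository's actual code may differ):

```python
import torch

def fixed_balanced(kernels):
    """Fixed_Balanced: plain average of a list of (batch, N, N) kernel matrices."""
    return torch.stack(kernels).mean(dim=0)

class AffineCombine(torch.nn.Module):
    """Affine: learned weights constrained to sum to one (but free to be negative)."""

    def __init__(self, nb_kernels):
        super().__init__()
        self.weights = torch.nn.Parameter(torch.ones(nb_kernels))

    def forward(self, kernels):
        w = self.weights / self.weights.sum()  # normalize so the weights sum to 1
        return sum(wi * k for wi, k in zip(w, kernels))
```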
The available kernels are:

- `QCDAwareMeanNorm`: physics-inspired kernel which works well in practice; for use with `NYU` and `NERSC` data only, and not for use with the `-layerwise` tag
- `GaussianSoftmax`: computes pairwise distances based on spatial coordinates when used in the first layer; in later layers, uses all features
- `DistMult`: parameterized kernel which works well only with `Simple` convolutions and `GRU` nodes
- `MLPdirected`: learned kernel which uses an MLP to compute pairwise distances
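As an illustration of the distance-based kernels, here is a minimal sketch of a Gaussian-softmax-style kernel (hypothetical implementation, assuming node features of shape `(batch, nodes, features)`; the repository's version may differ):

```python
import torch
import torch.nn.functional as F

def gaussian_softmax_kernel(x, sigma=1.0):
    """Row-normalized Gaussian affinity over pairwise node distances.

    x: node features of shape (batch, N, F) -- spatial coordinates in the
    first layer, all features in later layers (per the description above).
    """
    d2 = torch.cdist(x, x) ** 2                       # squared pairwise distances
    return F.softmax(-d2 / (2 * sigma ** 2), dim=-1)  # each row sums to 1
```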
There is currently an anomaly (in every network architecture tested) whereby appending zero-valued points to the sample input drastically speeds up training. I am working to understand why this occurs, with the hope of removing the need for it; at present, on the NYU dataset, using 30 extra nodes increases 1/FPR by 40-50%.
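To reproduce this effect, the flag and value from the observation above can be passed directly:

```bash
# NYU data with 30 zero-initialized extra nodes appended to each sample
python3 script/main.py --data NYU --nb_extra_nodes 30 --cuda
```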