faserDL

Deep-learning reconstruction for FASER


Minkowski Engine UNet Training Script

This script trains a Minkowski Engine UNet model on sparse event data. Training and testing behaviour are configured via command-line arguments.
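The data is consumed as Minkowski Engine sparse tensors. Below is a minimal, generic sketch (not taken from this repository) of how sparse event data is typically packed into an ME.SparseTensor; the array shapes and feature contents are purely illustrative.

import numpy as np
import torch
import MinkowskiEngine as ME

# Two toy "events": each is a set of voxel coordinates with one feature per voxel.
coords_per_event = [np.random.randint(0, 48, size=(100, 3)),
                    np.random.randint(0, 48, size=(80, 3))]
feats_per_event = [np.random.rand(100, 1).astype(np.float32),
                   np.random.rand(80, 1).astype(np.float32)]

# batched_coordinates prepends the batch index, producing an (N, 4) integer tensor.
coords = ME.utils.batched_coordinates(coords_per_event)
feats = torch.from_numpy(np.concatenate(feats_per_event, axis=0))

# A SparseTensor is the input format expected by Minkowski Engine networks such as a UNet.
x = ME.SparseTensor(features=feats, coordinates=coords)
print(x.shape)  # (180, 1)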

Requirements

  • Python 3.7+
  • PyTorch
  • MinkowskiEngine (a version with depthwise convolution support)
  • tensorboardX
  • torchvision
  • tqdm
  • numpy
  • argparse

Usage

python -m train.train [args]
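For example, a training run and a test run that resumes from a saved checkpoint might look like the following (the dataset path is the documented default; the checkpoint name "v1" and the exact flag combinations are illustrative):

python -m train.train --train -d /scratch/salonso/sparse-nns/faser/events_v3_new -b 4 -e 50 --lr 1e-4 --name v1

python -m train.train --test --load_checkpoint v1 --gpus 0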

Arguments

  • --train: Enables training mode. This boolean flag is set by default, so the script trains unless --test is given (default: True).

  • --test: Runs the script in test mode instead of training (overrides the --train flag).

  • -d, --dataset_path: Specifies the path to the dataset. This should include the path pattern to the training data. The default path is /scratch/salonso/sparse-nns/faser/events_v3_new.

  • --eps: A small constant value to prevent division by zero during calculations (default: 1e-12).

  • -b, --batch_size: Defines the number of samples per batch during training (default: 2).

  • -e, --epochs: The number of epochs (full training cycles) to run during training (default: 50).

  • -w, --num_workers: Number of worker processes used by the data loader (default: 16).

  • --lr: Learning rate for the optimizer. This controls the step size at each iteration while moving towards the optimal solution (default: 1e-4).

  • -ag, --accum_grad_batches: Number of mini-batches over which gradients are accumulated before each optimizer step. This simulates a larger effective batch size (batch_size × accum_grad_batches) without the memory cost of actually loading it (default: 1).

  • -ws, --warmup_steps: The number of warmup steps used to gradually increase the learning rate at the start of training (default: 0).

  • -wd, --weight_decay: Weight decay applied to the optimizer to prevent overfitting by penalizing large weights (default: 0.05).

  • -b1, --beta1: The first AdamW beta, i.e. the exponential decay rate for the moving average of the gradient (first moment) (default: 0.9).

  • -b2, --beta2: The second AdamW beta, i.e. the exponential decay rate for the moving average of the squared gradient (second moment) (default: 0.999). The sketch after this argument list shows how the optimizer, warmup, and accumulation flags fit together.

  • --losses: Specifies the list of loss functions to be used during training. Options include "focal" and "dice" (default: ["focal", "dice"]).

  • --save_dir: Specifies the directory where logs and other outputs will be saved (default: /scratch/salonso/sparse-nns/faser/deep_learning/faserDL).

  • --name: Defines the name of the model version being trained or tested (default: "v1").

  • --log_every_n_steps: Number of steps between logging training information (default: 50).

  • --save_top_k: Number of top checkpoints to save during training (default: 1).

  • --checkpoint_path: Directory path where model checkpoints will be saved (default: /scratch/salonso/sparse-nns/faser/deep_learning/faserDL/checkpoints).

  • --checkpoint_name: Defines the name for the checkpoint file to be saved (default: "v1").

  • --load_checkpoint: The name of a specific checkpoint file to load for resuming training or testing (default: None).

  • --gpus: List of GPU device indices to use for training. If more than one GPU is specified, the script will use parallel processing (default: [0]).
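As referenced in the optimizer and gradient-accumulation arguments above, the sketch below shows how flags such as --lr, --weight_decay, --beta1/--beta2, --warmup_steps, and --accum_grad_batches typically combine in a PyTorch training loop. It is a generic illustration under assumed names (args, model, train_loader, compute_loss), not code from this repository.

import torch

# Assumed to exist elsewhere: args (parsed arguments), model, train_loader,
# and compute_loss(outputs, targets).
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=args.lr,                      # --lr
    betas=(args.beta1, args.beta2),  # --beta1 / --beta2
    weight_decay=args.weight_decay,  # --weight_decay
)

step = 0
for epoch in range(args.epochs):                          # --epochs
    for i, (inputs, targets) in enumerate(train_loader):
        # Linear learning-rate warmup over the first --warmup_steps optimizer steps.
        if args.warmup_steps > 0 and step < args.warmup_steps:
            scale = (step + 1) / args.warmup_steps
            for group in optimizer.param_groups:
                group["lr"] = args.lr * scale

        loss = compute_loss(model(inputs), targets)
        # Scale by the accumulation factor so the summed gradients match one large batch.
        (loss / args.accum_grad_batches).backward()

        # Step the optimizer only every --accum_grad_batches mini-batches, giving an
        # effective batch size of batch_size * accum_grad_batches.
        if (i + 1) % args.accum_grad_batches == 0:
            optimizer.step()
            optimizer.zero_grad()
            step += 1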