2D-DDNet with optimizations
Requirements:
- Read access to "/projects/synergy_lab/garvit217/enhancement_data/"
- PyTorch container 22.04 (nvcr.io/nvidia/pytorch:22.04-py3)
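Before submitting, it can help to confirm that your account can actually read the dataset directory. The sketch below is a hypothetical helper (`has_read_access` is not part of the repository). A directory needs both the read (`r`) and execute/search (`x`) bits before its contents can be listed and opened, so both are checked.

```shell
# Hypothetical helper (not part of 2D-DDNet): print "yes" if the current
# user can list and enter the given directory, otherwise "no".
has_read_access() {
  local dir="$1"
  # -d: exists and is a directory; -r: readable (listable); -x: searchable.
  if [ -d "$dir" ] && [ -r "$dir" ] && [ -x "$dir" ]; then
    echo "yes"
  else
    echo "no"
  fi
}
```

Usage: `has_read_access /projects/synergy_lab/garvit217/enhancement_data/`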
Run the following commands to pull the container:
module load containers/singularity
singularity pull pytorch22.04.sif docker://nvcr.io/nvidia/pytorch:22.04-py3
Other container versions can also be used, but first check that the container is compatible with the NVIDIA driver version installed on the ARC GPU nodes; verify this carefully against the documentation.
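One way to check driver compatibility is to compare the driver version reported on a GPU node with the minimum driver required by the container's CUDA version. The sketch below uses hypothetical helpers (`driver_at_least` and `check_driver` are not part of the repository); the required minimum must be taken from NVIDIA's release notes for the container you choose, and the `510` in the usage line is an illustrative assumption for 22.04.

```shell
# Hypothetical helper: succeed if the installed driver version is at least
# the required minimum. Versions are compared with GNU `sort -V`.
driver_at_least() {
  local installed="$1" required="$2"
  # The lower of the two versions sorts first; if that is the required
  # version, the installed driver is new enough.
  [ "$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n 1)" = "$required" ]
}

# Print "ok" or "too-old" for an installed/required version pair.
check_driver() {
  if driver_at_least "$1" "$2"; then
    echo "ok"
  else
    echo "too-old"
  fi
}
```

On a GPU node this could be called as `check_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" 510`. The PyTorch and CUDA versions inside the container can be printed with `singularity exec --nv pytorch22.04.sif python -c "import torch; print(torch.__version__, torch.version.cuda)"`.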
To run a job:
- git checkout
- cd 2dnet
- source params.sh
- sbatch job_tinker.sh
- The output is written to slurm-<job_id>.out
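Sourcing params.sh before sbatch works because sbatch, by default, passes the submitting shell's environment to the job (`--export=ALL`). To find the log of a specific submission, `sbatch --parsable` prints only the job ID (optionally followed by `;cluster`). The hypothetical helper below maps that output to SLURM's default output file name; it assumes job_tinker.sh does not set its own `#SBATCH --output`.

```shell
# Hypothetical helper: turn the output of `sbatch --parsable`
# ("jobid" or "jobid;cluster") into SLURM's default log name.
slurm_out_file() {
  local parsable="$1"
  local job_id="${parsable%%;*}"   # drop an optional ";cluster" suffix
  echo "slurm-${job_id}.out"
}
```

For example, `jobid=$(sbatch --parsable job_tinker.sh)` followed by `tail -f "$(slurm_out_file "$jobid")"` would follow the log once the job starts writing it.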
To run with profiling enabled:
- source params.sh
- export enable_profile="true"
- sbatch job_tinker.sh
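Because the profiling flag is exported into your interactive shell, it stays set for every later submission from that shell (sbatch propagates the environment by default), unless params.sh resets it. One way to scope profiling to a single job is a hypothetical wrapper like the one below (`run_with_profile` is not part of the repository); its body is a subshell, so the export disappears when the command finishes.

```shell
# Hypothetical helper: run one command with enable_profile="true".
# The ( ... ) body runs in a subshell, so the variable does not leak
# into the calling shell or into later, non-profiled submissions.
run_with_profile() (
  export enable_profile="true"
  "$@"
)
```

Usage: `source params.sh`, then `run_with_profile sbatch job_tinker.sh`.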
Read the comments in job_tinker.sh carefully.
Give all users access to the Python files:
chmod 766 sparse_ddnet.py
chmod 766 trainers.py
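Mode 766 gives the owner read, write and execute permission, and gives the group and all other users read and write (but not execute) permission, so every user can also modify these files. The hypothetical helper below (not part of the repository) expands an octal mode into the `rwx` string that `ls -l` shows, which makes it easy to check what a mode grants.

```shell
# Hypothetical helper: expand a 3-digit octal mode (e.g. 766) into the
# owner/group/others rwx string shown by `ls -l`.
mode_to_rwx() {
  local mode="$1" out="" pos digit
  for pos in 1 2 3; do
    digit=$(printf '%s' "$mode" | cut -c "$pos")
    # Each octal digit is the sum of read (4), write (2) and execute (1).
    if [ $(( digit & 4 )) -ne 0 ]; then out="${out}r"; else out="${out}-"; fi
    if [ $(( digit & 2 )) -ne 0 ]; then out="${out}w"; else out="${out}-"; fi
    if [ $(( digit & 1 )) -ne 0 ]; then out="${out}x"; else out="${out}-"; fi
  done
  printf '%s\n' "$out"
}
```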
Training hyperparameters:
- batch size: 1
- learning rate: 0.0001
- epochs: 50
- decay rate: 0.95
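If the decay rate is applied as an exponential learning-rate decay once per epoch (an assumption; check trainers.py for how it is actually used), the learning rate at epoch e would be 0.0001 × 0.95^e. The hypothetical helper below evaluates that assumed schedule with awk.

```shell
# Hypothetical helper: learning rate at a given epoch under an ASSUMED
# per-epoch exponential decay, lr = base_lr * decay^epoch (epoch 0 = base).
lr_at_epoch() {
  local base_lr="$1" decay="$2" epoch="$3"
  awk -v lr="$base_lr" -v d="$decay" -v e="$epoch" \
    'BEGIN { printf "%.3e\n", lr * d ^ e }'
}
```

Under this assumption, `lr_at_epoch 0.0001 0.95 49` gives the rate for the last of 50 epochs, roughly 8.1e-06, about 8% of the starting value.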
- Mixed precision:
export mp=true
- DoLL Data Loader for Small Datasets
export new_load=true
- Graph optimizations:
# To enable graph mode, use a newer PyTorch version; otherwise the two parameters below are ignored
export gr_mode="reduce-overhead"
export gr_back="aot-eager"
- Pruning:
export retrain=0 # must be > 0 for pruning to apply; prune_t (prune type) options: mag, l1_struc, or random_unstru (used by default otherwise)
export prune_t="random_unstru"
export prune_amt=0.5
export model="ddnet" # choices: ddnet, vgg16 (DDNet with VGG-16-based loss), vgg19 (DDNet with VGG-19-based loss)
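The option values listed above can be sanity-checked before calling sbatch. The sketch below is a hypothetical pre-submission check (neither helper exists in the repository). `validate_params` accepts only the documented choices for `model` and `prune_t`, requires `retrain` to be a non-negative integer, and assumes `prune_amt` is a fraction strictly between 0 and 1. `torch_supports_compile` assumes that `gr_mode` and `gr_back` are passed to `torch.compile`, which first appeared in PyTorch 2.0; the 22.04 container ships a PyTorch 1.x build.

```shell
# Hypothetical pre-submission check for the options documented above.
validate_params() {
  case "${model:-ddnet}" in
    ddnet|vgg16|vgg19) ;;
    *) echo "invalid model: $model"; return 1 ;;
  esac
  case "${prune_t:-random_unstru}" in
    mag|l1_struc|random_unstru) ;;
    *) echo "invalid prune_t: $prune_t"; return 1 ;;
  esac
  # retrain must be a non-negative integer (pruning is assumed to need > 0).
  case "${retrain:-0}" in
    *[!0-9]*) echo "invalid retrain: $retrain"; return 1 ;;
  esac
  # ASSUMPTION: prune_amt is a fraction of weights, 0 < prune_amt < 1.
  if ! awk -v a="${prune_amt:-0.5}" 'BEGIN { exit !(a + 0 > 0 && a + 0 < 1) }'; then
    echo "invalid prune_amt: $prune_amt"; return 1
  fi
  echo "ok"
}

# ASSUMPTION: graph mode maps to torch.compile, available from PyTorch 2.0.
# Takes a version string such as "2.1.0" and prints "yes" or "no".
torch_supports_compile() {
  local major="${1%%.*}"
  if [ "$major" -ge 2 ] 2>/dev/null; then echo "yes"; else echo "no"; fi
}
```

If `gr_back` is handed to `torch.compile` unchanged, note that the backend is registered there as `aot_eager` (with an underscore); whether the code translates `aot-eager` is not stated here.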
Make the following changes in job_tinker.sh:
export MASTER_PORT=<some unique value>
Then update the SLURM headers below:
#SBATCH --ntasks-per-node P
#SBATCH --gpus-per-node G
#SBATCH --nodes N
where:
- P: number of parallel processes per node
- G: number of GPUs per node; set G = P (one process per GPU)
- N: number of nodes
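MASTER_PORT must not clash with other jobs running on the same node. One common approach (an assumption here, not something job_tinker.sh is known to do) is to derive the port from `SLURM_JOB_ID`, which SLURM sets inside every job. The hypothetical helpers below compute such a port and the total number of processes, N × P; the 10000 to 29999 port range is arbitrary.

```shell
# Hypothetical helpers for the multi-node settings above (not from the repo).

# Total number of processes across the job: nodes (N) x processes per node (P).
world_size() {
  echo $(( $1 * $2 ))
}

# Derive a per-job port in 10000-29999 so concurrent jobs rarely collide.
pick_master_port() {
  echo $(( 10000 + $1 % 20000 ))
}

# Example header for N=2 nodes with G=P=4 GPUs/processes per node:
#   #SBATCH --nodes 2
#   #SBATCH --ntasks-per-node 4
#   #SBATCH --gpus-per-node 4
# and, inside job_tinker.sh:
#   export MASTER_PORT=$(pick_master_port "$SLURM_JOB_ID")
```

With that example header, `world_size 2 4` gives 8 processes in total, one per GPU.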