This project template is for deep-learning researchers who want to use multiple GPUs with PyTorch Distributed Data Parallel (DDP).
You can use this template by installing the dependencies via Anaconda with requirements.yaml. Additionally, this project template uses Hydra as its configuration-management framework. If you are not familiar with Hydra, please check the Hydra tutorial docs.
conda env create --file requirements.yaml
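To show how Hydra ties into the entry point, here is a minimal sketch of what a Hydra-decorated `run.py` could look like (assuming Hydra ≥ 1.2; the function body is illustrative, not this template's actual code):

```python
# Minimal sketch of a Hydra entry point; illustrative only.
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path="configs", config_name="default", version_base=None)
def main(cfg: DictConfig) -> None:
    # Hydra composes configs/default.yaml with any command-line overrides
    # before this function runs; cfg holds the merged result.
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    main()
```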
Project-Name/
├── configs/ # Hydra configuration files go here
│ ├── data_loader/ # data_loader configs
│ ├── dataset/ # dataset configs
│ ├── log_dir/ # directory configs to save all logs during training
│ ├── logger/ # visualization tool configs
│ ├── model/ # model configs
│ └── default.yaml # main config
│
├── data/ # all datasets go here
│
├── logs/ # all logs go here
│
├── src/ # source code goes here
│ ├── dataloaders/
│ ├── datasets/
│ ├── models/
│ ├── utils/ # utility functions for multi-GPU training
│ └── train.py
│
└── run.py # you can train the model by running this script
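The main config, `configs/default.yaml`, selects one option from each config group above. A minimal sketch of how such a file might look (the entry names and values below are assumptions based on the directory layout, not the template's actual config):

```yaml
# configs/default.yaml -- hypothetical example; entry names are assumptions
defaults:
  - dataset: default      # selects configs/dataset/default.yaml
  - data_loader: default  # selects configs/data_loader/default.yaml
  - model: default        # selects configs/model/default.yaml
  - logger: default       # selects configs/logger/default.yaml
  - log_dir: default      # selects configs/log_dir/default.yaml
  - _self_

seed: 42    # example top-level option
```

Any of these values can then be overridden from the command line at launch time.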
This project launches multi-GPU training with elastic launch (torchrun). If you are not familiar with torchrun, please check this documentation.
torchrun --nproc_per_node <num_gpu> run.py
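torchrun spawns one worker process per GPU and exports the rank-related environment variables to each of them. As a rough sketch of the per-process setup that typically lives in `src/train.py` (the function name and structure here are illustrative assumptions, not this template's exact code):

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_ddp(model: torch.nn.Module) -> DDP:
    # torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK for every worker.
    local_rank = int(os.environ["LOCAL_RANK"])
    # init_process_group reads the rank and world size from those env vars.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)
    model = model.cuda(local_rank)
    # DDP all-reduces gradients across GPUs during backward().
    return DDP(model, device_ids=[local_rank])
```

Each worker should also pair its DataLoader with a `torch.utils.data.DistributedSampler` so that every GPU sees a distinct shard of the dataset.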
This project template is inspired by Pytorch-Lightning-Template and Pytorch-elastic-examples.
This project is licensed under the MIT License.