This project template is for deep-learning researchers who want to use multiple GPUs with PyTorch Distributed Data Parallel (DDP).
You can use this template by installing the dependencies via Anaconda with requirements.yaml. Additionally, this project template uses Hydra as its configuration-management framework. If you are not familiar with Hydra, please check the Hydra tutorial docs.
conda env create --file requirements.yaml
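To show how Hydra ties into the entry point, here is a minimal sketch of what a Hydra-decorated `run.py` could look like (assuming Hydra ≥ 1.2; the function body is illustrative, not this template's actual code):

```python
# Minimal sketch of a Hydra entry point; illustrative only.
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path="configs", config_name="default", version_base=None)
def main(cfg: DictConfig) -> None:
    # Hydra composes configs/default.yaml with any command-line overrides
    # before this function runs; cfg holds the merged result.
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    main()
```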
Project-Name/
├── configs/ # Hydra configuration files go here
│ ├── data_loader/ # data_loader configs
│ ├── dataset/ # dataset configs
│ ├── log_dir/ # directory configs to save all logs during training
│ ├── logger/ # visualization tool configs
│ ├── model/ # model configs
│ └── default.yaml # main config
│
├── data/ # all datasets go here
│
├── logs/ # all logs go here
│
├── src/ # source code goes here
│ ├── dataloaders/
│ ├── datasets/
│ ├── models/
│ ├── utils/ # utility functions for multi-GPU training
│ └── train.py
│
└── run.py # you can train the model by running this script
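The main config, `configs/default.yaml`, selects one option from each config group above. A minimal sketch of how such a file might look (the entry names and values below are assumptions based on the directory layout, not the template's actual config):

```yaml
# configs/default.yaml -- hypothetical example; entry names are assumptions
defaults:
  - dataset: default      # selects configs/dataset/default.yaml
  - data_loader: default  # selects configs/data_loader/default.yaml
  - model: default        # selects configs/model/default.yaml
  - logger: default       # selects configs/logger/default.yaml
  - log_dir: default      # selects configs/log_dir/default.yaml
  - _self_

seed: 42    # example top-level option
```

Any of these values can then be overridden from the command line at launch time.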
This project launches multi-GPU training with elastic launch (torchrun). If you are not familiar with torchrun, please check this documentation.
torchrun --nproc_per_node <num_gpu> run.py
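torchrun spawns one worker process per GPU and exports the rank-related environment variables to each of them. As a rough sketch of the per-process setup that typically lives in `src/train.py` (the function name and structure here are illustrative assumptions, not this template's exact code):

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_ddp(model: torch.nn.Module) -> DDP:
    # torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK for every worker.
    local_rank = int(os.environ["LOCAL_RANK"])
    # init_process_group reads the rank and world size from those env vars.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)
    model = model.cuda(local_rank)
    # DDP all-reduces gradients across GPUs during backward().
    return DDP(model, device_ids=[local_rank])
```

Each worker should also pair its DataLoader with a `torch.utils.data.DistributedSampler` so that every GPU sees a distinct shard of the dataset.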
This project template is inspired by Pytorch-Lightning-Template and Pytorch-elastic-examples.
This project is licensed under the MIT License.