An opinionated general purpose model trainer on PyTorch with a simple code base.
From Github:
git clone https://github.com/coqui-ai/Trainer
cd Trainer
make install
From PyPI:
pip install trainer
Prefer installing from Github as it is more stable.
Subclass and overload the functions in the TrainerModel()
See the test script here training a basic MNIST model.
$ python -m trainer.distribute --script path/to/your/train.py --gpus "0,1"
We don't use .spawn()
to initiate multi-gpu training since it causes certain limitations.
- Everything must the pickable.
.spawn()
trains the model in subprocesses and the model in the main process is not updated.- DataLoader with N processes gets really slow when the N is large.
- Tensorboard - actively maintained
- ClearML - actively maintained
- MLFlow
- Aim
- WandDB
To add a new logger, you must subclass BaseDashboardLogger and overload its functions.