PyTorch implementation for the MicroNet Challenge, based on PyTorch Lightning.
Features | Status | Type |
---|---|---|
Fix validation accuracy computation issue | DONE | Bugfix |
Add current learning rate in terminal | TO DO | Bugfix |
Add test method in model & check best model | DONE | Feature |
Add loss and acc to Tensorboard | DONE | Feature |
Terminal size and cursor issue | DONE | Bugfix |
Add best train acc and best val acc in terminal | DONE | Feature |
No more fastai dependency | DONE | Feature |
- tensorboard logs: a TensorBoard logger object (callback) must be defined, and scalars added to it.
- learning rate logging: the value sent to the TensorBoard logger is correct, but the learning rate displayed in the terminal is wrong: it stays constant at the initial value.
- accuracy computation issue:
  - solved by computing our own accuracy.
  - the accuracy is computed on CPU and not on GPU: this needs to be changed in future releases.
- improve README
- improve terminal display:
  - table showing training logs
  - make verbose a Lightning callback (see utils/verbose.py) instead of a decorator: this callback is based on a State class that could be used elsewhere.
- remove fastai dependency: code loaded from fastai is now in PyTorch
- test in model and main: for now it's only a duplicate of a validation routine on one epoch.
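The accuracy item above (computing our own accuracy, currently on CPU) can be illustrated with a minimal, framework-free sketch. The `RunningAccuracy` class and its method names are hypothetical and are not taken from this repository; plain Python lists stand in for the predicted and target class indices that a training loop would produce for each batch.

```python
class RunningAccuracy:
    """Accumulates correct/total counts across batches (hypothetical helper)."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, predictions, targets):
        """Add one batch of predicted and target class indices."""
        if len(predictions) != len(targets):
            raise ValueError("predictions and targets must have the same length")
        self.correct += sum(int(p == t) for p, t in zip(predictions, targets))
        self.total += len(targets)

    def compute(self):
        """Return the accuracy over everything seen so far (0.0 if empty)."""
        return self.correct / self.total if self.total else 0.0


# Two batches: 2 of 3 correct, then 1 of 1 correct.
metric = RunningAccuracy()
metric.update([1, 0, 2], [1, 1, 2])
metric.update([0], [0])
epoch_accuracy = metric.compute()  # 3 correct out of 4 samples
```

Accumulating counts, rather than averaging per-batch accuracies, keeps the result correct when the last batch is smaller than the others.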
To use this project, first clone the repository to your machine using the command below:
git clone https://github.com/the-dharma-bum/MicroNet
Note that this project requires fastai and PyTorch Lightning.
To make sure everything runs correctly, you can try:
apt install gcc git python3-pip
pip install fastai
pip install pytorch-lightning
You can modify any hyperparameter in the config.py file. Alternatively, you can declare dataclasses anywhere in your code (refer to config.py to see how they should be instantiated), instantiate a model object from those dataclasses, and finally hand the model to a trainer object.
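As a rough illustration of the dataclass approach, the sketch below declares a small configuration and derives a variant from it. The class name `TrainConfig`, its fields, and its default values are all hypothetical; the actual dataclasses and their fields are the ones defined in config.py.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical hyperparameter container; the real fields live in config.py."""
    learning_rate: float = 0.1
    batch_size: int = 128
    max_epochs: int = 200


# Default configuration.
default_config = TrainConfig()

# Variant with a smaller learning rate; the other fields keep their defaults.
finetune_config = replace(default_config, learning_rate=0.01)
```

Freezing the dataclass is a design choice made for this sketch: it prevents a configuration from being modified by accident once a run has started, and `replace` makes each variant explicit.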
Once you're ready, run:
python main.py
This command supports many arguments. Type
python main.py -h
to see them all, or refer to the PyTorch Lightning documentation.
The most useful ones:
- --gpus n: runs the training on n GPUs.
- --distributed_backend ddp: uses DistributedDataParallel as the backend to train across multiple GPUs.
- --fast_dev_run True: runs one training loop, that is, one training step, one validation step, and one test step on a single data batch. Useful for quick debugging.
If you want to debug but the fast_dev_run option doesn't suit your needs (for instance, if you want to check what happens between two epochs), you can run:
python main.py --limit_train_batches i --limit_val_batches j --max_epochs k
where i, j, and k are three integers of your choice.
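To make the effect of these three flags concrete, here is a small standard-library sketch that parses them and works out how many training steps the resulting run contains. It is an assumption about how such options could be read, not the code in main.py, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Parser for the three debugging flags (hypothetical, not the one in main.py)."""
    parser = argparse.ArgumentParser(description="Short debugging run")
    parser.add_argument("--limit_train_batches", type=int, default=None,
                        help="number of training batches per epoch")
    parser.add_argument("--limit_val_batches", type=int, default=None,
                        help="number of validation batches per epoch")
    parser.add_argument("--max_epochs", type=int, default=None,
                        help="number of epochs to run")
    return parser


# Equivalent to: --limit_train_batches 10 --limit_val_batches 5 --max_epochs 2
args = build_parser().parse_args(
    ["--limit_train_batches", "10", "--limit_val_batches", "5", "--max_epochs", "2"]
)
# Total training steps in this debugging run: 10 batches x 2 epochs.
total_train_steps = args.limit_train_batches * args.max_epochs
```

With i = 10, j = 5, and k = 2, the run covers two epochs of ten training batches each, which is enough to observe what happens at the boundary between epochs.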
Feel free to use, modify, and share this code. Consider citing us if you feel like it.