Reproducible Deep Learning
PhD Course in Data Science, 2021, 3 CFU
This practical PhD course explores the design of a simple reproducible environment for a deep learning project, using free, open-source tools (Git, DVC, Docker, Hydra, ...). The choice of tools is opinionated, and was made as a trade-off between practicality and didactic concerns.
Local set-up
The use case of the course is an audio classification model trained on the ESC-50 dataset. To set up your local machine (or a suitable virtual or remote environment), configure Anaconda and create a clean environment:
conda create -n reprodl; conda activate reprodl
⚠️ For an alternative setup without Anaconda, see issue #2.
Then, install a few generic prerequisites (notebook handling, Pandas, …):
conda install -y -c conda-forge notebook matplotlib pandas ipywidgets pathlib
Finally, install PyTorch and PyTorch Lightning. The exact commands depend on your platform (operating system, CUDA availability, and so on), so if the ones below do not fit your machine, follow the installation instructions on the official PyTorch and PyTorch Lightning websites.
conda install -y pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch -c conda-forge
conda install -y pytorch-lightning -c conda-forge
This should be enough to let you run the initial notebook. More information on the use case can be found inside the notebook itself.
⚠️ For Windows only, install a backend for torchaudio:
pip install soundfile
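Before opening the notebook, it can help to confirm that Python can locate the packages installed above. The following is a minimal sketch and not part of the course material: `REQUIRED` and `check_environment` are illustrative names, and the check only tests that each module can be found. It does not verify that CUDA or the audio backend actually works.

```python
from importlib.util import find_spec

# Import names (not conda package names) of the libraries installed above.
# Note that the pytorch-lightning package is imported as pytorch_lightning.
REQUIRED = [
    "notebook",
    "matplotlib",
    "pandas",
    "ipywidgets",
    "torch",
    "torchvision",
    "torchaudio",
    "pytorch_lightning",
]


def check_environment(modules):
    """Map each module name to True if Python can locate it (without importing it)."""
    return {name: find_spec(name) is not None for name in modules}


status = check_environment(REQUIRED)
for name, found in status.items():
    print(f"{name:<20} {'ok' if found else 'MISSING'}")

missing = [name for name, found in status.items() if not found]
if missing:
    print("Missing packages:", ", ".join(missing))
```

Run it with the reprodl environment active. On Windows, you could add `soundfile` to the list to check the torchaudio backend as well.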
Additional set-up steps
The following steps are not mandatory, but will considerably simplify the experience.
- If you are on Windows, install the Windows Subsystem for Linux. This is useful in a number of contexts, including Docker installation.
- We will use Git from the command line multiple times, so consider enabling GitHub access with an SSH key.
- We will experiment with Docker reproducibility on the Sapienza DGX environment. If you have not done so already, set up your access to the machine.
Organization of the course
The course is split into exercises (e.g., adding DVC support). The material for each exercise is provided as a Git branch. To follow an exercise, switch to the corresponding branch and follow the README there. If you want to see the completed exercise, append _completed to the name of the branch. Additional material and information can be found on the main website of the course.
List of exercises:
- Experimenting with Git, branches, and scripting (exercise1_git).
- Adding Hydra configuration (exercise2_hydra).
- Versioning data with DVC (exercise3_dvc).
- Creating a Dockerfile (exercise4_docker).
- Experiment management with Weights & Biases (exercise5_wandb).
- Unit testing and formatting with continuous integration (exercise6_hooks).
An example
If you want to follow the first exercise, switch to the corresponding branch and follow the instructions from there:
git checkout exercise1_git
If you want to see the completed exercise:
git checkout exercise1_git_completed
You can inspect the commits to look at specific changes in the code:
git log --graph --abbrev-commit --decorate
If you want to inspect a specific change, you can check out that commit using its ID.
Contributing
Thanks to Jeroen Van Goey for the error hunting. Feel free to open a pull request if you have suggestions on the current material or ideas for some extra exercises (see below).
⚠️ Because of the sequential nature of the repository, changing something in one of the initial branches might trigger necessary changes in all downstream branches.
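To get a rough idea of how far a change might propagate, the sketch below lists the branches that come after a given one. It rests on assumptions that are not stated in this README: that the branches follow the order of the exercise list, and that each _completed branch sits directly after its exercise branch. The helper names (`all_branches`, `downstream_branches`) are hypothetical.

```python
# Exercise branches in course order, as listed in this README.
EXERCISES = [
    "exercise1_git",
    "exercise2_hydra",
    "exercise3_dvc",
    "exercise4_docker",
    "exercise5_wandb",
    "exercise6_hooks",
]


def all_branches():
    """Return every exercise branch followed by its _completed variant (assumed order)."""
    ordered = []
    for name in EXERCISES:
        ordered.extend([name, name + "_completed"])
    return ordered


def downstream_branches(changed):
    """Return the branches that come after `changed` in the assumed order."""
    ordered = all_branches()
    if changed not in ordered:
        raise ValueError(f"Unknown branch: {changed}")
    return ordered[ordered.index(changed) + 1:]


# Branches that may need updating after a change to exercise5_wandb.
for branch in downstream_branches("exercise5_wandb"):
    print(branch)
```

With these assumptions, a change to exercise5_wandb would have to be reviewed in exercise5_wandb_completed, exercise6_hooks, and exercise6_hooks_completed. A change to the first branch would touch every other branch in the list.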
Extra material (students & more)
Extra branches contain material that was not covered in the course (e.g., new libraries for hyper-parameter optimization), implemented by the students for the exam. They can be read independently from the main branches. Refer to the original authors for more information.
Author | Branch | Content |
---|---|---|
OfficiallyDAC | extra_optuna | Fine-tuning hyper-parameters with Optuna. |
Advanced reading material
If you liked the exercises and are planning to explore more, the new edition of Full Stack Deep Learning (UC Berkeley CS194-080) covers a larger set of material than this course. Another good resource (divided into small exercises) is the MLOps repository by Goku Mohandas. lucmos/nn-template is a fully functioning template implementing many of the tools described in this course.