This repo contains all of the materials you will need for the "Machine Learning Reproducibility with Open Source Tooling" workshop at Applied ML Days 2018. In this workshop, we will discuss why reproducibility and data provenance matter in any applied machine learning workflow. We will then implement a realistic machine learning workflow that emphasizes these points and uses open source tooling to overcome the challenges associated with reproducibility.
- Introduction
- Reproducibility challenges
- Predictable application behavior with Docker
- Fully reproducible orchestration of ML workflows
- You will need to ssh into a cloud instance. Review how to do that, and install an ssh client if you need one:
  - On a Mac or Linux machine, you should be able to ssh from a terminal (see these Mac instructions and Linux instructions).
  - On a Windows machine, you can either install and use an ssh client (I recommend PuTTY) or use the Windows Subsystem for Linux (WSL).
- You will also need to work a bit at the command line. If you are new to the command line or need a refresher, look through this quick tutorial.
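
As a quick pre-workshop check for the ssh prerequisite above, the following is a minimal sketch that tests whether an ssh client is on your `PATH` and prints the general shape of the connect command. The variable names `WORKSHOP_USER` and `WORKSHOP_HOST` and their default values are hypothetical placeholders, not details from the workshop; use whatever login and address you are given for your cloud instance.

```shell
#!/bin/sh
# Check that an ssh client is available on the PATH.
if command -v ssh >/dev/null 2>&1; then
    ssh_status="found"
else
    ssh_status="missing"
fi
echo "ssh client: ${ssh_status}"

# Hypothetical placeholders: replace with the user and address you are given.
# 203.0.113.10 is a reserved documentation address, not a real instance.
WORKSHOP_USER="${WORKSHOP_USER:-user}"
WORKSHOP_HOST="${WORKSHOP_HOST:-203.0.113.10}"

# Build and show the command rather than running it, since the values above
# are placeholders.
connect_cmd="ssh ${WORKSHOP_USER}@${WORKSHOP_HOST}"
echo "To connect, run: ${connect_cmd}"
```

If the first line reports `missing`, install a client as described in the list above before the workshop. On Windows with PuTTY, the same user and host values go into the PuTTY session dialog instead of a terminal command.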