My customised deep learning project template

Contributor: Gengyuan Zhang

Contact: gengyuanmax@gmail.com

About this project

I created this deep learning project template so that I can start a new project quickly, with convenient logging and visualization provided by MLflow and TensorBoard.

It includes:

  • training, testing, and resuming a task

  • saving all checkpoints and artifacts to a local directory, including the git commit version, a copy of the config file, metrics, etc.

  • generating a new experiment folder each time a new script is submitted

  • using Distributed Data Parallel to realise one-node multi-GPU training
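The bookkeeping described in the list above (a fresh experiment folder per run, holding a copy of the config file and the git commit version) could look roughly like the sketch below. It is not the template's actual code: the function names `current_git_commit` and `create_experiment_dir`, the `checkpoints` subfolder, and the `meta.json` file are all hypothetical choices made for illustration, and only the Python standard library is used.

```python
import json
import shutil
import subprocess
import time
import uuid
from pathlib import Path


def current_git_commit(repo_dir="."):
    """Return the current commit hash, or "unknown" if it cannot be read."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
    except (subprocess.CalledProcessError, OSError):
        # Not a git repository, or git is not installed.
        return "unknown"


def create_experiment_dir(root, config_path, repo_dir="."):
    """Create a new, uniquely named run folder (hypothetical layout)."""
    # Timestamp plus a short random suffix, so two runs started in the
    # same second still get different folders.
    run_name = f"{time.strftime('%Y%m%d-%H%M%S')}-{uuid.uuid4().hex[:8]}"
    run_dir = Path(root) / run_name
    run_dir.mkdir(parents=True, exist_ok=False)
    (run_dir / "checkpoints").mkdir()

    # Keep an exact copy of the config that produced this run.
    config_path = Path(config_path)
    shutil.copy2(config_path, run_dir / config_path.name)

    # Record which code version was used.
    meta = {
        "git_commit": current_git_commit(repo_dir),
        "created": time.strftime("%Y-%m-%d %H:%M:%S"),
    }
    (run_dir / "meta.json").write_text(json.dumps(meta, indent=2))
    return run_dir
```

Calling `create_experiment_dir("runs", "config.yaml")` at the start of a script would then give every submission its own folder, which keeps earlier results from being overwritten.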

How to Use

  1. Define your model as a class that inherits from the ConfigModel class
  2. Define your trainer in main.py
  3. Start the MLflow server
     mlflow ui -h 0.0.0.0 -p 5055
  4. Start the TensorBoard server
     tensorboard --host 0.0.0.0 --logdir mlruns/{run_id}

What it looks like

MLflow helps manage all the experiments and their details. (Screenshot: how MLflow can manage experiments)

In each experiment, we can save all the artifacts and checkpoints.
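Because the checkpoints of each experiment are kept on disk, resuming a task mostly comes down to finding the newest one. The sketch below shows one possible way to do that. It assumes a hypothetical file naming scheme `epoch_<N>.pt`, which is not confirmed by the template, and the helper name `latest_checkpoint` is likewise made up for illustration.

```python
import re
from pathlib import Path


def latest_checkpoint(checkpoint_dir, pattern=r"epoch_(\d+)\.pt"):
    """Return the checkpoint path with the highest epoch number, or None.

    The default pattern (epoch_<N>.pt) is an assumed naming scheme.
    """
    checkpoint_dir = Path(checkpoint_dir)
    if not checkpoint_dir.is_dir():
        return None

    best_path = None
    best_epoch = -1
    for path in checkpoint_dir.iterdir():
        match = re.fullmatch(pattern, path.name)
        if match is None:
            continue  # skip files that are not checkpoints
        # Compare epochs numerically so that epoch_10 beats epoch_2.
        epoch = int(match.group(1))
        if epoch > best_epoch:
            best_epoch = epoch
            best_path = path
    return best_path
```

The epoch is compared as an integer rather than by sorting file names, since a plain string sort would rank `epoch_2.pt` after `epoch_10.pt`.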