tiny-torch

A small project for exploring MLOps best practices.

Code-Generator

This is the image classification template from Code-Generator. It trains a ResNet model on the CIFAR-10 dataset from TorchVision, using PyTorch and PyTorch-Ignite.

Installation

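Navigate to the tiny-torch directory, create the conda environment from environment.yaml, and activate it. Replace <env-name> with the name field defined in environment.yaml.

conda env create -f environment.yaml
conda activate <env-name>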

Install the package in editable mode with pip.

pip install -e .

Usage

Run on a single GPU

CUDA_VISIBLE_DEVICES=0 python tiny_torch/main.py configs/resnet50.yaml

Run on a single node with multiple GPUs (two in this example, set by --nproc_per_node)

torchrun --nproc_per_node=2 tiny_torch/main.py configs/resnet50.yaml --backend=nccl
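torchrun starts one worker process per GPU and tells each worker its identity through environment variables, including RANK, LOCAL_RANK and WORLD_SIZE. The sketch below shows how a script could read them. It falls back to a single process (rank 0 of 1) when the variables are absent, as with the plain python command above. The helper read_dist_env and the DistEnv class are hypothetical and do not necessarily match how tiny_torch/main.py sets up distributed training.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DistEnv:
    rank: int         # global index of this worker across all nodes
    local_rank: int   # index of this worker on the current node (selects its GPU)
    world_size: int   # total number of workers


def read_dist_env(env: Optional[Mapping[str, str]] = None) -> DistEnv:
    """Read the per-worker variables that torchrun sets (hypothetical helper).

    Missing variables default to a single-process run: rank 0 of 1.
    """
    env = os.environ if env is None else env
    return DistEnv(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
    )


if __name__ == "__main__":
    info = read_dist_env()
    print(f"rank {info.rank} of {info.world_size}, local rank {info.local_rank}")
```

With --nproc_per_node=2 on one node, the two workers should report ranks 0 and 1, each with a world size of 2.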

Computational profiling with TensorBoard

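Make sure torch_tb_profiler, the PyTorch Profiler plugin for TensorBoard, is installed in the environment (for example, with pip install torch_tb_profiler). Then run training with the profiling config file, configs/profile_resnet50.yaml.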
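To confirm the environment has what profiling needs before launching a run, a quick import check can help. This is a minimal sketch, assuming the relevant import names are torch, ignite (PyTorch-Ignite), tensorboard and torch_tb_profiler. That list is an assumption based on this README, and environment.yaml remains the authoritative dependency list. check_modules is a hypothetical helper.

```python
import importlib.util
from typing import Dict, Iterable

# Assumed import names for the profiling workflow; adjust to match environment.yaml.
REQUIRED_MODULES = ("torch", "ignite", "tensorboard", "torch_tb_profiler")


def check_modules(names: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    missing = [name for name, ok in check_modules().items() if not ok]
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All profiling dependencies are installed.")
```

find_spec only locates the module without importing it, so the check stays fast even for large packages such as torch.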

torchrun --nproc_per_node=2 tiny_torch/main.py configs/profile_resnet50.yaml --backend=nccl

From the tiny-torch directory, start TensorBoard.

tensorboard --port=8667 --logdir=logs

In Nebari, open https://<your nebari domain>/user/<your username>/proxy/8667/ (including the trailing slash) to view TensorBoard, then click the PYTORCH_PROFILER tab.
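The proxy URL follows a fixed pattern, so it can be built from the domain, username and TensorBoard port. This is a hypothetical helper that mirrors the pattern above. The domain nebari.example.com and the user alice in the usage example are placeholders. The default port 8667 matches the tensorboard command above.

```python
def nebari_tensorboard_url(domain: str, username: str, port: int = 8667) -> str:
    """Build the Nebari proxy URL for a TensorBoard server (hypothetical helper)."""
    # Accept either "nebari.example.com" or "https://nebari.example.com/".
    host = domain.removeprefix("https://").removeprefix("http://").strip("/")
    # Keep the trailing slash after the port, as the instructions above require.
    return f"https://{host}/user/{username}/proxy/{port}/"


if __name__ == "__main__":
    print(nebari_tensorboard_url("nebari.example.com", "alice"))
```

Run as a script, this prints https://nebari.example.com/user/alice/proxy/8667/. The helper needs Python 3.9 or later for str.removeprefix.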