A small algorithm for exploring MLOps best practices.
This is the image classification template from Code-Generator. It uses a resnet model and the cifar10 dataset from TorchVision, and training is powered by PyTorch and PyTorch-Ignite.
Navigate to the tiny-torch directory and create the conda environment, then activate it with conda activate followed by the environment name defined in environment.yaml.
conda env create -f environment.yaml
Install the library with pip.
pip install -e .
To train on a single GPU:
CUDA_VISIBLE_DEVICES=0 python tiny_torch/main.py configs/resnet50.yaml
To train on two GPUs, launching one process per GPU with torchrun:
torchrun --nproc_per_node=2 tiny_torch/main.py configs/resnet50.yaml --backend=nccl
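When a script is launched with torchrun, each worker process receives its position in the job through environment variables such as RANK, LOCAL_RANK, and WORLD_SIZE. The sketch below shows one way a script could read them, falling back to single-process values when they are absent. The helper name `read_dist_env` is hypothetical and is not taken from tiny_torch/main.py, which may handle this differently.

```python
import os


def read_dist_env(env=None):
    """Return (rank, local_rank, world_size) from torchrun-style variables.

    Hypothetical helper for illustration only. Falls back to the values of a
    single-process run when the variables are not set.
    """
    env = os.environ if env is None else env
    rank = int(env.get("RANK", 0))
    local_rank = int(env.get("LOCAL_RANK", 0))
    world_size = int(env.get("WORLD_SIZE", 1))
    return rank, local_rank, world_size


if __name__ == "__main__":
    rank, local_rank, world_size = read_dist_env()
    print(f"rank={rank} local_rank={local_rank} world_size={world_size}")
```

With --nproc_per_node=2 on one machine, the two workers would be expected to see local ranks 0 and 1 and a world size of 2.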
To profile a training run, make sure torch_tb_profiler is installed in the environment and use the profiling config file.
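A quick way to confirm the plugin is available before starting a profiling run is to ask Python whether the module can be found. This is a minimal sketch; it assumes the importable module name matches the package name torch_tb_profiler, and the helper `is_installed` is a hypothetical name, not part of this repository.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the plugin is importable under this module name.
    name = "torch_tb_profiler"
    print(f"{name} installed: {is_installed(name)}")
```

Run it inside the activated conda environment so the check reflects the interpreter that training will use.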
torchrun --nproc_per_node=2 tiny_torch/main.py configs/profile_resnet50.yaml --backend=nccl
From the tiny-torch directory, start TensorBoard.
tensorboard --port=8667 --logdir=logs
In Nebari, navigate to https://<your nebari domain>/user/<your username>/proxy/8667/ (include the trailing slash) to view TensorBoard, then click on the PYTORCH_PROFILER tab.
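Because the trailing slash is easy to drop when typing the address by hand, the URL can also be assembled programmatically. The sketch below follows the pattern given above. The function name `nebari_proxy_url` and the example domain and username are hypothetical placeholders.

```python
def nebari_proxy_url(domain, username, port=8667):
    """Build the proxy URL for a service on `port`, keeping the trailing slash."""
    return f"https://{domain}/user/{username}/proxy/{port}/"


if __name__ == "__main__":
    # Placeholder values; substitute your own Nebari domain and username.
    print(nebari_proxy_url("nebari.example.com", "alice"))
```

The default port matches the --port=8667 value passed to tensorboard. If you start TensorBoard on a different port, pass that port here as well.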