ViT-for-multiclass-image-classification

This repository contains a complete pipeline for processing video data and classifying individual frames according to custom labels. It provides utilities, in the form of Jupyter notebooks, for preparing the data, training a Vision Transformer (ViT) model, and running inference on additional data.

Training notebook: https://github.com/djebel-amila/ViT-for-multiclass-image-classification/blob/main/ViT_multiclass_image_classification_training.ipynb
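
As a rough illustration of the frame-level data preparation described above, the sketch below shows one way to decide which frames of a video to keep and where to store them by label. It is not taken from the notebooks: the function names `sample_frame_indices` and `frame_path`, the one-folder-per-label layout, and the file naming scheme are all assumptions made for this example. Decoding the video itself is left out and would need a video library.

```python
from pathlib import Path


def sample_frame_indices(total_frames: int, video_fps: float, target_fps: float) -> list[int]:
    """Return the indices of the frames to keep when downsampling a video.

    Hypothetical helper: keeps every n-th frame, where n is the rounded
    ratio between the video frame rate and the desired sampling rate.
    """
    if total_frames < 0 or video_fps <= 0 or target_fps <= 0:
        raise ValueError("total_frames must be >= 0 and frame rates must be > 0")
    # Never use a step below 1, so a target rate above the video rate keeps every frame.
    step = max(1, round(video_fps / target_fps))
    return list(range(0, total_frames, step))


def frame_path(root: Path, label: str, video_name: str, index: int) -> Path:
    """Build the output path for one extracted frame.

    Assumed layout: <root>/<label>/<video_name>_frame<index>.jpg,
    i.e. one sub-folder per class label.
    """
    return root / label / f"{video_name}_frame{index:06d}.jpg"


if __name__ == "__main__":
    # Example: a 10-frame clip recorded at 30 fps, sampled at roughly 10 fps.
    for i in sample_frame_indices(total_frames=10, video_fps=30.0, target_fps=10.0):
        print(frame_path(Path("data"), "example_label", "clip01", i))
```

A one-folder-per-label layout is used here because common image-folder dataset loaders can infer class labels from directory names. Whether the notebooks in this repository use that layout is not stated here.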