ViT Image Classification using Transformers

This repository contains a Python code snippet that demonstrates how to use the Vision Transformer (ViT) model for image classification with the Hugging Face Transformers library. The ViT model was introduced in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" and achieved state-of-the-art results on ImageNet when pre-trained at scale.
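At its core, classifying an image with ViT takes only a few lines of the Transformers API. The snippet below is a minimal sketch rather than the notebook's exact code; the google/vit-base-patch16-224 checkpoint and the example file name are assumptions:

```python
from PIL import Image
from transformers import ViTForImageClassification, ViTImageProcessor

# Load the pretrained processor (resizing/normalization) and model.
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

image = Image.open("example.jpg").convert("RGB")  # file name is a placeholder
inputs = processor(images=image, return_tensors="pt")  # 224x224 pixel values
logits = model(**inputs).logits

# The highest logit indexes into the model's label map.
predicted_idx = logits.argmax(-1).item()
print(model.config.id2label[predicted_idx])
```

With that checkpoint, the classification head is fine-tuned on ImageNet-1k, so predictions come from its 1,000 class labels.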

Prerequisites

Jupyter notebook
Python 3.x
PyTorch
Transformers
Pillow
OpenCV
Matplotlib
tqdm
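If any of these are missing, a typical pip-based setup looks like the following (note that the OpenCV package on PyPI is published as opencv-python):

```bash
pip install torch transformers pillow opencv-python matplotlib tqdm notebook
```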

Usage

  1. Clone the repository:

git clone https://github.com/Michael-Rusu/DetectAnythingModel.git

  2. Navigate into the cloned directory.

  3. Collect some images to test the model on and store them in a directory.

  4. Update the randomPath variable in the code to point to the directory containing the images (see the sketch after this list).

  5. Open vitmodel.ipynb and run the Jupyter notebook.
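The notebook's exact code may differ, but the flow described in steps 3 through 5 amounts to something like this sketch; the randomPath value, the accepted file extensions, and the checkpoint name are assumptions:

```python
import os
import random

import matplotlib.pyplot as plt
from PIL import Image
from transformers import ViTForImageClassification, ViTImageProcessor

randomPath = "/path/to/your/images"  # step 4: point this at your image directory

# Pick one image at random from the directory.
files = [f for f in os.listdir(randomPath)
         if f.lower().endswith((".jpg", ".jpeg", ".png"))]
image = Image.open(os.path.join(randomPath, random.choice(files))).convert("RGB")

# Classify it with the pretrained ViT (checkpoint name is an assumption).
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
logits = model(**processor(images=image, return_tensors="pt")).logits
label = model.config.id2label[logits.argmax(-1).item()]

# Show the image with its predicted label as the title.
plt.imshow(image)
plt.title(label)
plt.axis("off")
plt.show()
```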

The notebook will display a randomly selected image from the directory along with its predicted class label.

This code is based on Google's ViT model.