This repository contains a Python code snippet that demonstrates how to use the Vision Transformer (ViT) model for image classification with the Hugging Face transformers library. The ViT model was introduced in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" and achieved state-of-the-art results on the ImageNet dataset.
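As a minimal sketch of the classification step, assuming the commonly used `google/vit-base-patch16-224` checkpoint (the notebook may load a different one) and a hypothetical helper name `classify_image`:

```python
def classify_image(image_path: str, checkpoint: str = "google/vit-base-patch16-224") -> str:
    """Return the predicted class label for a single image.

    Sketch only: the checkpoint name and helper are illustrative assumptions,
    not necessarily what vitmodel.ipynb does.
    """
    import torch
    from PIL import Image
    from transformers import ViTForImageClassification, ViTImageProcessor

    # Load the pretrained processor (resizing/normalization) and the model.
    processor = ViTImageProcessor.from_pretrained(checkpoint)
    model = ViTForImageClassification.from_pretrained(checkpoint)

    # Preprocess the image and run a forward pass without tracking gradients.
    inputs = processor(images=Image.open(image_path).convert("RGB"), return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Map the highest-scoring logit back to a human-readable label.
    return model.config.id2label[logits.argmax(-1).item()]
```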
Requirements:
- Jupyter Notebook
- Python 3.x
- PyTorch
- Transformers
- Pillow
- OpenCV
- Matplotlib
- tqdm
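Assuming a standard pip-based environment (the package names below are the usual PyPI names, which may differ from the notebook's exact setup), the requirements can be installed with:

```shell
pip install notebook torch transformers pillow opencv-python matplotlib tqdm
```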
- Clone the repository:
  `git clone https://github.com/Michael-Rusu/DetectAnythingModel.git`
- Navigate to the cloned directory.
- Collect some images to test the model on and store them in a directory.
- Update the `randomPath` variable in the code to point to the directory containing the images.
- Open `vitmodel.ipynb` and run the Jupyter notebook.
The code will output a randomly selected image from the directory and the predicted class label.
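The random-selection step described above can be sketched as follows; `pick_random_image` is a hypothetical helper, and the parameter name mirrors the `randomPath` variable used in the notebook:

```python
import os
import random

def pick_random_image(randomPath: str) -> str:
    """Pick a random image file from the directory pointed to by randomPath.

    Sketch only: the helper name and supported extensions are assumptions.
    """
    extensions = (".jpg", ".jpeg", ".png")
    # Keep only files with a recognized image extension.
    candidates = [f for f in os.listdir(randomPath) if f.lower().endswith(extensions)]
    if not candidates:
        raise FileNotFoundError(f"No images found in {randomPath}")
    # Return the full path of one randomly chosen image.
    return os.path.join(randomPath, random.choice(candidates))
```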
This code is based on Google's ViT model.