In this repository, I implement a Vision Transformer (ViT) inspired by the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale".
It trains on the MNIST dataset but can be adapted to other image datasets.
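When adapting to another dataset, the settings that usually change are the image size, the number of channels, the patch size, and the number of classes. The sketch below is not code from this repository: `PatchConfig` is a hypothetical helper, and the patch sizes (7 for MNIST, 4 for CIFAR-10) are assumed values chosen only because they divide the image size evenly. It shows how those settings determine the number of patches and the flattened size of each patch, following the patch-splitting scheme described in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchConfig:
    """Hypothetical container for dataset-dependent settings (not part of src/)."""
    image_size: int    # height == width, in pixels
    in_channels: int   # 1 for grayscale, 3 for RGB
    patch_size: int    # side length of each square patch
    num_classes: int   # size of the classification head output

    def __post_init__(self):
        # Patches must tile the image exactly.
        if self.image_size % self.patch_size != 0:
            raise ValueError("image_size must be divisible by patch_size")

    @property
    def num_patches(self) -> int:
        # Sequence length fed to the transformer (excluding any class token).
        return (self.image_size // self.patch_size) ** 2

    @property
    def patch_dim(self) -> int:
        # Length of one flattened patch before the linear projection.
        return self.patch_size ** 2 * self.in_channels


# Assumed example settings, not values read from this repository.
mnist = PatchConfig(image_size=28, in_channels=1, patch_size=7, num_classes=10)
cifar10 = PatchConfig(image_size=32, in_channels=3, patch_size=4, num_classes=10)

print(mnist.num_patches, mnist.patch_dim)      # 16 49
print(cifar10.num_patches, cifar10.patch_dim)  # 64 48
```

A larger patch size gives a shorter sequence and cheaper attention, at the cost of coarser spatial detail per token.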
Project Structure
├─ src
│  ├── HelperFunctions.py
│  ├── PatchEmbedding.py
│  ├── Train.py
│  ├── TransformerBlock.py
│  └── VisionTransformer.py
├─ main.py
└─ README.md