VisionTransformer

An Implementation of a Vision Transformer

Primary language: Python


In this repository, I implement a Vision Transformer (ViT) inspired by the paper "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale".

The model trains on the MNIST dataset but can be adapted to other datasets.
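
As a rough illustration of the idea behind the paper's title, the sketch below splits an MNIST-sized 28x28 image into non-overlapping patches and flattens each one into a vector. A patch embedding layer would then project each of these vectors into the transformer's input space. The function name `split_into_patches` and the patch size of 7 are assumptions made for this example; they are not taken from this repository's PatchEmbedding.py, which may work differently.

```python
def split_into_patches(image, patch_size):
    """Split a 2D image (a list of rows) into flattened, non-overlapping patches.

    Hypothetical helper for illustration only; not part of this repository.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    # Walk the image in row-major order, one patch at a time.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten the patch_size x patch_size window into one vector.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# A dummy 28x28 "image" whose pixel value encodes its position.
image = [[row * 28 + col for col in range(28)] for row in range(28)]
patches = split_into_patches(image, patch_size=7)
print(len(patches), len(patches[0]))  # 16 49
```

With a patch size of 7, a 28x28 image yields 16 patches of 49 values each, so the transformer sees a sequence of 16 tokens per image. A different patch size trades sequence length against the amount of detail in each token.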


Project Structure

├── src
│   ├── HelperFunctions.py
│   ├── PatchEmbedding.py
│   ├── Train.py
│   ├── TransformerBlock.py
│   └── VisionTransformer.py
├── main.py
└── README.md

Happy Coding!