/Vision-Transformer

This notebook, which is part of the assignments for EPFL's Visual Intelligence course, implements a Vision Transformer (ViT) for classification as well as a GPT model for image generation.

Primary language: Jupyter Notebook

This notebook is for assignment 1 of the CS-503 Visual Intelligence course at EPFL, taught by Prof. Amir Zamir.

The goals of this assignment are to:

  • Implement a Vision Transformer for MNIST classification
  • Implement a GPT decoder model for image generation
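A ViT usually turns an MNIST digit into a sequence of tokens by cutting it into non-overlapping patches. The minimal sketch below assumes 28x28 inputs and 7x7 patches, which gives 16 tokens of 49 values each; the patch size and the `patchify` helper are illustrative assumptions, not necessarily what the notebook uses. In a full model, each flattened patch would then be linearly projected to the embedding dimension and combined with a positional encoding.

```python
from typing import List

Image = List[List[float]]


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Hypothetical helper: split a square grayscale image into
    non-overlapping patch x patch tiles, one flattened token per tile."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):        # patch rows, top to bottom
        for left in range(0, w, patch):   # patch columns, left to right
            # Flatten the tile in raster (row-major) order.
            tokens.append([
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ])
    return tokens


# Dummy 28x28 "MNIST-sized" image whose pixel value encodes its position.
img = [[float(r * 28 + c) for c in range(28)] for r in range(28)]
toks = patchify(img, patch=7)
print(len(toks), len(toks[0]))  # 16 tokens, each with 7 * 7 = 49 values
```

Raster-ordered patches keep the token sequence deterministic, so a positional encoding can tell the model where each patch came from.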

Topics covered in this assignment:

  • Self-attention
  • Basic tokenization
  • Basic positional encodings
  • Transformer encoder-only (e.g. ViT) and decoder-only (e.g. GPT) models
  • Vision Transformer (ViT)
  • Supervised training
  • Autoregressive modelling
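Self-attention is the building block shared by both models in this assignment. The sketch below is a single-head, pure-Python version of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The function names and the nested-list representation are illustrative assumptions rather than the notebook's code, which would normally use a tensor library. The `causal` flag shows the key difference between an encoder-only model such as ViT, where every token attends to every other token, and a decoder-only model such as GPT, where each position may only attend to itself and earlier positions. That restriction is what makes autoregressive generation possible.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax; -inf entries get probability 0."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix,
                   causal: bool = False) -> Matrix:
    """Single-head scaled dot-product self-attention over tokens x (n x d)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    kt = [list(col) for col in zip(*k)]          # transpose keys: d_k x n
    scores = matmul(q, kt)                       # n x n query-key similarities
    for i, row in enumerate(scores):
        for j in range(len(row)):
            row[j] /= math.sqrt(d_k)             # scale by sqrt(d_k)
            if causal and j > i:                 # decoder: hide future tokens
                row[j] = -math.inf
    weights = [softmax(row) for row in scores]   # each row sums to 1
    return matmul(weights, v)                    # weighted mix of value vectors


identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention(tokens, identity, identity, identity, causal=True)
# With a causal mask the first token can only attend to itself,
# so its output equals its own value vector.
print(out[0])  # [1.0, 0.0]
```

Multi-head attention, residual connections, layer normalization and the MLP blocks of a full Transformer layer are omitted here for brevity.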