codamin/Vision-Transformer

This notebook, part of the assignments for EPFL's Visual Intelligence course, implements a Vision Transformer for classification and a GPT model for image generation.


This notebook is Assignment 1 of the CS-503 Visual Intelligence course, taught at EPFL by Prof. Amir Zamir.

The goals of this assignment are to:

Implement a Vision Transformer for MNIST classification
Implement a GPT decoder model for image generation
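
As a rough illustration of the first goal, a Vision Transformer starts by cutting each image into patches and treating every flattened patch as one token. The sketch below is not taken from the notebook: the `patchify` helper, the patch size of 7, and the dummy image are assumptions made for illustration, and it uses plain Python lists rather than tensors so that it runs without extra dependencies.

```python
def patchify(image, patch_size):
    """Split a 2D image (list of rows) into flattened, non-overlapping patches.

    Patches are returned in row-major order (left to right, top to bottom).
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block into a single vector.
            patch = [
                image[top + dy][left + dx]
                for dy in range(patch_size)
                for dx in range(patch_size)
            ]
            patches.append(patch)
    return patches


# Dummy 28x28 "image" (the MNIST resolution); each pixel stores its own index.
# This is hypothetical data, not a real digit.
image = [[row * 28 + col for col in range(28)] for row in range(28)]

# patch_size=7 is an assumed value; it yields a 4x4 grid of patches.
tokens = patchify(image, patch_size=7)
print(len(tokens), len(tokens[0]))  # 16 tokens, 49 pixels each
```

In a full model, each 49-dimensional token would typically pass through a learned linear projection and receive a positional encoding before entering the Transformer blocks.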

Topics covered in this assignment:

Self-attention
Basic tokenization
Basic positional encodings
Encoder-only (e.g. ViT) and decoder-only (e.g. GPT) Transformer models
Vision Transformer (ViT)
Supervised training
Autoregressive modelling
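
Several of the topics above meet in one operation: scaled dot-product self-attention. The following is a minimal single-head sketch in plain Python, not the notebook's implementation. The function names are hypothetical, and for brevity the same vectors serve as queries, keys, and values, whereas a real model would apply learned projections first. The `causal` flag shows the main difference between an encoder-only model such as ViT, where every token attends to all tokens, and a decoder-only model such as GPT, where each token attends only to itself and earlier tokens.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values, causal=False):
    """Scaled dot-product attention for a single head and a single sequence.

    queries, keys, values: lists of vectors, one vector per token.
    causal=True masks out future positions (decoder-style);
    causal=False lets every token attend to all tokens (encoder-style).
    Returns (outputs, attention_weights).
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if causal and j > i:
                # Future token: a score of -inf becomes weight 0 after softmax.
                scores.append(float("-inf"))
            else:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(dim))
        attn = softmax(scores)
        weights.append(attn)
        # Output for token i is the attention-weighted sum of value vectors.
        outputs.append([
            sum(w * v[d] for w, v in zip(attn, values))
            for d in range(len(values[0]))
        ])
    return outputs, weights


# Three toy 2-dimensional token embeddings (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

enc_out, enc_w = self_attention(x, x, x, causal=False)  # encoder-style
dec_out, dec_w = self_attention(x, x, x, causal=True)   # decoder-style

print(dec_w[0])  # the first token can only attend to itself
```

With the causal mask, the weight matrix is lower triangular. This is what makes autoregressive modelling possible: the prediction at position i never depends on tokens after i.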