MLP-Mixer Implementation

Overview

This repository contains a PyTorch implementation of the MLP-Mixer architecture, as introduced in the paper "MLP-Mixer: An All-MLP Architecture for Vision". MLP-Mixer is a Transformer-style architecture in which the self-attention layer is replaced by a multi-layer perceptron (MLP) that mixes information across image patches (or across tokens in NLP tasks).

The token-mixing MLP in this implementation serves as a drop-in replacement for the self-attention layer in a standard Transformer block: only a few lines of code are needed to transpose the tensor, apply the MLP across the sequence (patch) dimension, and transpose back.
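
As a minimal sketch of this idea (the names TokenMixingMLP, num_patches, and hidden_dim are illustrative assumptions, not the exact code from this repository):

import torch
import torch.nn as nn

class TokenMixingMLP(nn.Module):
    """Sketch of a token-mixing MLP used in place of self-attention.

    Names and dimensions here are illustrative assumptions, not the
    repository's exact code.
    """

    def __init__(self, num_patches: int, hidden_dim: int, dropout: float = 0.0):
        super().__init__()
        # MLP applied along the patch (sequence) dimension
        self.mlp = nn.Sequential(
            nn.Linear(num_patches, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_patches),
            nn.Dropout(dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, channels)
        y = x.transpose(1, 2)      # -> (batch, channels, num_patches)
        y = self.mlp(y)            # mix information across patches
        return y.transpose(1, 2)   # -> (batch, num_patches, channels)

# Example: 8 images, 64 patches each, 128 channels per patch
x = torch.randn(8, 64, 128)
out = TokenMixingMLP(num_patches=64, hidden_dim=256)(x)  # same shape as x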

So far, I have experimented with training the MLP-Mixer on the MNIST and CIFAR-10 datasets.

Usage

  1. Install the required Python packages listed in requirements.txt:
pip install -r requirements.txt
  2. Adjust the hyperparameters in config.py (see the sketch after this list).
  3. Run the training script:
python train.py
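
For illustration, config.py might expose hyperparameters along these lines; the names and values below are assumptions made for this sketch, not the repository's actual settings:

# Hypothetical configuration sketch; names and values are assumptions,
# not the actual contents of config.py.
dataset = "cifar10"      # or "mnist"
patch_size = 4           # side length of each square image patch
num_blocks = 8           # number of Mixer blocks
hidden_dim = 128         # channel (embedding) dimension per patch
tokens_mlp_dim = 64      # hidden width of the token-mixing MLP
channels_mlp_dim = 512   # hidden width of the channel-mixing MLP
batch_size = 128
learning_rate = 1e-3
num_epochs = 100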

References

MLP-Mixer: An All-MLP Architecture for Vision. Ilya Tolstikhin et al., NeurIPS 2021. https://arxiv.org/abs/2105.01601