
Masked AutoEncoders

A simple, unofficial PyTorch implementation of MAE (Masked Autoencoders Are Scalable Vision Learners).

  • The model is quite small, with only 7 million parameters
  • The model was trained on the CIFAR-10 dataset
  • The model was trained for 320 epochs
  • Training ran on a single L4 GPU over the course of 8 hours
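
For context, the core idea is that the encoder only ever sees a small random subset of image patches. Below is a minimal sketch of MAE-style random masking consistent with the 0.75 mask ratio used here; the patch-embedding shape and function name are illustrative assumptions, not the repo's exact code.

import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    # patches: [batch, num_patches, dim] patch embeddings (assumed layout)
    batch, num_patches, dim = patches.shape
    num_keep = int(num_patches * (1 - mask_ratio))

    # Per-sample random noise decides which patches survive.
    noise = torch.rand(batch, num_patches, device=patches.device)
    ids_shuffle = torch.argsort(noise, dim=1)        # ascending: lowest noise = kept
    ids_restore = torch.argsort(ids_shuffle, dim=1)  # indices that undo the shuffle

    # Gather only the visible patches for the encoder.
    ids_keep = ids_shuffle[:, :num_keep]
    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, dim))

    # Binary mask in the original patch order (1 = masked), used for the reconstruction loss.
    mask = torch.ones(batch, num_patches, device=patches.device)
    mask[:, :num_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)
    return visible, mask, ids_restore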


Examples

(Example reconstructions on dog images.)

Setup

  • First, clone the repo and install the requirements:
git clone https://github.com/dame-cell/Masked-AutoEncoders.git
cd Masked-AutoEncoders
pip install -r requirements.txt

Usage

  • To train the model, simply run:
python training_model.py --epochs 320 --lr 0.0001 --batch_size 128
  • To train the linear probe without the pre-trained encoder:
python train_linear_probe.py --epochs 10 --lr 0.0001 --batch_size 128 --pretrained False
  • To train the linear probe with the pre-trained encoder, first download it from Hugging Face:
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="damerajee/MAE", filename="model.pt", local_dir="model")
then run:
python train_linear_probe.py --epochs 10 --lr 0.0001 --batch_size 128 --pretrained True --path_to_model <path to the downloaded model>

Hyper-parameters

Hyperparameter   Description                            Value
epochs           Number of training epochs              320
lr               Learning rate                          1e-4
batch_size       Batch size for training/validation     128
weight_decay     Weight decay for the optimizer         1e-4
eval_interval    Evaluation interval during training    100 steps
seed             Random seed for reproducibility        42
mask_ratio       Masking ratio for MAE                  0.75
optimizer        Optimizer                              AdamW
lr_scheduler     Learning rate scheduler                Cosine decay + warmup
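
As a rough illustration of the optimizer and scheduler rows above, this is one way AdamW with cosine decay and warmup could be wired up; the warmup length and total step count are assumptions, not values taken from the repo.

import math
import torch

def build_optimizer(model, total_steps, warmup_steps=500):
    # AdamW with the lr and weight decay from the table above.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)

    def lr_lambda(step):
        # Linear warmup, then cosine decay of the learning rate towards zero.
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler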

Inference

To try the pre-trained model, head over to this Colab notebook.
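
For a quick local test, something along these lines should work; note that the checkpoint format and the model's forward signature are assumptions here, and the Colab notebook remains the reference for exact usage.

import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(repo_id="damerajee/MAE", filename="model.pt", local_dir="model")
checkpoint = torch.load(ckpt_path, map_location="cpu")

# If the file stores a state_dict, instantiate the MAE class from this repo and load it:
#   model = MAE(...); model.load_state_dict(checkpoint)
# If it stores the full module, `checkpoint` already is the model (assumed below).
model = checkpoint
model.eval()

image = torch.randn(1, 3, 32, 32)   # stand-in for a normalised CIFAR-10 image
with torch.no_grad():
    output = model(image)            # assumed to return the reconstruction (and mask)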

Train and val loss

The MAE was trained for only 320 epochs with self-supervised training.

(MAE train/val loss curves)

The linear probe training was done in two stages, each for only 10 epochs:

  • training it without the pre-trained encoder
  • training it with the pre-trained encoder
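
Roughly, both stages attach a single linear classification head to the encoder, with the pre-trained variant keeping the encoder frozen. The sketch below illustrates that setup; the encoder interface and embedding size are assumptions for illustration only.

import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    def __init__(self, encoder, embed_dim=192, num_classes=10, pretrained=True):
        # embed_dim is an assumed embedding size; num_classes=10 for CIFAR-10.
        super().__init__()
        self.encoder = encoder
        if pretrained:
            # Freeze the pre-trained encoder so only the linear head is trained.
            for p in self.encoder.parameters():
                p.requires_grad = False
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, images):
        features = self.encoder(images)   # assumed to return [batch, embed_dim] features
        return self.head(features)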

Results

Model                 Train accuracy   Val accuracy
Vanilla encoder       64.32%           60.27%
Pre-trained encoder   96.13%           82.35%

Reference

The code for the model was referenced from IcarusWizard, with a few modifications.