Vision Transformers Test

Base code to test out how vision transformers work in comparison to a normal cnn For this version i am using huggingfaces implementation - therefore pytorch

For this module i am using the ASL dataset to test out, you can download it here

Model Results

Model Name Accuracy MCC
ViT (Vision Transformers - 1 Epoch) 1 0.99
EffnetV2 B2 (2 Epochs) 1 0.99
CNN (10 Epochs) 0.35 0.33

ToDO

  • EDA
  • Train ViT model
  • Evaluate ViT model
  • Train & Evaluate Simple CNN
  • Train & Evaluate EffNetV2
  • Train & Evaluate DEIT
  • Inference Script for Effcient Net
  • Inference Script for ViT