Base code to test how Vision Transformers perform in comparison to a standard CNN. This version uses Hugging Face's implementation, and therefore PyTorch.
This module uses the ASL dataset for testing; you can download it here.
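As a minimal sketch of how the Hugging Face ViT implementation can be loaded for this kind of classification task (the checkpoint name and class count below are assumptions for illustration, not taken from this repo):

```python
from transformers import ViTForImageClassification, ViTImageProcessor

# Assumption: the ASL alphabet dataset is commonly distributed with 29 classes
# (A-Z plus "space", "delete", "nothing"); adjust to match your download.
NUM_LABELS = 29

# Assumption: a standard ImageNet-21k pretrained ViT checkpoint.
CHECKPOINT = "google/vit-base-patch16-224-in21k"

# Processor handles resizing/normalisation; the model gets a fresh
# classification head sized for NUM_LABELS.
processor = ViTImageProcessor.from_pretrained(CHECKPOINT)
model = ViTForImageClassification.from_pretrained(CHECKPOINT, num_labels=NUM_LABELS)
```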
| Model Name | Accuracy | MCC |
|---|---|---|
| ViT (Vision Transformer, 1 epoch) | 1.00 | 0.99 |
| EffNetV2 B2 (2 epochs) | 1.00 | 0.99 |
| CNN (10 epochs) | 0.35 | 0.33 |
- EDA
- Train ViT model
- Evaluate ViT model
- Train & Evaluate Simple CNN
- Train & Evaluate EffNetV2
- Train & Evaluate DeiT
- Inference Script for EfficientNet
- Inference Script for ViT (see the sketch after this list)
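A rough sketch of what the ViT inference step looks like, assuming the fine-tuned model was saved locally with `save_pretrained` (the model directory and image path below are placeholders):

```python
import torch
from PIL import Image
from transformers import ViTForImageClassification, ViTImageProcessor

# Placeholder paths; point these at your saved model and a test image.
MODEL_DIR = "./vit-asl-finetuned"
IMAGE_PATH = "example_sign.jpg"

processor = ViTImageProcessor.from_pretrained(MODEL_DIR)
model = ViTForImageClassification.from_pretrained(MODEL_DIR)
model.eval()

# Preprocess a single image and run a forward pass without gradients.
image = Image.open(IMAGE_PATH).convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit back to its class label.
predicted_idx = logits.argmax(-1).item()
print(model.config.id2label[predicted_idx])
```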