Computer vision models implemented in PyTorch, trained and tested on the CIFAR-10 dataset. To be updated...
- Auto-Encoder
- Variational Auto-Encoder
- ResNet
- Vision Transformer (ViT)
- MAE (TODO)
- Swin Transformer (TODO)
- Diffusion Model (TODO)

| Model | Test Accuracy | Checkpoints | Training Log |
|---|---|---|---|
| ResNet-18 | 95.% | Checkpoints | Log |
| vit_base_patch16_224 | 98.87% | Checkpoints | Log |
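
Tricks used when training ResNet-18 on CIFAR-10:
- Cutout data augmentation
- Learning rate scheduler
- Replace the 7x7 stem convolution with a 3x3 convolution
- Remove the MaxPool layer
- Kaiming normal weight initialization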

The ViT model uses the pre-trained `vit_base_patch16_224` checkpoint from timm.