Vision Transformer

Results

Task            Accuracy
FashionMNIST    80.51%

Hyperparameters (ViT Base, MNIST):

Parameter       Value
input_size      [1, 28, 28]
patch_size      7
embed_dim       8
n_enc_layer     2
n_head          2
head_dim        8
mlp_hid_dim     64
drop            0.1
n_classes       10
n_epochs        16
batch_size      512
seed            45
Total time      1 min 22 s
GPU             Quadro RTX 5000 (16 GB)
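
The shapes implied by these hyperparameters can be checked with a small sketch. It assumes that input_size is ordered [channels, height, width] and that the image is split into non-overlapping square patches. The ViTConfig class and its derived properties are hypothetical names used for illustration, not code from this repository.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ViTConfig:
    """Hypothetical container for the model hyperparameters listed above."""
    input_size: Tuple[int, int, int] = (1, 28, 28)  # assumed [C, H, W]
    patch_size: int = 7
    embed_dim: int = 8
    n_enc_layer: int = 2
    n_head: int = 2
    head_dim: int = 8
    mlp_hid_dim: int = 64
    drop: float = 0.1
    n_classes: int = 10

    @property
    def n_patches(self) -> int:
        """Number of non-overlapping patches per image."""
        _, height, width = self.input_size
        if height % self.patch_size or width % self.patch_size:
            raise ValueError("patch_size must divide both height and width")
        return (height // self.patch_size) * (width // self.patch_size)

    @property
    def patch_dim(self) -> int:
        """Length of one flattened patch before the linear embedding."""
        channels, _, _ = self.input_size
        return channels * self.patch_size * self.patch_size

    @property
    def attn_inner_dim(self) -> int:
        """Combined width of all attention heads (n_head * head_dim)."""
        return self.n_head * self.head_dim


cfg = ViTConfig()
print("patches per image:", cfg.n_patches)
print("flattened patch length:", cfg.patch_dim)
print("attention inner width:", cfg.attn_inner_dim)
```

With the values above, a 28x28 image yields 16 patches of 49 pixels each, and each patch is projected to an 8-dimensional embedding. The two heads of width 8 give a combined attention width of 16, which is larger than embed_dim, so under these assumptions the attention output would need a projection back to 8 dimensions.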

References:

[1] LearnOpenCV [Link]

[2] Vision Transformer from Scratch [Link]