/numpy_transformer

A transformer implemented in numpy: a VIT vision transformer that reaches 97.2% precision on the MNIST test set. It includes multi-head attention, patch embedding, position embedding, fully connected layers, convolution, etc. It supports normal training, saving the model, and restoring the model.

Primary language: Python. License: MIT.

Transformer in numpy

VIT vision transformer

I wrote a VIT network entirely in numpy, including the forward pass and backpropagation.
It includes these layers: multi-head attention, PatchEmbed, Position_add, convolution, Fullconnect, flatten, Relu, layer_norm, Cross Entropy loss, and MSE loss.
Training runs on the CPU and is slow, so I use different settings.
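
To give a feel for what the multi-head attention layer computes, here is a minimal sketch of its forward pass in numpy. It is not the code from this repository: the function name `multi_head_attention`, the fused `w_qkv` weight, and the random inputs are assumptions made for illustration, and the backward pass is left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the maximum first so exp() cannot overflow.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_qkv, w_out, num_heads):
    """x: (batch, tokens, dim); w_qkv: (dim, 3*dim); w_out: (dim, dim)."""
    batch, tokens, dim = x.shape
    head_dim = dim // num_heads
    # Project once, then split into queries, keys and values per head.
    qkv = (x @ w_qkv).reshape(batch, tokens, 3, num_heads, head_dim)
    q, k, v = np.transpose(qkv, (2, 0, 3, 1, 4))   # each (batch, heads, tokens, head_dim)
    # Scaled dot-product attention inside every head.
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)             # (batch, heads, tokens, tokens)
    context = weights @ v                          # (batch, heads, tokens, head_dim)
    # Merge the heads back into one vector per token.
    context = np.transpose(context, (0, 2, 1, 3)).reshape(batch, tokens, dim)
    return context @ w_out, weights

# Hypothetical sizes: 49 tokens of width 96, split over 4 heads.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 49, 96))
w_qkv = rng.standard_normal((96, 3 * 96)) * 0.02
w_out = rng.standard_normal((96, 96)) * 0.02
out, weights = multi_head_attention(x, w_qkv, w_out, num_heads=4)
print(out.shape, weights.shape)
```

Each row of `weights` sums to 1, so every output token is a weighted average of the value vectors within its head.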

Trained on the MNIST dataset, its precision reaches 97.2% with the following settings:

    epoch = 36
    batchsize = 100
    lr = 0.001
    embed_dim = 96
    images_shape = (batchsize, 1, 30-2, 30-2)
    n_patch = 7
    patchnorm = True
    # [0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0], [1, 1, 1]
    fixed     = 1 #False
    cls_token = 0 #True
    num_h = [2*2] * 6 #[3, 6, 12, 3, 6, 12]
    patch_convolu = 0 #False
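
The shapes these settings imply can be checked with a short sketch. It assumes that `n_patch` is the number of patches per image side (so a 28x28 image becomes 7x7 = 49 patches of 4x4 pixels) and that `num_h` lists the head count of each of the 6 attention blocks. Both readings are guesses from the names, and `to_patches` and `w_embed` are hypothetical, not taken from the repository.

```python
import numpy as np

batchsize, embed_dim, n_patch = 100, 96, 7
num_h = [2 * 2] * 6

def to_patches(images, n_patch):
    """Split (batch, channels, H, W) images into flattened square patches."""
    b, c, h, w = images.shape
    ph, pw = h // n_patch, w // n_patch
    x = images.reshape(b, c, n_patch, ph, n_patch, pw)
    x = x.transpose(0, 2, 4, 1, 3, 5)        # (b, rows, cols, c, ph, pw)
    return x.reshape(b, n_patch * n_patch, c * ph * pw)

rng = np.random.default_rng(0)
images = rng.standard_normal((batchsize, 1, 30 - 2, 30 - 2))
patches = to_patches(images, n_patch)         # (100, 49, 16)

# A plain linear patch embedding: 16 pixel values -> embed_dim features.
w_embed = rng.standard_normal((patches.shape[-1], embed_dim)) * 0.02
tokens = patches @ w_embed                    # (100, 49, 96)

head_dim = embed_dim // num_h[0]              # 96 / 4 = 24 features per head
print(patches.shape, tokens.shape, head_dim)
```

Under these assumptions each of the 49 tokens carries 96 features, and every attention head works on a 24-dimensional slice.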

This code provides functions to save a model and to restore it to continue training.
You can find the saved models in the model dir.
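
Saving and restoring can be as simple as writing the parameter arrays to one file. The sketch below uses `np.savez` and `np.load`. The names `save_model` and `restore_model`, the parameter keys, and the `.npz` format are assumptions for illustration; the repository may store its models differently.

```python
import os
import tempfile
import numpy as np

def save_model(path, params, epoch):
    # np.savez stores every keyword argument as a named array in one .npz file.
    np.savez(path, epoch=np.array(epoch), **params)

def restore_model(path):
    # Read the arrays back and separate the epoch counter from the weights.
    with np.load(path) as data:
        params = {name: data[name] for name in data.files if name != "epoch"}
        epoch = int(data["epoch"])
    return params, epoch

# Hypothetical parameters of a single patch-embedding layer.
params = {"embed_w": np.ones((16, 96)), "embed_b": np.zeros(96)}
path = os.path.join(tempfile.mkdtemp(), "checkpoint.npz")

save_model(path, params, epoch=12)
restored, start_epoch = restore_model(path)
print(start_epoch, sorted(restored))
```

Storing the epoch next to the weights lets a restored run continue from where the previous one stopped.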

Train with command

python transformer_of_image.py

Predict with command

python predict.py

precision

Trained on a MacBook Pro 2020 (Intel).

classes precision
0 0.9908163265306122
1 0.9903083700440528
2 0.9748062015503876
3 0.9831683168316832
4 0.9674134419551935
5 0.9708520179372198
6 0.9739039665970772
7 0.9630350194552529
8 0.9517453798767967
9 0.9544103072348861
all precision 0.972
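
A table like this can be produced from the true labels and the predicted labels. The sketch below assumes each per-class value is the fraction of test images of that class that were classified correctly. That reading is a guess, although the class 0 value is consistent with 971 correct out of 980 images. The function name and the toy data are hypothetical.

```python
import numpy as np

def per_class_scores(labels, predictions, num_classes=10):
    """Fraction of correctly classified samples per true class, plus overall."""
    labels = np.asarray(labels)
    predictions = np.asarray(predictions)
    correct = labels == predictions
    scores = {}
    for c in range(num_classes):
        mask = labels == c
        # A class without samples has no defined score.
        scores[c] = float(correct[mask].mean()) if mask.any() else float("nan")
    return scores, float(correct.mean())

# Toy data with 3 classes, only to show the call.
labels = np.array([0, 0, 1, 1, 1, 2])
predictions = np.array([0, 1, 1, 1, 1, 2])
per_class, overall = per_class_scores(labels, predictions, num_classes=3)
for c, score in per_class.items():
    print(c, score)
print("all precision", overall)
```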

gpt in numpy

I'm training this model now.

blogs

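Implementing a VIT vision transformer on MNIST in numpy - https://zhuanlan.zhihu.com/p/645326689

In total, these layers are implemented:

Implementing the patches of the image input of a vision transformer in numpy - https://zhuanlan.zhihu.com/p/645318207

Implementing the position embedding of a vision transformer in numpy - https://zhuanlan.zhihu.com/p/645320199

Implementing forward and backward propagation of the multi-attention layer in numpy - https://zhuanlan.zhihu.com/p/645311459

Forward and backward propagation of the fully connected layer - https://zhuanlan.zhihu.com/p/642043155

Forward and backward propagation of the loss functions - https://zhuanlan.zhihu.com/p/642025009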
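
As a small companion to the loss-function post, here is a sketch of the forward and backward pass of a softmax cross-entropy loss in numpy, with a finite-difference check of one gradient entry. The function names are hypothetical and the code is not copied from the blog posts or the repository.

```python
import numpy as np

def cross_entropy_forward(logits, labels):
    # Log-softmax, shifted by the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Mean negative log-likelihood of the true classes.
    loss = -log_probs[np.arange(len(labels)), labels].mean()
    return loss, np.exp(log_probs)

def cross_entropy_backward(probs, labels):
    # Gradient of the mean loss w.r.t. the logits: (softmax - one_hot) / batch.
    grad = probs.copy()
    grad[np.arange(len(labels)), labels] -= 1.0
    return grad / len(labels)

rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 10))
labels = np.array([3, 1, 4, 1])
loss, probs = cross_entropy_forward(logits, labels)
grad = cross_entropy_backward(probs, labels)

# Compare one analytic gradient entry with a finite-difference estimate.
eps = 1e-6
bumped = logits.copy()
bumped[0, 3] += eps
numeric = (cross_entropy_forward(bumped, labels)[0] - loss) / eps
print(abs(numeric - grad[0, 3]) < 1e-4)
```

The same kind of numerical check is a handy way to test a hand-written backward pass for any other layer.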

Reference

https://github.com/google-research/vision_transformer/blob/main/vit_jax/models_vit.py
https://github.com/UdbhavPrasad072300/Transformer-Implementations/blob/main/notebooks/MNIST%20Classification%20-%20ViT.ipynb
https://github.com/s-chh/PyTorch-Vision-Transformer-ViT-MNIST/tree/main
https://itp.uni-frankfurt.de/~gros/StudentProjects/WS22_23_VisualTransformer/
https://jamesmccaffrey.wordpress.com/2023/01/10/a-naive-transformer-architecture-for-mnist-classification-using-pytorch/
https://medium.com/mlearning-ai/vision-transformers-from-scratch-pytorch-a-step-by-step-guide-96c3313c2e0c
https://github.com/BrianPulfer/PapersReimplementations/blob/main/vit/vit_torch.py
https://github.com/microsoft/Swin-Transformer
https://huggingface.co/docs/transformers/v4.27.0/model_doc/vit