ViT_PyTorch

This is a simple PyTorch implementation of the Vision Transformer (ViT) described in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale".
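The "16x16 words" in the paper title refers to splitting an image into fixed-size patches that the transformer treats as tokens. As a rough illustration of that arithmetic only, and not code taken from this repository, the sketch below computes the token sequence length and the flattened patch size. The function names `patch_sequence_length` and `patch_vector_dim` are hypothetical, and the defaults (16-pixel patches, 3 channels, one extra class token) are assumptions based on the paper's standard setup.

```python
def patch_sequence_length(image_size: int, patch_size: int = 16,
                          add_class_token: bool = True) -> int:
    """Number of tokens for a square image split into square patches.

    Hypothetical helper for illustration; assumes non-overlapping patches.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    # One extra learnable class token is prepended in the paper's setup.
    return num_patches + (1 if add_class_token else 0)


def patch_vector_dim(patch_size: int = 16, channels: int = 3) -> int:
    """Length of one flattened patch before the linear projection."""
    return patch_size * patch_size * channels


if __name__ == "__main__":
    print(patch_sequence_length(224))  # 14 * 14 patches + 1 class token = 197
    print(patch_vector_dim())          # 16 * 16 * 3 = 768
```

Under these assumptions, a 224x224 RGB image becomes a sequence of 197 tokens, each derived from a 768-value patch.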

Primary Language: Python
