MobileViT is a lightweight, general-purpose vision transformer for mobile vision tasks. It combines the strengths of standard CNNs and ViTs, and it outperforms several CNN-based and ViT-based networks across different tasks and datasets.
The block diagram of MobileViT along with the MobileViT block.