Multi-Modal-Transformer

This repository collects multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also collects useful tutorials and tools in these domains.

Transformer reading list

This repo aims to collect recent popular Transformer papers, code, and learning resources covering Vision Transformer, NLP, multi-modal learning, and related domains.

Topics (papers and code)

Review papers on multi-modal learning

Tutorials and workshops

Datasets

Blogs

Tools