Pinned Repositories
MaPeT
Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training
grid-feats-vqa
Grid features pre-training code for visual question answering
bottom-up-attention.pytorch
A PyTorch reimplementation of bottom-up-attention models
mcan-vqa
Deep Modular Co-Attention Networks for Visual Question Answering
RUArt
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering
DLrook's Repositories
DLrook/grid-feats-vqa
Grid features pre-training code for visual question answering