PyTorch code for Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles (DANCE)

Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles

Shuquan Ye², Yujia Xie¹, Dongdong Chen¹, Yichong Xu¹, Lu Yuan¹, Chenguang Zhu¹, Jing Liao²

¹Microsoft, ²City University of Hong Kong

This repository contains the PyTorch code for the DANCE [paper]. The code is built on PyTorch 1.11. Pre-training with our code requires 4 nodes, each with 8 A100 GPUs.
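Pre-training on 4 nodes with 8 GPUs each means 4 × 8 = 32 processes in total. The sketch below shows how such a job is commonly laid out for PyTorch distributed data-parallel training: one process per GPU, with a global rank of `node_rank * 8 + local_rank`. It assumes a homogeneous layout like the one a launcher such as `torchrun --nnodes=4 --nproc_per_node=8` produces. It is not the repository's launch script, and the helper names (`world_size`, `global_rank`, `rank_table`) are hypothetical.

```python
from typing import List, Tuple

# Hardware layout stated in this README: 4 nodes, each with 8 A100 GPUs.
NUM_NODES = 4
GPUS_PER_NODE = 8


def world_size(num_nodes: int = NUM_NODES, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Total number of training processes, assuming one process per GPU."""
    return num_nodes * gpus_per_node


def global_rank(node_rank: int, local_rank: int, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Map (node index, GPU index on that node) to a global process rank.

    Assumes every node runs the same number of processes (homogeneous layout).
    """
    if not 0 <= local_rank < gpus_per_node:
        raise ValueError(f"local_rank must be in [0, {gpus_per_node})")
    return node_rank * gpus_per_node + local_rank


def rank_table(num_nodes: int = NUM_NODES,
               gpus_per_node: int = GPUS_PER_NODE) -> List[Tuple[int, int, int]]:
    """Return (node_rank, local_rank, global_rank) for every process in the job."""
    return [
        (node, gpu, global_rank(node, gpu, gpus_per_node))
        for node in range(num_nodes)
        for gpu in range(gpus_per_node)
    ]


if __name__ == "__main__":
    print(world_size())                            # 32
    print(global_rank(node_rank=3, local_rank=7))  # 31 (last GPU on the last node)
```

In a real job, `torchrun` exports these values as the `RANK`, `LOCAL_RANK` and `WORLD_SIZE` environment variables, so training code normally reads them instead of computing them.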

Catalog:

  • Code for DANCE-augmented Pre-training

  • Code for DANCE-augmented Fine-tuning

  • Code for Image-Text Retrieval and OK-VQA

  • Download of Pre-trained and Fine-tuned Checkpoints

BibTeX