Region_Learner

The PyTorch implementation of "Video-Text Pre-training with Learned Regions" (arXiv:2112.01194).

We are still cleaning up the code and preparing the pre-trained weights for release.

Preparation

Overall, this code is built on PyTorch with DistributedDataParallel (DDP).

Note: Not all videos are still available, so you will need to modify the metadata to match the videos you were able to download. We also provide our metadata here.
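One possible way to trim the metadata to the videos that actually exist on disk is sketched below. It is only an illustration and not part of this repository: the function name filter_metadata, the CSV layout with a videoid column, and the flat <video_dir>/<videoid>.mp4 file naming are all assumptions, so adjust them to match your own metadata and directory structure.

```python
import csv
from pathlib import Path


def filter_metadata(metadata_csv, video_dir, output_csv,
                    id_column="videoid", ext=".mp4"):
    """Copy metadata rows whose video file exists under video_dir.

    Hypothetical helper: assumes a CSV with a header row, one row per
    video, and videos stored as <video_dir>/<id><ext>.
    Returns a (kept, dropped) tuple of row counts.
    """
    video_dir = Path(video_dir)
    kept = dropped = 0
    with open(metadata_csv, newline="", encoding="utf-8") as src, \
         open(output_csv, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        # Reuse the original header so the filtered file has the same columns.
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            video_path = video_dir / f"{row[id_column]}{ext}"
            if video_path.is_file():
                writer.writerow(row)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

Calling filter_metadata("train.csv", "videos", "train_available.csv") (hypothetical file names) writes a filtered copy and leaves the original metadata untouched, so the filtering can be repeated after downloading more videos.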

Pre-training

  • Run sh pre-training.sh (commands for the different settings are listed in this script).

Finetuning (on MSR-VTT)

Pre-trained Weights

WebVid2M + CC3M

Acknowledgements

This code is based on Frozen in Time.

Citation

@article{yan2021video,
  title={Video-Text Pre-training with Learned Regions},
  author={Yan, Rui and Shou, Mike Zheng and Ge, Yixiao and Wang, Alex Jinpeng and Lin, Xudong and Cai, Guanyu and Tang, Jinhui},
  journal={arXiv preprint arXiv:2112.01194},
  year={2021}
}