Official code implementation of the paper: Video and Text Matching with Conditioned Embeddings
https://arxiv.org/abs/2110.11298
We employ the following datasets in our work:
- ActivityNet Captions: the pre-extracted features can be downloaded by clicking here.
- DiDeMo: the pre-extracted features can be downloaded by clicking here.
- VATEX: click here.
- MSR-VTT: can be downloaded by clicking here.
- YouCook2: the pre-extracted features can be downloaded here.
- LSMDC: click here.
Example training command on ActivityNet:
python train.py anet_precomp --feat_name i3d --img_dim 2048 --norm
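
The command above can also be assembled from a script, which may be convenient when launching several runs. The following is a minimal sketch, not part of the repository: the helper name `build_train_command` and its keyword arguments are hypothetical, and it only reuses the positional argument and flags shown in the example command. It assumes `--norm` is a boolean switch that takes no value.

```python
import shlex


def build_train_command(dataset="anet_precomp", feat_name="i3d",
                        img_dim=2048, norm=True, python="python"):
    """Return the argument list for a train.py run (hypothetical helper).

    Only the arguments that appear in the README example are used;
    no other train.py options are assumed to exist.
    """
    cmd = [python, "train.py", dataset,
           "--feat_name", feat_name,
           "--img_dim", str(img_dim)]
    if norm:
        # Assumption: --norm is a flag without a value, as in the example.
        cmd.append("--norm")
    return cmd


if __name__ == "__main__":
    cmd = build_train_command()
    # Print the shell-quoted command for inspection before running it.
    print(shlex.join(cmd))
    # To launch from the repository root, one could call:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting issues; `shlex.join` (Python 3.8+) is used here only to display it.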