Implementation of the paper "Temporal-enhanced Cross-modality Fusion Network for Video Sentence Grounding" (ICME 2023 Oral).
We use the video features provided by VSLNet, together with the GloVe word embeddings (glove.840B.300d.txt).
The /data/features directory should be organized as follows:
data
├── activitynet
├── charades
├── tacos
├── glove.840B.300d.txt
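Before training, it can help to confirm that the data root matches this layout. The snippet below is a minimal sketch and is not part of the repository. It assumes that the three dataset entries are directories and that the GloVe entry is a single file, exactly as in the tree above. The helper name `missing_entries` is hypothetical.

```python
from pathlib import Path

# Entries expected under the data root, taken from the directory tree above.
EXPECTED_DIRS = ("activitynet", "charades", "tacos")
EXPECTED_FILES = ("glove.840B.300d.txt",)


def missing_entries(root):
    """Return the names of expected entries that are absent under `root`.

    Hypothetical helper: it only checks that each entry exists with the
    expected type (directory vs. file). It does not inspect feature contents.
    """
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    import sys

    # Default root is an assumption; pass the real path as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "data"
    absent = missing_entries(root)
    if absent:
        print("missing:", ", ".join(absent))
        sys.exit(1)
    print("data layout looks complete")
```

For example, `python check_layout.py /data/features` exits with status 1 and lists whatever is missing. The script name is also only illustrative.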
# [dataset]: one of tacos, charades, activitynet
# [mode]: train or test
python main.py --task [dataset] --mode [mode]
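To run several configurations in sequence, you can generate the commands from the documented values. The snippet below is a small sketch under that assumption. It validates only the `--task` and `--mode` values listed in this README. It does not know about any other options `main.py` may accept. The helper `build_command` is hypothetical.

```python
import shlex

# Values accepted for --task and --mode, as listed in the usage comments above.
DATASETS = ("tacos", "charades", "activitynet")
MODES = ("train", "test")


def build_command(dataset, mode):
    """Return the main.py argument list for one dataset/mode pair.

    Hypothetical helper: it rejects values not documented in this README.
    """
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["python", "main.py", "--task", dataset, "--mode", mode]


if __name__ == "__main__":
    # Print the training command for every dataset. To actually launch one,
    # pass the argument list to subprocess.run(...).
    for name in DATASETS:
        print(shlex.join(build_command(name, "train")))
```

Returning an argument list instead of a single string means the result can go straight to `subprocess.run` without shell quoting. `shlex.join` is used only to print a readable version (Python 3.8+).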