A PyTorch implementation of several state-of-the-art models for temporal language grounding in untrimmed videos.
Requirements:
- Python 2.7
- PyTorch 0.4.1
- matplotlib

The code targets the Charades-STA dataset.
Implemented models:
- TALL: Temporal Activity Localization via Language Query
- MAC: Mining Activity Concepts for Language-based Temporal Localization
- A2C: Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos
Results on Charades-STA:

Method | R@1, IoU=0.7 | R@1, IoU=0.5 | R@5, IoU=0.7 | R@5, IoU=0.5
---|---|---|---|---
TALL | 8.63 | 24.09 | 29.33 | 59.60
MAC | 12.31 | 29.68 | 37.31 | 64.14
A2C | 13.98 | 32.18 | N/A | N/A
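Here "R@k, IoU=m" is the percentage of test queries for which at least one of the top-k predicted segments overlaps the ground-truth segment with a temporal IoU of at least m (A2C produces a single prediction, so R@5 does not apply). A minimal sketch of this metric, not taken from the repo's own evaluation code:

```python
from __future__ import division  # the repo targets Python 2.7

def temporal_iou(pred, gt):
    """IoU between two (start, end) segments given in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_k(ranked_preds, gts, k, iou_thresh):
    """ranked_preds: per-query lists of (start, end) segments, best first."""
    hits = sum(
        1 for preds, gt in zip(ranked_preds, gts)
        if any(temporal_iou(p, gt) >= iou_thresh for p in preds[:k])
    )
    return 100.0 * hits / len(gts)  # reported as a percentage
```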
Required data:
- visual features
- visual activity concepts (for MAC)
- ref_info
- RL_pickle (for A2C)
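The file formats are not documented here; assuming the RL_pickle files are standard Python pickles, a minimal loading sketch (the path and filename below are hypothetical, point them at the files you actually downloaded):

```python
import pickle

# Hypothetical path/filename; adjust to your local copy of the data.
with open("./RL_pickle/charades_rl_data.pkl", "rb") as f:
    data = pickle.load(f)

print(type(data))  # inspect the structure before wiring it into training
```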
To train and test TALL, run:

```bash
python main_charades_SL.py --model TALL
```

To train and test MAC, run:

```bash
python main_charades_SL.py --model MAC
```

To train and test A2C, run:

```bash
python main_charades_RL.py
```
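The two supervised models share one entry point and are selected with the `--model` flag. An illustrative sketch of how such dispatch could look (the actual argument parsing in `main_charades_SL.py` may differ):

```python
import argparse

# Illustrative sketch only; the real script may define more options.
parser = argparse.ArgumentParser(
    description="Supervised temporal grounding on Charades-STA")
parser.add_argument("--model", choices=["TALL", "MAC"], default="TALL",
                    help="which supervised model to train and evaluate")
args = parser.parse_args()

if args.model == "TALL":
    pass  # build the TALL (cross-modal localization) model here
else:
    pass  # build the MAC (activity-concept) model here
```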