
PyTorch implementation of the ECCV'22 paper: Video Activity Localisation with Uncertainties in Temporal Boundary


Elastic Moment Bounding

Accepted by the 17th European Conference on Computer Vision (ECCV 2022)

PyTorch implementation of Video Activity Localisation with Uncertainties in Temporal Boundary

Prerequisites

  1. Clone this repo: git clone https://github.com/Raymond-sci/EMB.git
  2. Download the pre-trained video features from here and word embeddings from here, then put them in data/features.
  3. Set up the experimental environment from environment.yml (for example, with conda env create -f environment.yml if you use conda).
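
Before training or testing, it can help to confirm that the files from the steps above are where the code expects them. The following is a minimal sketch, not part of this repo: the helper name missing_prerequisites is hypothetical, and it only checks the paths named in this README (environment.yml and data/features), not the contents of the downloaded files.

```python
from pathlib import Path


def missing_prerequisites(repo_root):
    """Return a list of problems found under repo_root (hypothetical helper)."""
    root = Path(repo_root)
    problems = []

    # environment.yml is the file named in step 3.
    if not (root / "environment.yml").is_file():
        problems.append("environment.yml not found")

    # data/features should hold the downloaded video features and word embeddings.
    features = root / "data" / "features"
    if not features.is_dir():
        problems.append("data/features does not exist")
    elif not any(features.iterdir()):
        problems.append("data/features is empty")

    return problems
```

Calling missing_prerequisites(".") from the repo root returns an empty list when both checks pass.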

Usage

  1. Download our trained models from here and put them in sessions/; create the sessions/ folder if it does not exist.
  2. Run python main.py --task charades --mode test --model_name 20220715-202236 to evaluate a model. By default, the determined boundaries predicted by the model are tested; add --elastic to test the elastic boundaries.
  3. Run python main.py --task charades --mode train to train a model. Set --task to one of {charades, tacos, activitynet} to choose the benchmark.
    • By default, no intermediate files (logs, checkpoints, etc.) are stored on disk during training; use --deploy if they are needed.
    • Use the option --model_dir to specify where the files generated by experiment sessions should be stored. The default path is sessions/.
    • Use the option --model_name to specify the session name. The timestamp will be used as the default session name.
    • More options can be found in main.py.
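
When running several sessions, the options above can be assembled programmatically. The sketch below is an illustration only: build_command is a hypothetical helper, not part of this repo, and it assumes that --elastic and --deploy are plain on/off switches that take no value. It uses only the flags listed in this README; check main.py for their exact behaviour.

```python
# Benchmarks named in this README.
TASKS = {"charades", "tacos", "activitynet"}


def build_command(task, mode, model_name=None, model_dir=None,
                  elastic=False, deploy=False):
    """Build the argument list for main.py (hypothetical helper)."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")

    cmd = ["python", "main.py", "--task", task, "--mode", mode]

    # Optional session name and output directory.
    if model_name is not None:
        cmd += ["--model_name", model_name]
    if model_dir is not None:
        cmd += ["--model_dir", model_dir]

    # Assumed to be value-less switches.
    if elastic:
        cmd.append("--elastic")
    if deploy:
        cmd.append("--deploy")

    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repo root, for example build_command("charades", "test", model_name="20220715-202236", elastic=True).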

License

This project is licensed under the MIT License. See LICENSE for more information.

Citation

Please cite our paper if you find this project helpful:

@InProceedings{huang2022emb,
  title     = {Video Activity Localisation with Uncertainties in Temporal Boundary},
  author    = {Jiabo Huang and Hailin Jin and Shaogang Gong and Yang Liu},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
}

Acknowledgements

This implementation is heavily based on the excellent work VSLNet, released with the paper Span-based Localizing Network for Natural Language Video Localization.