/MML

Multi-faceted Video Moment Localizer

Primary LanguagePython

Multi-Faceted Moment Localizer

A PyTorch implementation of Multi-Faceted Moment Localizer, an improved version of MAC with additional features for better moment localization. Newly introduced features include:

Architecture Diagram

Training the model

  1. Download "Object Segmentation Features" and "Video Understanding Features" (Video caption features) from the following link and extract to ./data directory.
  2. Download c3d visual features, c3d visual activity concepts, ref_info provided by authors of MAC and extract to ./data directory.
  3. Now, the data directory should have following sub directories: all_fc6_unit16_overlap0.5, clip_object_features_test, clip_object_features_train, ref_info, test_softmax, train_softmax, video_understanding_features_test, video_understanding_features_train.
  4. Create a Python 2 Conda environmnt with pytorch 0.4.1 and torchvision. Additionally, install following dependencies using pip.
    • pip install pytorch_pretrained_bert
    • pip install numpy pickle
  5. Start training with python train.py

Testing Pretrained Model

  1. Follow steps 1-4 of "Training the model" section.
  2. Download the pre-trained model and place it in ./checkpoints directory:
    • Pre-trained model with BERT Sentence Embeddings, Glove VO Embedding, Object Segmentation Features and Video Captioning Features: Link.
  3. Final Output:
     [dr-obj 0.50][dr-act 0.00][sr 0.005][csr: 0.005] best_R1_IOU5: 0.319
     [dr-obj 0.50][dr-act 0.00][sr 0.005][csr: 0.005] best_R5_IOU5: 0.652

Contributors

Notes

This is not the code we used to identify the hyper-parameters used in the model. This a simplified version of the code released to the public.

Code related to feature extraction of Object Segmentation and Video Captioning will be released in the future.

Acknowledgements

Code improved from PyTorch implementation of MAC available here.