This codebase provides a comprehensive video understanding solution for video classification and temporal detection.
Key features:
- Video classification: state-of-the-art video models, self-supervised representation learning approaches for pre-training, and a supervised classification pipeline for fine-tuning.
- Video temporal detection: strong features ready for both feature-level classification and localization, as well as a standard temporal action detection pipeline built on top of those features.
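To give a feel for what "feature-level localization" means, here is a minimal sketch of one common post-processing step in feature-based temporal action detection: thresholding per-snippet actionness scores and grouping consecutive above-threshold snippets into candidate segments. This is a generic illustration, not the repo's actual pipeline; the function name and threshold are made up.

```python
import numpy as np

def group_action_segments(actionness, thresh=0.5):
    """Turn per-snippet actionness scores into (start, end) snippet-index
    segments by thresholding and grouping consecutive above-threshold
    snippets. Hypothetical helper for illustration only."""
    segments = []
    start = None
    for i, score in enumerate(actionness):
        if score >= thresh and start is None:
            start = i                      # segment opens here
        elif score < thresh and start is not None:
            segments.append((start, i))    # segment closes before snippet i
            start = None
    if start is not None:                  # segment runs to the end
        segments.append((start, len(actionness)))
    return segments

# Example: scores from 8 snippets of a video.
scores = np.array([0.1, 0.7, 0.9, 0.6, 0.2, 0.1, 0.8, 0.8])
segments = group_action_segments(scores)   # → [(1, 4), (6, 8)]
```

In practice the scores would come from a classifier run on the pre-extracted features, and the segments would then be refined and scored by the detection model.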
The approaches implemented in this repo include but are not limited to the following papers:
- Self-supervised Motion Learning from Static Images [Project] [Paper] CVPR 2021
- A Stronger Baseline for Ego-Centric Action Detection [Project] [Paper] First-place submission to the EPIC-KITCHENS-100 Action Detection Challenge
- Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition [Project] [Paper] Second-place submission to the EPIC-KITCHENS-100 Action Recognition Challenge
- TAda! Temporally-Adaptive Convolutions for Video Understanding [Project] [Paper] Preprint
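The core idea behind TAda! — a convolution kernel shared across frames but modulated by a per-frame calibration factor — can be sketched in a toy 1-D form as follows. This is a simplified NumPy illustration under my own assumptions, not the repo's implementation; in the paper the calibration is predicted from each frame's context, whereas here it is simply given as input.

```python
import numpy as np

def tada_conv1d_sketch(x, base_kernel, calib):
    """Toy 1-D temporally-adaptive convolution (illustrative only).

    x           : (T, L) -- T frames, each a length-L signal
    base_kernel : (K,)   -- kernel shared across all frames
    calib       : (T,)   -- per-frame calibration factor

    Each frame t is convolved with its own kernel calib[t] * base_kernel,
    so the effective kernel adapts over time while the base weights are shared.
    """
    T, L = x.shape
    out = np.empty((T, L - base_kernel.size + 1))
    for t in range(T):
        kernel_t = calib[t] * base_kernel          # frame-specific kernel
        out[t] = np.convolve(x[t], kernel_t, mode="valid")
    return out

# Example: two identical frames, different calibration factors.
x = np.ones((2, 5))
base = np.array([1.0, 1.0, 1.0])
y = tada_conv1d_sketch(x, base, calib=np.array([1.0, 2.0]))
# Frame 0 responds with 3.0 everywhere, frame 1 with 6.0.
```

The actual model applies this idea to 2-D spatial kernels inside video backbones; the sketch only shows why a per-frame multiplicative calibration makes a shared kernel temporally adaptive.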
[2021-10] Code and models are released!
We include our pre-trained models in MODEL_ZOO.md.
We include strong features for HACS and EPIC-KITCHENS-100 in FEATURE_ZOO.md.
The general workflow for using this repo is installation, data preparation, and running; see GUIDELINES.md.
This codebase is written and maintained by Ziyuan Huang, Zhiwu Qing and Xiang Wang.
If you find our codebase useful, please consider citing the respective work :).
- ParamCrop: Parametric Cubic Cropping for Video Contrastive Learning