video-understanding

There are 211 repositories under the video-understanding topic.

open-mmlab/mmaction2
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
Language: Python 4.4k 42 1.4k 1.3k
jinwchoi/awesome-action-recognition
A curated list of action recognition and related area resources
3.8k 207 9724
showlab/Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
3.7k 141 28210
OpenGVLab/Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
Language: Python 3.1k 37 238254
mit-han-lab/temporal-shift-module
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
Language: Python 2.1k 42 220416
open-mmlab/mmaction
An open-source toolbox for action understanding based on PyTorch
Language: Python 1.9k 40 199350
PaddlePaddle/PaddleVideo
Awesome video understanding toolkits based on PaddlePaddle. It supports video data annotation tools, lightweight RGB and skeleton-based action recognition models, and practical applications for video tagging and sport action detection.
Language: Python 1.6k 38 327386
yjxiong/temporal-segment-networks
Code & Models for Temporal Segment Networks (TSN) in ECCV 2016
Language: Python 1.5k 42 283475
OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Language: Python 1.5k 27 20192
MCG-NJU/VideoMAE
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Language: Python 1.4k 16 125137
yjxiong/tsn-pytorch
Temporal Segment Networks (TSN) in PyTorch
Language: Python 1.1k 27 133310
TheShadow29/awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
1k 29 598
PKU-YuanGroup/Chat-UniVi
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Language: Python 903 7 6644
yjxiong/action-detection
temporal action detection with SSN
Language: Python 643 30 116177
Vision-CAIR/MiniGPT4-video
Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding
Language: Python 574 12 4261
OpenGVLab/VideoMAEv2
[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Language: Python 552 6 6064
henghuiding/MeViS
[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
Language: Python 510 8 1422
yoosan/video-understanding-dataset
A collection of recent video understanding datasets, under construction!
460 23 679
chihyaoma/Activity-Recognition-with-CNN-and-RNN
Temporal Segments LSTM and Temporal-Inception for Activity Recognition
Language: Lua 440 26 29147
MCG-NJU/TDN
[CVPR 2021] TDN: Temporal Difference Networks for Efficient Action Recognition
Language: Python 374 10 7155
v-iashin/SpecVQGAN
Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)
Language: Jupyter Notebook 354 8 3539
movienet/movienet-tools
Tools for movie and video research
Language: C++ 282 11 4032
boheumd/MA-LMM
(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Language: Python 259 4 4128
JunweiLiang/Multiverse
Dataset, code and model for the CVPR'20 paper "The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction". And for the ECCV'20 SimAug paper.
Language: Python 253 10 4562
SoccerNet/sn-gamestate
[CVPRW'24] SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap (CVPR24 - CVSports workshop)
Language: Python 253 17 1350
whwu95/Cap4Video
【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
Language: Python 249 9 3320
NVlabs/STEP
STEP: Spatio-Temporal Progressive Learning for Video Action Detection. CVPR'19 (Oral)
Language: Python 248 18 2348
rlleshi/phar
deep learning sex position classifier
Language: Python 243 8 1728
hustvl/TeViT
Temporally Efficient Vision Transformer for Video Instance Segmentation, CVPR 2022, Oral
Language: Python 239 8 1217
alibaba-mmai-research/TAdaConv
[ICLR 2022] TAda! Temporally-Adaptive Convolutions for Video Understanding. This codebase provides solutions for video classification, video representation learning and temporal detection.
Language: Python 229 7 2432
rohitgirdhar/ActionVLAD
ActionVLAD for video action classification (CVPR 2017)
Language: Python 216 11 3861
whwu95/Text4Vis
【AAAI'2023 & IJCV】Transferring Vision-Language Models for Visual Recognition: A Classifier Perspective
Language: Python 205 7 2315
sming256/OpenTAD
OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.
Language: Python 202 5 4014
antoyang/VidChapters
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
Language: Jupyter Notebook 183 3 2221
wangheda/youtube-8m
The 2nd place Solution to the Youtube-8M Video Understanding Challenge by Team Monkeytyping (based on tensorflow)
Language: Python 176 13 558
antoyang/TubeDETR
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers
Language: Python 173 3 229
