A collection of recent video understanding datasets, under construction!

Video-understanding-dataset

Help us complete these lists; feel free to open a pull request.

Video Classification

| Dataset | Paper | Website | Category | #Examples | #Classes | Duration | Organizer | SOTA performance |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UCF101 | PDF | Link | human action | 13,320 | 101 | <10s | UCF | 98% (DeepMind I3D) |
| HMDB51 | PDF | Link | human action | 6,766 | 51 | <10s | SERRE LAB, Brown | - |
| ActivityNet v1.3 | PDF | Link | human activities | ~20,000 | 200 | - | ActivityNet | 8.83% err (iBUG) |
| Charades | PDF | Link | daily human activities | 9,848 | 157 | - | AI2 | - |
| Kinetics | PDF | Link | human action | ~300,000 | 400 | 10s | DeepMind | - |
| Sports-1M | PDF | Link | sports | ~1 million | 478 | 5m36s | Google & Stanford | - |
| YouTube-8M | PDF | Link | visual contents | ~7 million | 4,716 | 120-500s | Google Cloud | 85% GAP (WILLOW) |
| FCVID | PDF | Link | visual contents | 91,223 | 239 | 100s+ | Fudan-Columbia | - |
| Something-Something | PDF | Link | action with objects | 108,499 | 174 | ~4s | TwentyBN | - |
| Moments in Time | PDF | Link | action or activity | ~1 million | 339 | 3s | MIT-IBM Watson | - |

Temporal Action Detection

| Dataset | Paper | Website | #Examples | Organizer | SOTA performance |
| --- | --- | --- | --- | --- | --- |
| THUMOS2014 | PDF | Link | 9,682 | UCF | - |
| ActivityNet v1.3 | PDF | Link | ~20,000 | ActivityNet | 0.344 (SJTU & Columbia) |

Video Captioning

| Dataset | Paper | Website | Context | #Examples | Organizer | SOTA performance |
| --- | --- | --- | --- | --- | --- | --- |
| MPII-MD | PDF | Link | movie | 68,337 clips with 68,375 sentences | MPII | - |
| MSR-VTT | PDF | Link | 20 categories | 10,000 clips with 200,000 sentences | MSR | - |
| Charades | PDF | Link | human activity | 9,848 clips with 27,847 sentences | AI2 | - |
| Densevid | PDF | Link | event | 20k clips and 100k sentences | Stanford, ActivityNet | - |

Video Question Answering

| Dataset | Paper | Website | Task | #Examples | Organizer | SOTA performance |
| --- | --- | --- | --- | --- | --- | --- |
| MovieQA | PDF | Link | question-answering in movies | 408 movies & 14,944 QAs | UToronto | - |
| MarioQA | PDF | Link | reasoning events in game videos | 187,757 examples with 92,874 QAs | POSTECH | - |