fancy-video-classification

Video Classification Paper List

Papers

Recommendations!

arXiv

  • Cao, Haoyuan, Shining Yu, and Jiashi Feng.
    "Compressed Video Action Recognition with Refined Motion Vector." arXiv (2019). [PDF]

CVPR 2020

  • TEA: Yan Li, Bin Ji, et al.
    "TEA: Temporal Excitation and Aggregation for Action Recognition." CVPR (2020). [PDF]
  • TPN: Ceyuan Yang, Yinghao Xu, et al.
    "Temporal Pyramid Network for Action Recognition." CVPR (2020). [PDF] [Code]

CVPR 2019

  • DMC-Net: Shou, Zheng, et al.
    "DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition." CVPR (2019). [PDF] [Code]

ICCV 2019

  • SlowFast: Feichtenhofer, Christoph, et al.
    "SlowFast Networks for Video Recognition." ICCV (2019, oral). [PDF] [Code]
  • TSM: Ji Lin, Chuang Gan, Song Han.
    "TSM: Temporal Shift Module for Efficient Video Understanding." ICCV (2019). [PDF] [Code] (a minimal sketch of the shift operation follows this list)
  • STM: Jiang, Boyuan, et al.
    "STM: SpatioTemporal and Motion Encoding for Action Recognition." ICCV (2019). [PDF]

NeurIPS 2019

  • bLVNet-TAM: Quanfu Fan, Chun-Fu (Richard) Chen, Hilde Kuehne, Marco Pistoia, David Cox.
    "More Is Less: Learning Efficient Video Representations by Big-Little Network and Depthwise Temporal Aggregation." NeurIPS (2019). [PDF] [Code]

CVPR 2018

  • R(2+1)D: Tran, Du, et al.
    "A Closer Look at Spatiotemporal Convolutions for Action Recognition." CVPR (2018). [PDF] [Code] (a sketch of the (2+1)D factorization follows this list)
  • CoViAR: Wu, Chao-Yuan, et al.
    "Compressed Video Action Recognition." CVPR (2018). [PDF] [Code]
  • Non-local: Wang, Xiaolong, et al.
    "Non-local Neural Networks." CVPR (2018). [PDF] [Code]

NeurIPS 2018

  • TrajectoryNet: Zhao, Yue, Yuanjun Xiong, and Dahua Lin.
    "Trajectory Convolution for Action Recognition." NeurIPS (2018). [PDF]

CVPR 2017

  • I3D: Carreira, Joao, and Andrew Zisserman.
    "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset." CVPR (2017). [PDF] [Code]

ECCV 2016

  • TSN: Wang, Limin, et al.
    "Temporal Segment Networks: Towards Good Practices for Deep Action Recognition." ECCV (2016). [PDF] [Code] (a sketch of the segment-sampling scheme follows this list)

NIPS 2014

  • Two-Stream: Simonyan, Karen, and Andrew Zisserman.
    "Two-Stream Convolutional Networks for Action Recognition in Videos." NIPS (2014). [PDF] [Code] (a sketch of the late-fusion step follows this list)

ICCV 2013

  • IDT: Wang, Heng, and Cordelia Schmid.
    "Action Recognition with Improved Trajectories." ICCV (2013). [PDF]

Competitions

Datasets

  • UCF101
    13,320 videos; average clip length ~7s; 101 human action categories. Each class is divided into 25 groups, and videos in the same group share common features (e.g., the same actor or background). Clips are realistic user-uploaded videos collected from YouTube.
  • HMDB51
    6,849 videos; average clip length ~5s; 51 human action categories, each containing at least 101 clips. Most clips are extracted from movies, with a smaller proportion from public datasets and web videos.
  • Kinetics (because videos listed in the Kinetics source CSVs go missing from YouTube over time, the Non-local Net researchers offer a pre-downloaded copy of Kinetics-400; the relevant issue is here)
    ~650,000 videos in the largest version; average clip length ~10s; 400/600/700 human action categories depending on the version, each class containing at least several hundred clips (a minimum of 400 in Kinetics-400). Clips are mostly sourced from YouTube.
  • Something-Something V2
    220,847 videos; clip length 2–6s; 174 basic action categories. The dataset focuses on fine-grained human-object interactions, such as "Putting something on a surface".
  • Charades
    9,848 videos of daily indoor activities, ~30s on average; a long-term, multi-label dataset with 157 action classes.
  • Moments in Time
    About one million videos, ~3s each, involving people, animals, objects, or natural phenomena and capturing the gist of a dynamic scene; 339 action classes.

Benchmarks

Distinguished Researchers & Teams