This repository contains all the paper critiques written for the Video Recognition class, COMP790, at the University of North Carolina at Chapel Hill. These critiques are not official reviews of the papers; they are attempts at reading important academic papers that could be essential for further research in the field.
First Paper: SlowFast Networks for Video Recognition
Second Paper: Is Space-Time Attention All You Need for Video Understanding?
Third Paper: The One Where They Reconstructed 3D Humans and Environments in TV Shows
Fourth Paper: VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Fifth Paper: VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Sixth Paper: VideoChat: Chat-Centric Video Understanding
Seventh Paper: Phenaki: Variable Length Video Generation from Open Domain Textual Descriptions
Eighth Paper: Ego4D: Around the World in 3,000 Hours of Egocentric Video
Ninth Paper: GeoSim: Realistic Video Simulation via Geometry-Aware Composition for Self-Driving
Tenth Paper: Human-to-Robot Imitation in the Wild
Disclaimer: no actual criticism is intended toward any paper; this is just a discussion space we created for the class to build a better understanding of work from previous years.