A curated list of egocentric vision resources.
Egocentric (first-person) vision is a sub-field of computer vision that analyses image and video data captured by a wearable camera, which approximates the wearer's visual field.
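Most of the papers and datasets below operate on ordinary video recorded by a head- or chest-mounted camera. As a point of reference, here is a minimal sketch of iterating over the frames of such a recording with OpenCV, which is typically the first step of any egocentric pipeline. The file name `egocentric_clip.mp4` is a hypothetical placeholder, not a file from any dataset listed here.

```python
# Minimal sketch: read frames from a (hypothetical) wearable-camera recording.
# Any of the datasets below can be consumed this way once exported to
# standard video files.
import cv2

cap = cv2.VideoCapture("egocentric_clip.mp4")  # hypothetical placeholder path
if not cap.isOpened():
    raise IOError("Could not open the video file")

frame_count = 0
while True:
    ok, frame = cap.read()  # frame is an HxWx3 BGR numpy array
    if not ok:
        break  # end of stream
    frame_count += 1
    # ... per-frame analysis (hand detection, gaze prediction, etc.) goes here

cap.release()
print(f"Read {frame_count} frames")
```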
- Integrating Human Gaze Into Attention for Egocentric Activity Recognition - Kyle Min, Jason J. Corso, WACV 2021.
- EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition - Kazakos, Evangelos and Nagrani, Arsha and Zisserman, Andrew and Damen, Dima, ICCV 2019. [code] [project page]
- LSTA: Long Short-Term Attention for Egocentric Action Recognition - Sudhakaran, Swathikiran and Escalera, Sergio and Lanz, Oswald, CVPR 2019. [code]
- Egocentric Activity Recognition on a Budget - Possas, Rafael and Caceres, Sheila Pinto and Ramos, Fabio, CVPR 2018. [demo]
- From Lifestyle VLOGs to Everyday Interaction - David F. Fouhey and Weicheng Kuo and Alexei A. Efros and Jitendra Malik, CVPR 2018. [project page]
- Actor and Observer: Joint Modeling of First and Third-Person Videos - Gunnar A. Sigurdsson and Abhinav Gupta and Cordelia Schmid and Ali Farhadi and Karteek Alahari, CVPR 2018. [code]
- In the Eye of Beholder: Joint Learning of Gaze and Actions in First Person Video - Li, Y., Liu, M., & Rehg, J. M., ECCV 2018.
- Mitigating Bystander Privacy Concerns in Egocentric Activity Recognition with Deep Learning and Intentional Image Degradation - Dimiccoli, M., Marín, J., & Thomaz, E., Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2018.
- Privacy-Preserving Human Activity Recognition from Extreme Low Resolution - Ryoo, M. S., Rothrock, B., Fleming, C., & Yang, H. J., AAAI 2017.
- Jointly Recognizing Object Fluents and Tasks in Egocentric Videos - Liu, Yang and Wei, Ping and Zhu, Song-Chun, ICCV 2017.
- Trajectory Aligned Features For First Person Action Recognition - S. Singh, C. Arora, and C.V. Jawahar, Pattern Recognition 2017.
- First Person Action Recognition Using Deep Learned Descriptors - S. Singh, C. Arora, and C.V. Jawahar, CVPR 2016. [project page] [code]
- Understanding Hand-Object Manipulation with Grasp Types and Object Attributes - Minjie Cai and Kris M. Kitani and Yoichi Sato, Robotics: Science and Systems 2016.
- Delving into egocentric actions - Li, Y., Ye, Z., & Rehg, J. M., CVPR 2015.
- Pooled Motion Features for First-Person Videos - Michael S. Ryoo, Brandon Rothrock and Larry H. Matthies, CVPR 2015.
- Generating Notifications for Missing Actions: Don't forget to turn the lights off! - Soran, Bilge, Ali Farhadi, and Linda Shapiro, ICCV 2015.
- First-Person Activity Recognition: What Are They Doing to Me? - M. S. Ryoo and L. Matthies, CVPR 2013.
- Detecting activities of daily living in first-person camera views - Pirsiavash, H., & Ramanan, D., CVPR 2012.
- Learning to recognize daily actions using gaze - Fathi, A., Li, Y., & Rehg, J. M., ECCV 2012.
- Learning to recognize objects in egocentric activities - Fathi, A., Ren, X., & Rehg, J. M., CVPR 2011.
- Fast unsupervised ego-action learning for first-person sports videos - Kitani, K. M., Okabe, T., Sato, Y., & Sugimoto, A., CVPR 2011. [project page]
- Temporal segmentation and activity classification from first-person sensing - Spriggs, Ekaterina H., Fernando De La Torre, and Martial Hebert, CVPR Workshops 2009.
- Wearable hand activity recognition for event summarization - Mayol, W. W., & Murray, D. W., IEEE International Symposium on Wearable Computers 2005.
- Whose Hand Is This? Person Identification From Egocentric Hand Gestures - Satoshi Tsutsui, Yanwei Fu, David J. Crandall, WACV 2021.
- Generalizing Hand Segmentation in Egocentric Videos with Uncertainty-Guided Model Adaptation - Minjie Cai and Feng Lu and Yoichi Sato, CVPR 2020. [code]
- H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions - Tekin, Bugra and Bogo, Federica and Pollefeys, Marc, CVPR 2019. [video]
- Analysis of Hand Segmentation in the Wild - Aisha Urooj, Ali Borji, CVPR 2018.
- First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations - Garcia-Hernando, Guillermo and Yuan, Shanxin and Baek, Seungryul and Kim, Tae-Kyun, CVPR 2018. [project page] [code]
- Egocentric Gesture Recognition Using Recurrent 3D Convolutional Neural Networks with Spatiotemporal Transformer Modules - Cao, Congqi and Zhang, Yifan and Wu, Yi and Lu, Hanqing and Cheng, Jian, ICCV 2017.
- Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions - Bambach, S., Lee, S., Crandall, D. J., & Yu, C., ICCV 2015.
- Detecting Snap Points in Egocentric Video with a Web Photo Prior - Bo Xiong and Kristen Grauman, ECCV 2014. [project page] [code]
- Pixel-level hand detection in ego-centric videos - Li, Cheng, and Kris M. Kitani, CVPR 2013. [video] [code]
- Context-based vision system for place and object recognition - Torralba, A., Murphy, K. P., Freeman, W. T., & Rubin, M. A., ICCV 2003. [project page]
- Learning to Anticipate Egocentric Actions by Imagination - Yu Wu, Linchao Zhu, Xiaohan Wang, Yi Yang, Fei Wu, TIP 2021.
- Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video - Miao Liu, Siyu Tang, Yin Li, James M. Rehg, ECCV 2020. [project page]
- How Can I See My Future? FvTraj: Using First-person View for Pedestrian Trajectory Prediction - Huikun Bi, Ruisi Zhang, Tianlu Mao, Zhigang Deng, Zhaoqi Wang, ECCV 2020. [presentation video] [summary video]
- Multimodal Future Localization and Emergence Prediction for Objects in Egocentric View With a Reachability Prior - Makansi, Osama and Cicek, Ozgun and Buchicchio, Kevin and Brox, Thomas, CVPR 2020. [demo] [code] [project page]
- EGO-TOPO: Environment Affordances from Egocentric Video - Nagarajan, Tushar and Li, Yanghao and Feichtenhofer, Christoph and Grauman, Kristen, CVPR 2020. [project page] [demo]
- What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention - Antonino Furnari, Giovanni Maria Farinella, ICCV 2019. [code] [demo]
- Digging Deeper into Egocentric Gaze Prediction - Hamed R. Tavakoli and Esa Rahtu and Juho Kannala and Ali Borji, WACV 2019.
- Predicting Gaze in Egocentric Video by Learning Task-dependent Attention Transition - Huang, Y., Cai, M., Li, Z., & Sato, Y., ECCV 2018. [code]
- First-Person Activity Forecasting with Online Inverse Reinforcement Learning - Nicholas Rhinehart, Kris M. Kitani, ICCV 2017. [project page] [video]
- Deep future gaze: Gaze anticipation on egocentric videos using adversarial networks - Zhang, M., Teck Ma, K., Hwee Lim, J., Zhao, Q., & Feng, J., CVPR 2017. [code]
- Going deeper into first-person activity recognition - Ma, M., Fan, H., & Kitani, K. M., CVPR 2016.
- Learning to predict gaze in egocentric video - Li, Yin, Alireza Fathi, and James M. Rehg, ICCV 2013.
- Hand-Priming in Object Localization for Assistive Egocentric Vision - Lee, Kyungjun and Shrivastava, Abhinav and Kacorri, Hernisa, WACV 2020.
- Egocentric Shopping Cart Localization - E. Spera, A. Furnari, S. Battiato, G. M. Farinella, ICPR 2018.
- Recognizing personal locations from egocentric videos - Furnari, A., Farinella, G. M., & Battiato, S., IEEE Transactions on Human-Machine Systems 2017.
- Personal-Location-Based Temporal Segmentation of Egocentric Video for Lifelogging Applications - A. Furnari, G. M. Farinella, S. Battiato, Journal of Visual Communication and Image Representation 2017. [demo] [project page]
- Egocentric Future Localization - Park, Hyun Soo and Hwang, Jyh-Jing and Niu, Yedong and Shi, Jianbo, CVPR 2016. [demo]
- Real-time localization and mapping with wearable active vision - Davison, A. J., Mayol, W. W., & Murray, D. W., IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR) 2003.
- Sr-clustering: Semantic regularized clustering for egocentric photo streams segmentation - Dimiccoli, M., Bolaños, M., Talavera, E., Aghaei, M., Nikolov, S. G., & Radeva, P., Computer Vision and Image Understanding 2017.
- Summarization and Classification of Wearable Camera Streams by Learning the Distributions over Deep Features of Out-of-Sample Image Sequences - Perina, A., Mohammadi, S., Jojic, N., & Murino, V., ICCV 2017.
- Story-Driven Summarization for Egocentric Video - Zheng Lu and Kristen Grauman, CVPR 2013. [project page]
- Discovering Important People and Objects for Egocentric Video Summarization - Yong Jae Lee, Joydeep Ghosh, and Kristen Grauman, CVPR 2012. [project page]
- EgoCom: A Multi-person Multi-modal Egocentric Communications Dataset - Curtis G. Northcutt and Shengxin Zha and Steven Lovegrove and Richard Newcombe, PAMI 2020.
- Deep Dual Relation Modeling for Egocentric Interaction Recognition - Li, Haoxin and Cai, Yijun and Zheng, Wei-Shi, CVPR 2019.
- Recognizing Micro-Actions and Reactions from Paired Egocentric Videos - Yonetani, Ryo and Kitani, Kris M. and Sato, Yoichi, CVPR 2016.
- Social interactions: A first-person perspective - Fathi, A., Hodgins, J. K., & Rehg, J. M., CVPR 2012.
- You2Me: Inferring Body Pose in Egocentric Video via First and Second Person Interactions - Ng, Evonne and Xiang, Donglai and Joo, Hanbyul and Grauman, Kristen, CVPR 2020. [demo] [project page] [dataset] [code]
- Ego-Pose Estimation and Forecasting as Real-Time PD Control - Ye Yuan and Kris Kitani, ICCV 2019. [code] [project page] [demo]
- xR-EgoPose: Egocentric 3D Human Pose From an HMD Camera - Tome, Denis and Peluse, Patrick and Agapito, Lourdes and Badino, Hernan, ICCV 2019. [demo] [dataset]
- The MECCANO Dataset: Understanding Human-Object Interactions from Egocentric Videos in an Industrial-like Domain - Francesco Ragusa and Antonino Furnari and Salvatore Livatino and Giovanni Maria Farinella, WACV 2021. [project page]
- Ego4D: Around the World in 3,000 Hours of Egocentric Video - Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, et al., arXiv 2021. [Github] [project page] [video]
- Ego-Exo: Transferring Visual Representations From Third-Person to First-Person Videos - Li, Yanghao and Nagarajan, Tushar and Xiong, Bo and Grauman, Kristen, CVPR 2021. [code]
- Automatic Calibration of the Fisheye Camera for Egocentric 3D Human Pose Estimation From a Single Image - Yahui Zhang, Shaodi You, Theo Gevers, WACV 2021.
- Is Sharing of Egocentric Video Giving Away Your Biometric Signature? - Daksh Thapar, Chetan Arora, Aditya Nigam, ECCV 2020. [project page]
- EGO-SLAM: A Robust Monocular SLAM for Egocentric Videos - Suvam Patra and Kartikeya Gupta and Faran Ahmad and Chetan Arora and Subhashis Banerjee, WACV 2019. [code]
- Egocentric Basketball Motion Planning from a Single First-Person Image - Bertasius, Gedas and Chan, Aaron and Shi, Jianbo, CVPR 2018. [demo]
- Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video - Moltisanti, Davide and Wray, Michael and Mayol-Cuevas, Walterio and Damen, Dima, ICCV 2017.
- Jointly Learning Energy Expenditures and Activities using Egocentric Multimodal Signals - Nakamura, Katsuyuki and Yeung, Serena and Alahi, Alexandre and Fei-Fei, Li, CVPR 2017.
- Seeing Invisible Poses: Estimating 3D Body Pose from Egocentric Video - Jiang, Hao and Grauman, Kristen, CVPR 2017.
- Toward storytelling from visual lifelogging: An overview - Bolaños, M., Dimiccoli, M., & Radeva, P., IEEE Transactions on Human-Machine Systems 2017.
- Automated capture and delivery of assistive task guidance with an eyewear computer: the GlaciAR system - Leelasawassuk, Teesid, Dima Damen, and Walterio Mayol-Cuevas, Augmented Human International Conference, ACM 2017.
- Walk and Learn: Facial Attribute Representation Learning from Egocentric Video and Contextual Data - Wang, Jing and Cheng, Yu and Feris, Rogerio Schmidt, CVPR 2016. [demo]
- Compact CNN for Indexing Egocentric Videos - Y. Poleg, A. Ephrat, S. Peleg, and C. Arora, WACV 2016.
- Multi-face tracking by extended bag-of-tracklets in egocentric photo-streams - Aghaei, M., Dimiccoli, M., & Radeva, P., Computer Vision and Image Understanding 2016.
- Detecting engagement in egocentric video - Su, Y.C., & Grauman, K., ECCV 2016.
- EgoSampling: Fast-Forward and Stereo for Egocentric Videos - Poleg, Yair and Halperin, Tavi and Arora, Chetan and Peleg, Shmuel, CVPR 2015.
- Ego-Surfing First Person Videos - Yonetani, Ryo and Kitani, Kris M. and Sato, Yoichi, CVPR 2015.
- You-Do, I-Learn: Discovering Task Relevant Objects and their Modes of Interaction from Multi-User Egocentric Video - Damen, D., Leelasawassuk, T., Haines, O., Calway, A., & Mayol-Cuevas, W. W., BMVC 2014. [project page]
- Temporal segmentation of egocentric videos - Poleg, Y., Arora, C., & Peleg, S., CVPR 2014.
- Ego4D - 3,025 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 855 unique camera wearers from 74 worldwide locations and 9 different countries.
- EgoCom - A natural conversations dataset containing multi-modal human communication data captured simultaneously from the participants' egocentric perspectives.
- TREK-100 - Object tracking in first person vision.
- MECCANO - 20 subjects assembling a toy motorbike.
- EPIC-Kitchens 2020 - Subjects performing unscripted actions in their native environments.
- EPIC-Tent - 29 participants assembling a tent while wearing two head-mounted cameras. [paper]
- EGO-CH - 70 subjects visiting two cultural sites in Sicily, Italy.
- EPIC-Kitchens 2018 - 32 subjects performing unscripted actions in their native environments.
- Charades-Ego - Paired first-third person videos.
- EGTEA Gaze+ - 32 subjects, 86 cooking sessions, 28 hours.
- ADL - 20 subjects performing daily activities in their native environments.
- CMU kitchen - Multimodal, 18 subjects cooking 5 different recipes: brownies, eggs, pizza, salad, sandwich.
- EgoSeg - Long-term actions (walking, running, driving, etc.).
- First-Person Social Interactions - 8 subjects at Disney World.
- UEC Dataset - Two choreographed datasets with different ego-actions (walk, jump, climb, etc.) + 6 YouTube sports videos.
- JPL - Interaction with a robot.
- FPPA - 5 subjects performing 5 daily actions.
- UT Egocentric - 3-5 hours long videos capturing a person's day.
- VINST / Visual Diaries - 31 videos capturing the visual experience of a subject walking from a metro station to work.
- Bristol Egocentric Object Interaction (BEOID) - 8 subjects, 6 locations; interactions with objects and the environment.
- Object Search Dataset - 57 sequences of 55 subjects on search and retrieval tasks.
- UNICT-VEDI - Different subjects visiting a museum.
- UNICT-VEDI-POI - Different subjects visiting a museum, with annotated points of interest.
- Simulated Egocentric Navigations - Simulated navigations of a virtual agent within a large building.
- EgoCart - Egocentric images collected by a shopping cart in a retail store.
- Unsupervised Segmentation of Daily Living Activities - Egocentric videos of daily activities.
- Visual Market Basket Analysis - Egocentric images collected by a shopping cart in a retail store.
- Location Based Segmentation of Egocentric Videos - Egocentric videos of daily activities.
- Recognition of Personal Locations from Egocentric Videos - Egocentric video clips of daily activities.
- EgoGesture - 2k videos from 50 subjects performing 83 gestures.
- EgoHands - 48 videos of interactions between two people.
- DoMSEV - 80 hours of video covering different activities.
- DR(eye)VE - 74 videos of people driving.
- THU-READ - 8 subjects performing 40 actions with a head-mounted RGBD camera.
- EgoDexter - 4 sequences with 4 actors (2 female), with varying interactions with various objects and cluttered backgrounds. [paper]
- First-Person Hand Action (FPHA) - 3D hand-object interaction. Includes 1175 videos belonging to 45 different activity categories performed by 6 actors. [paper]
- UTokyo Paired Ego-Video (PEV) - 1,226 pairs of first-person clips extracted from videos recorded synchronously during dyadic conversations.
- UTokyo Ego-Surf - Contains 8 diverse groups of first-person videos recorded synchronously during face-to-face conversations.
- TEgO: Teachable Egocentric Objects Dataset - Contains egocentric images of 19 distinct objects taken by two people for training a teachable object recognizer.
- Multimodal Focused Interaction Dataset - Contains 377 minutes of continuous multimodal recording captured during 19 sessions, with 17 conversational partners in 18 different indoor/outdoor locations.
This is a work in progress. Contributions welcome! Read the contribution guidelines first.