RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D
Dataset and codebase for the ICCV2023 paper RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D.
- Referring expression comprehension & object tracking dataset on Ego4D
- 12,038 annotated clips totaling 41 hours.
- Bounding boxes annotated at 2 FPS, with two textual referring expressions per object.
- Referred objects can go out of frame in the first-person video (no-referred-object cases).
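The bullets above describe the annotation structure: each clip pairs two referring expressions with 2 FPS bounding boxes, and some frames have no visible referred object. A minimal sketch of what such a record might look like (hypothetical names and fields; not the actual RefEgo file format — see dataset/README.md for that):

```python
# Hypothetical annotation record (illustrative only, not the real RefEgo schema):
# two referring expressions for a single object, plus per-frame boxes at 2 FPS,
# where None marks a no-referred-object frame (object out of view).
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

@dataclass
class ClipAnnotation:
    clip_id: str
    expressions: List[str]       # two expressions describing the same object
    boxes: List[Optional[Box]]   # one entry per annotated frame; None = out of frame

    def coverage(self) -> float:
        """Fraction of annotated frames in which the referred object is visible."""
        if not self.boxes:
            return 0.0
        return sum(b is not None for b in self.boxes) / len(self.boxes)

ann = ClipAnnotation(
    clip_id="clip_0001",
    expressions=["the red mug on the counter", "the mug the person picks up"],
    boxes=[(10, 20, 60, 80), None, (12, 22, 62, 82), (14, 24, 64, 84)],
)
print(ann.coverage())  # 0.75
```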
[paper][video][code][RefEgo dataset]
Dataset
Annotations can be downloaded from [RefEgo dataset]. See dataset/README.md for details.
Model
MDETR-based models and checkpoints are available here. We also provide a notebook for trying our model!
Leaderboard
Coming soon.
Dataset License
RefEgo dataset annotations (bounding boxes and texts) are distributed under CC BY-SA 4.0. Please also follow the Ego4D license for the videos and images.
Cite
@InProceedings{Kurita_2023_ICCV,
    author    = {Kurita, Shuhei and Katsura, Naoki and Onami, Eri},
    title     = {RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {15214-15224}
}