RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D
Dataset and codebase for the ICCV2023 paper RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D.
- Referring expression comprehension & object tracking dataset on Ego4D
- 12,038 annotated clips totaling 41 hours.
- Bounding boxes annotated at 2 FPS, with two textual referring expressions per object.
- Referred objects can go out of frame in the first-person video (no-referred-object cases).
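The bullets above describe the annotation structure: each clip pairs two referring expressions with 2 FPS bounding boxes, and some frames have no visible referred object. A minimal sketch of what such a record might look like (hypothetical names and fields; not the actual RefEgo file format — see dataset/README.md for that):

```python
# Hypothetical annotation record (illustrative only, not the real RefEgo schema):
# two referring expressions for a single object, plus per-frame boxes at 2 FPS,
# where None marks a no-referred-object frame (object out of view).
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

@dataclass
class ClipAnnotation:
    clip_id: str
    expressions: List[str]       # two expressions describing the same object
    boxes: List[Optional[Box]]   # one entry per annotated frame; None = out of frame

    def coverage(self) -> float:
        """Fraction of annotated frames in which the referred object is visible."""
        if not self.boxes:
            return 0.0
        return sum(b is not None for b in self.boxes) / len(self.boxes)

ann = ClipAnnotation(
    clip_id="clip_0001",
    expressions=["the red mug on the counter", "the mug the person picks up"],
    boxes=[(10, 20, 60, 80), None, (12, 22, 62, 82), (14, 24, 64, 84)],
)
print(ann.coverage())  # 0.75
```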
[paper][video][code][RefEgo dataset]
Dataset
Annotations can be downloaded from [RefEgo dataset]. See dataset/README.md for details.
Model
MDETR-based models and checkpoints are available here. We also provide a notebook for trying our model!
Leaderboard
Coming soon.
Dataset License
RefEgo dataset annotations (bounding boxes and texts) are distributed under CC BY-SA 4.0. Please also follow the Ego4D license for the videos and images.
Cite
@InProceedings{Kurita_2023_ICCV,
    author    = {Kurita, Shuhei and Katsura, Naoki and Onami, Eri},
    title     = {RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {15214-15224}
}