VICO-UoE/CIN

Who are you referring to? Coreference resolution in image narrations

Introduction

This is the code for the paper:

Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen, Who are you referring to? Coreference resolution in image narrations, ICCV 2023.

The webpage for this project is available here, with a link to the paper and CIN dataset.

License

The code and dataset are available for research purposes.

CIN Dataset

Download the CIN dataset from here.
The folder contains the following files:

testval_annotations.json: a JSON file with the following structure:

{
 "image": "image_id (flickr30k images)",
 "captions": "narration",
 "split": "test/val",
 "img_width": "int",
 "img_height": "int",
 "query": "list of phrases",
 "query_start_end": "list of start and end index for each phrase/query",
 "cluster": "list of cluster id for each phrase/query",
 "target_bboxes": "list of bounding boxes for each phrase/query (x,y,w,h)"
}

val_image_ids.json: image IDs for the Flickr30k validation set
test_image_ids.json: image IDs for the Flickr30k test set
gold.zip: gold (ground-truth) coreference chains for the test and validation splits, in CoNLL format
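
As a rough illustration of how the annotation fields fit together, the sketch below groups the phrases of one record into coreference chains by cluster ID and converts each box from (x, y, w, h) to corner coordinates. It is not part of the released code. The helper names `xywh_to_xyxy` and `group_by_cluster` are hypothetical, the example record is invented, and the sketch assumes that `query`, `cluster`, and `target_bboxes` are parallel lists with one box per phrase. The top-level layout of testval_annotations.json is not described above, so the sketch works on a single record that has already been loaded.

```python
from collections import defaultdict


def xywh_to_xyxy(box):
    """Convert an (x, y, w, h) box to (x1, y1, x2, y2) corner coordinates."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def group_by_cluster(record):
    """Map each cluster ID to the (phrase, box) pairs that share it.

    Assumes "query", "cluster" and "target_bboxes" are parallel lists
    with exactly one box per phrase (an assumption, not a documented fact).
    """
    chains = defaultdict(list)
    for phrase, cluster_id, box in zip(
        record["query"], record["cluster"], record["target_bboxes"]
    ):
        chains[cluster_id].append((phrase, xywh_to_xyxy(box)))
    return dict(chains)


# Hypothetical record, invented for illustration only (not from the dataset).
# "query_start_end" is left out because its index convention is not stated here.
example = {
    "image": "0000000000",
    "captions": "A woman is standing near a table and she is holding a cup.",
    "split": "val",
    "img_width": 500,
    "img_height": 375,
    "query": ["A woman", "a table", "she", "a cup"],
    "cluster": [0, 1, 0, 2],
    "target_bboxes": [
        [10, 20, 100, 200],
        [150, 180, 200, 120],
        [10, 20, 100, 200],
        [90, 110, 30, 40],
    ],
}

if __name__ == "__main__":
    for cluster_id, mentions in group_by_cluster(example).items():
        print(cluster_id, mentions)
```

In this invented record, "A woman" and "she" share cluster 0, so they end up in the same chain and point to the same box.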

Contact

Please contact the first author at goel.arushi@gmail.com with any queries or concerns.