This is the code for the paper:
Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen. "Who are you referring to? Coreference resolution in image narrations." ICCV 2023.
The webpage for this project is available here, with a link to the paper and CIN dataset.
The code and dataset are available for research purposes.
Download the CIN dataset from here
The folder contains the following files:
- testval_annotations.json: annotations for the test and validation splits. Each entry in the JSON file has the following structure:
{
"image": "image_id (flickr30k images)",
"captions": "narration",
"split": "test/val",
"img_width": "int",
"img_height": "int",
"query": "list of phrases",
"query_start_end": "list of start and end index for each phrase/query",
"cluster": "list of cluster id for each phrase/query",
"target_bboxes": "list of bounding boxes for each phrase/query (x,y,w,h)"
}
- val_image_ids.json: image ids for the flickr30k validation set
- test_image_ids.json: image ids for the flickr30k test set
- gold.zip: gold/ground-truth coreference chains for test/val in the CoNLL format
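
As a rough illustration of how one annotation entry could be consumed, the sketch below groups phrases into coreference chains by their cluster id and converts each `(x, y, w, h)` box to corner coordinates. It rests on several assumptions that the description above does not confirm: that `query`, `cluster` and `target_bboxes` are parallel lists with one box per phrase, and that `query_start_end` holds character offsets. The helper names (`xywh_to_xyxy`, `group_by_cluster`, `load_annotations`) and the sample record are hypothetical and not part of the released code or data.

```python
import json
from collections import defaultdict


def xywh_to_xyxy(box):
    """Convert an (x, y, w, h) box to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def group_by_cluster(record):
    """Group the phrases of one annotation entry by cluster id.

    Assumption: "query", "cluster" and "target_bboxes" are parallel
    lists with exactly one box per phrase.
    """
    chains = defaultdict(list)
    for phrase, cluster_id, box in zip(
        record["query"], record["cluster"], record["target_bboxes"]
    ):
        chains[cluster_id].append((phrase, xywh_to_xyxy(box)))
    return dict(chains)


def load_annotations(path="testval_annotations.json"):
    """Load the annotation file.

    The top-level container (list or dict) is not documented here,
    so inspect the returned object before iterating over it.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical entry, made up for illustration only (not taken from CIN).
# The character-offset convention for "query_start_end" is an assumption.
example = {
    "image": "0000000000",
    "captions": "A man is holding a dog. He is smiling.",
    "split": "val",
    "img_width": 500,
    "img_height": 375,
    "query": ["A man", "a dog", "He"],
    "query_start_end": [[0, 5], [17, 22], [24, 26]],
    "cluster": [0, 1, 0],
    "target_bboxes": [[10, 20, 100, 200], [150, 180, 80, 60], [10, 20, 100, 200]],
}

chains = group_by_cluster(example)
for cluster_id, mentions in chains.items():
    print(cluster_id, mentions)
```

With the real file, `load_annotations()` would supply the entries in place of `example`. If a phrase turns out to have several boxes, `group_by_cluster` would need to iterate over them rather than unpack a single box.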
For any queries or concerns, please contact the first author at goel.arushi@gmail.com.