kentonl/e2e-coref

What does the cluster in the json file mean?

MENGHAH opened this issue · 3 comments

I got the jsonlines files by running setup_training.sh, but I can't understand the meaning of "clusters" in the JSON files. Can you explain it to me?

I have the same question

Hello,

I was having the same question. It took some time to understand that.

It's the set of coreference clusters present in the actual gold files. The key "clusters" in a jsonlines file contains the list of clusters present in the original file, and each cluster contains a list of mentions. Each mention is represented by its start and end token index in the original file.

For example, in test.jsonlines the first entry is for the file "bc/cctv/00/cctv_0005_0". Its clusters are [[[57, 59], [25, 27], [42, 44]], [[19, 23], [16, 16]], [[83, 83], [82, 82]]].

Here, the mentions [19, 23] and [16, 16] are in the same cluster. The other two clusters are [57, 59], [25, 27], [42, 44] and [83, 83], [82, 82]. For example, the mention

'the', 'Chinese', 'securities', 'regulatory', 'department'

is represented by its start and end index [19, 23]. The end index is inclusive, so this span covers five tokens. The same goes for the other mentions.
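In case it helps, here is a minimal Python sketch of how the spans could be turned back into words. It assumes each JSON line also has a "sentences" key holding a list of token lists, and that the indices count tokens across the whole document with the end index inclusive (as in the [19, 23] example). The helper names and the toy document are made up for illustration and are not taken from the repository.

```python
import json


def load_jsonlines(path):
    """Read a .jsonlines file: one JSON document per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def flatten_tokens(sentences):
    """Concatenate the per-sentence token lists into one document-level list."""
    return [tok for sent in sentences for tok in sent]


def resolve_clusters(doc):
    """Replace every [start, end] span with the tokens it covers.

    Assumes `end` is inclusive, so a span [19, 23] covers 5 tokens.
    """
    tokens = flatten_tokens(doc["sentences"])
    return [
        [tokens[start:end + 1] for start, end in cluster]
        for cluster in doc["clusters"]
    ]


# Toy document (hypothetical, not from the real data files).
toy_doc = {
    "doc_key": "toy_0",
    "sentences": [
        ["Alice", "said", "she", "was", "tired", "."],
        ["The", "tall", "man", "waved", "at", "her", "."],
        ["He", "smiled", "."],
    ],
    "clusters": [
        [[0, 0], [2, 2], [11, 11]],  # Alice / she / her
        [[6, 8], [13, 13]],          # The tall man / He
    ],
}

resolved = resolve_clusters(toy_doc)
for i, cluster in enumerate(resolved):
    print(i, [" ".join(mention) for mention in cluster])
```

For a real file you would call something like `resolve_clusters(load_jsonlines("test.jsonlines")[0])` to look at the first document.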

I hope this helps other people as well.

thanks,
Onkar

@MENGHAH Could you please send me these files: train.jsonlines, test.jsonlines, and dev.jsonlines?