Bangla Coreference Resolution

In this work, we tried to create a LitBank-like dataset for the Bangla language.

We annotated several popular Bangla novels, linking each pronoun to the noun it refers to. We used the Brat annotation tool for this task. In total, we annotated nearly 6000 tokens drawn from different chapters of a diverse set of novels. The average size of each chapter was almost 1500 words.

We examined how coreference occurs in this dataset. We first identified all the possible pronouns of the Bangla language and then annotated the noun for each pronoun. After that, we generated the annotation files and, after some preprocessing, were able to produce the intended tokens for the noun-pronoun pairs.
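
The preprocessing step can be illustrated with a minimal sketch that reads a Brat standoff (.ann) file and collects noun-pronoun pairs. This is not our actual script. The entity labels "Noun" and "Pronoun", the relation label "Coref", the function names, and the sample annotation with its offsets are all assumptions made for illustration. Only the general .ann line layout (tab-separated "T" text-bound lines and "R" relation lines) follows the Brat standoff format.

```python
from typing import Dict, List, Tuple

# Hypothetical sample in Brat standoff format. The labels and offsets
# are illustrative assumptions, not taken from the actual dataset.
SAMPLE_ANN = (
    "T1\tNoun 0 4\tরহিম\n"
    "T2\tPronoun 15 17\tসে\n"
    "R1\tCoref Arg1:T2 Arg2:T1\n"
)


def parse_brat_ann(
    ann_text: str,
) -> Tuple[Dict[str, Tuple[str, str]], List[Tuple[str, str, str]]]:
    """Parse text-bound (T) and relation (R) lines of a .ann file."""
    entities: Dict[str, Tuple[str, str]] = {}
    relations: List[Tuple[str, str, str]] = []
    for line in ann_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("T"):
            # Layout: ID <TAB> LABEL START END <TAB> TEXT
            ann_id, info, text = line.split("\t", 2)
            label = info.split(" ", 1)[0]
            entities[ann_id] = (label, text)
        elif line.startswith("R"):
            # Layout: ID <TAB> TYPE Arg1:Tx Arg2:Ty
            info = line.split("\t")[1]
            rel_type, arg1, arg2 = info.split()
            relations.append(
                (rel_type, arg1.split(":", 1)[1], arg2.split(":", 1)[1])
            )
    return entities, relations


def noun_pronoun_pairs(
    ann_text: str,
    noun_label: str = "Noun",        # assumed label name
    pronoun_label: str = "Pronoun",  # assumed label name
) -> List[Tuple[str, str]]:
    """Return (noun, pronoun) text pairs for every linking relation."""
    entities, relations = parse_brat_ann(ann_text)
    pairs: List[Tuple[str, str]] = []
    for _, first, second in relations:
        if first not in entities or second not in entities:
            continue  # skip relations that point at unknown annotations
        (label_a, text_a), (label_b, text_b) = entities[first], entities[second]
        if label_a == noun_label and label_b == pronoun_label:
            pairs.append((text_a, text_b))
        elif label_a == pronoun_label and label_b == noun_label:
            pairs.append((text_b, text_a))
    return pairs


if __name__ == "__main__":
    for noun, pronoun in noun_pronoun_pairs(SAMPLE_ANN):
        print(noun, pronoun)
```

The pairs are always returned noun first, whichever direction the relation was drawn in during annotation, so downstream code does not depend on annotator habits.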