Coreference Reasoning in Machine Reading Comprehension

This repository contains the data and code to reproduce the results of our paper: https://arxiv.org/abs/2012.15573

Please use the following citation:

@inproceedings{wu-etal-2021-coreference-reasoning,
    title = "Coreference Reasoning in Machine Reading Comprehension",
    author = "Mingzhu Wu and Nafise Sadat Moosavi and Dan Roth and Iryna Gurevych",
    booktitle = "Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)",
    month = aug,
    year = "2021",
    address = "online",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2012.15573",
    pages = "to-appear",
}

Abstract: The ability to reason about multiple references to a given entity is essential for natural language understanding and has long been studied in NLP. In recent years, as the format of Question Answering (QA) became a standard for machine reading comprehension (MRC), there have been data collection efforts, e.g., Dasigi et al. (2019), that attempt to evaluate the ability of MRC models to reason about coreference. However, as we show, coreference reasoning in MRC is a greater challenge than was earlier thought; MRC datasets do not reflect the natural distribution and, consequently, the challenges of coreference reasoning. Specifically, success on these datasets does not reflect a model's proficiency in coreference reasoning. We propose a methodology for creating reading comprehension datasets that better reflect the challenges of coreference reasoning and use it to show that state-of-the-art models still struggle with these phenomena. Furthermore, we develop an effective way to use naturally occurring coreference phenomena from annotated coreference resolution datasets when training MRC models. This allows us to show an improvement in the coreference reasoning abilities of state-of-the-art models across various MRC datasets. The code and the resulting dataset are released in this repository.

Contact person: Mingzhu Wu, wumingzhu1989@gmail.com

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Overview

This repository consists of three parts:

  1. ./tase: provides the code for training and evaluating TASE models. We reuse all the code from tag-based-multi-span-extraction, with minor changes for multi-dataset training.

  2. ./roberta: provides the code for training and evaluating the RoBERTa large model, adapted from the Hugging Face pytorch-transformer implementation of RoBERTa QA.

  3. ./data: contains all the datasets we use for training and evaluation.