Do We Know What We Don't Know? Studying Unanswerable Questions beyond SQuAD 2.0

Repository for the paper:

      Do We Know What We Don't Know? Studying Unanswerable Questions beyond SQuAD 2.0
      Elior Sulem, Jamaal Hay and Dan Roth
      Findings of EMNLP 2021

1. Datasets

Existing Datasets Used in the paper:

Script for downloading GLUE_DATA: https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e

Link for MNLI (Matched) data alone: https://dl.fbaipublicfiles.com/glue/data/MNLI.zip
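As a convenience, the MNLI archive linked above could be fetched and unpacked with a few lines of Python. This is only a minimal sketch using the standard library; the helper name `download_and_extract` and the destination directory in the usage note are illustrative assumptions and are not part of this repository.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the link above.
MNLI_URL = "https://dl.fbaipublicfiles.com/glue/data/MNLI.zip"


def download_and_extract(url, dest_dir):
    """Download a zip archive from `url` and unpack it under `dest_dir`.

    Hypothetical helper for illustration; returns the names of the
    archive members that were extracted.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Store the archive next to the extracted files, named after the URL's last segment.
    archive_path = dest / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(archive_path))

    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

A call such as `download_and_extract(MNLI_URL, "GLUE_DATA")` would place the archive and its contents under a `GLUE_DATA` directory; that directory name is an assumption, so adjust it to whatever path the training commands expect.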

New Dataset (released in this repository):

  • ACE-whQA: The corpus is in SQuAD 2.0 format, so it can be used with code written for SQuAD 2.0.
    • Has Answer: DATA/ACE-whQA/ACE-whQA-has-answer.json
    • Competitive IDK: DATA/ACE-whQA/ACE-whQA-IDK-competitive.json
    • Non-Competitive IDK: DATA/ACE-whQA/ACE-wkQA-non-competitive.json
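Because the files follow the SQuAD 2.0 layout (`data` → `paragraphs` → `qas`), they can be inspected with generic SQuAD-style code. The sketch below counts answerable and unanswerable questions. The helper names are hypothetical, and it is an assumption that each question carries either an `is_impossible` flag or an empty `answers` list when it is unanswerable, as in SQuAD 2.0.

```python
import json
from collections import Counter


def load_squad_file(path):
    """Read a SQuAD 2.0-style JSON file into a dictionary."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_questions(squad_dict):
    """Yield (context, qa) pairs from a SQuAD 2.0-style dictionary."""
    for article in squad_dict["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield paragraph["context"], qa


def count_answerable(squad_dict):
    """Count answerable vs. unanswerable questions."""
    counts = Counter()
    for _context, qa in iter_questions(squad_dict):
        # Assumption: fall back to "no answers listed" if the flag is absent.
        unanswerable = qa.get("is_impossible", not qa.get("answers"))
        counts["unanswerable" if unanswerable else "answerable"] += 1
    return counts


# Toy input in SQuAD 2.0 layout (invented for illustration, not taken from ACE-whQA).
toy = {
    "version": "v2.0",
    "data": [{
        "title": "toy",
        "paragraphs": [{
            "context": "Alice met Bob in Paris.",
            "qas": [
                {"id": "q1", "question": "Where did Alice meet Bob?",
                 "answers": [{"text": "Paris", "answer_start": 17}],
                 "is_impossible": False},
                {"id": "q2", "question": "When did Alice meet Bob?",
                 "answers": [], "is_impossible": True},
            ],
        }],
    }],
}
toy_counts = count_answerable(toy)
```

To run the same check on the released data, pass the result of `load_squad_file("DATA/ACE-whQA/ACE-whQA-has-answer.json")` to `count_answerable`.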

License: The dataset is released under the Creative Commons Share-Alike 3.0 license.

2. Pretrained Models

3. Commands for Training and Testing on SQuAD 2.0 and MNLI: