Generating Literal and Implied Subquestions to Fact-check Complex Claims

This repository contains the data and code for the baseline described in the following paper:

Generating Literal and Implied Subquestions to Fact-check Complex Claims
Jifan Chen, Aniruddh Sriram, Eunsol Choi, Greg Durrett
arXiv preprint

@article{chen-etal-2022-generating,
  title={Generating Literal and Implied Subquestions to Fact-check Complex Claims},
  author={Chen, Jifan and Sriram, Aniruddh and Choi, Eunsol and Durrett, Greg},
  journal={arXiv preprint},
  year={2022}
}

Get Started

git clone https://github.com/jifan-chen/subquestions-for-fact-checking.git

Install the dependencies by running pip install -r requirements.txt

Datasets

Update

Since some of the URLs are no longer available, please email us for the full dataset.


To download the dataset, simply run bash scripts/download_data.sh. The data files are located under ./ClaimDecomp. Note that the downloaded data doesn't contain the justification paragraph or the full article written by the fact-checkers. To get these two fields, you will need to scrape PolitiFact using reconstruct_dataset.py, which is run by default in download_data.sh. You can also find our annotated literal/implied questions via this spreadsheet.
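
For reference, here is a rough sketch of the reconstruction step (reconstruct_dataset.py is the authoritative implementation; the CSS selector used below is only an assumption about PolitiFact's page layout and may need adjusting):

# Hypothetical sketch of recovering the fact-check article for one claim's URL.
# The ".m-textblock" selector is an assumption about PolitiFact's markup;
# use reconstruct_dataset.py for the actual reconstruction.
import requests
from bs4 import BeautifulSoup

def fetch_article_text(url: str) -> str:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    body = soup.select_one(".m-textblock")  # assumed container of the article body
    return body.get_text(" ", strip=True) if body else ""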

  • train.jsonl contains 800 unique claims paired with the decomposed questions.
  • dev.jsonl contains 200 unique claims paired with the decomposed questions.
  • test.jsonl contains 200 unique claims paired with the decomposed questions.

The data files are formatted as jsonlines. Here is a single example:

{
    "example_id": "-7643898299150913613",
    "claim": "With voting by mail, you get thousands and thousands of people sitting in somebody's living room, signing ballots all over the place.",
    "label": "false",
    "person": "Donald Trump",
    "venue": "stated on April 7, 2020 in a press briefing:",
    "url": "https://www.politifact.com/factchecks/2020/apr/09/donald-trump/donald-trumps-dubious-claim-thousands-are-conspiri/",
    "justification": "Trump said that with voting by mail, \"you get thousands and thousands of people sitting in somebody's living room, signing ballots all over the place.\" Voting fraud in general is considered to be rare, although voting experts agree that the risks are greater for mail balloting than for in-person voting. Still, Trump didn't produce any evidence for the \"thousands and thousands\" claim, and voting experts said his assertion doesn't square with what is known about the actual cases of voting fraud in the recent past.\nWe rate the statement False.",
    "annotations": [
        {"questions": ["Is voting fraud widespread in the US?", "Is there a greater risk of voting fraud with mail-in ballots?", "Is there evidence of thousands of people committing mail-in voting fraud?"],
         "answers": ["no", "yes", "no"],
         "statements": ["Voting fraud is widespread in the US.", "There is a greater risk of voting fraud with mail-in ballots.", "There is evidence of thousands of people committing mail-in voting fraud."],
         "statements_negate": ["Voting fraud is not widespread in the US.", "There is no greater risk of voting fraud with mail-in ballots.", "There is no evidence of thousands of people committing mail-in voting fraud."]
        },
        ...
    ]
}
Field          Type          Description
example_id     string        Example ID
claim          string        Claim
label          string        Label: pants-fire, false, barely-true, half-true, mostly-true, true
person         string        Person who made the claim
venue          string        Date and venue of the claim
url            string        PolitiFact URL of the claim
justification  List[string]  Justification paragraph written by the fact-checkers
full_article   List[string]  Full verification article written by the fact-checkers
annotations    List[dict]    Annotations of our decomposed questions

Each annotation is formatted as follows:

Field              Type          Description
questions          List[string]  Yes-no questions related to checking the veracity of the claim
answers            List[string]  Answer to each question: yes/no/unknown
question_source    List[string]  Question source: claim or justification
statements         List[string]  Statements converted from the yes-no questions
statements_negate  List[string]  Negated statements
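
A minimal loading sketch, assuming the files downloaded by scripts/download_data.sh sit under ./ClaimDecomp and the field names follow the tables above:

import json

# Read the dev split and print each decomposed question together with its
# annotated answer, statement, and negated statement.
with open("ClaimDecomp/dev.jsonl") as f:
    for line in f:
        example = json.loads(line)
        print(example["claim"], "->", example["label"])
        for annotation in example["annotations"]:
            for question, answer, statement, negated in zip(
                annotation["questions"],
                annotation["answers"],
                annotation["statements"],
                annotation["statements_negate"],
            ):
                print(f"  Q: {question} ({answer})")
                print(f"     {statement} | {negated}")
        break  # only show the first example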

Running NLI models

To run the three NLI models trained in our paper (NQ-NLI, Doc-NLI, and MNLI), simply run bash scripts/run_nli_models.sh. You will need to install allennlp==2.7.0 and torch==1.9.0. Check scripts/run_nli_models.sh for details about where the models are downloaded from and how to switch between them.
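
As an illustration of the NLI setup, the sketch below scores one decomposed statement against the justification paragraph using a generic off-the-shelf MNLI model from HuggingFace (roberta-large-mnli), not the exact checkpoints fetched by scripts/run_nli_models.sh:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Generic MNLI model as a stand-in for the NQ-NLI / Doc-NLI / MNLI checkpoints
# downloaded by scripts/run_nli_models.sh.
model_name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "Voting fraud in general is considered to be rare ..."  # e.g. the justification paragraph
hypothesis = "Voting fraud is widespread in the US."               # a decomposed statement

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)[0]
for label_id, label in model.config.id2label.items():
    print(f"{label}: {probs[label_id]:.3f}")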

Question Generator

Coming soon ...

Contact

Please contact us at jfchen@cs.utexas.edu if you have any questions.