This repository contains the data and code for the baseline described in the following paper:

Generating Literal and Implied Subquestions to Fact-check Complex Claims
Jifan Chen, Aniruddh Sriram, Eunsol Choi, Greg Durrett
arXiv preprint
    @article{chen-etal-2022-generating,
      title={Generating Literal and Implied Subquestions to Fact-check Complex Claims},
      author={Chen, Jifan and Sriram, Aniruddh and Choi, Eunsol and Durrett, Greg},
      journal={arXiv preprint},
      year={2022}
    }
Clone the repository:

    git clone https://github.com/jifan-chen/subquestions-for-fact-checking.git

Install the dependencies by running

    pip install -r requirements.txt
Since some of the URLs are no longer available, email us for the full dataset.

To download the dataset, simply run `bash scripts/download_data.sh`. The data files are located under `./ClaimDecomp`. Note that the downloaded data doesn't contain the justification paragraph or the full article written by the fact-checkers. To get these two fields, you will need to scrape them from PolitiFact using `reconstruct_dataset.py`, which is run by default in `download_data.sh`. You can also find our annotated literal/implied questions via this spreadsheet.
- `train.jsonl` contains 800 unique claims paired with the decomposed questions.
- `dev.jsonl` contains 200 unique claims paired with the decomposed questions.
- `test.jsonl` contains 200 unique claims paired with the decomposed questions.
The data files are formatted as JSON Lines. Here is a single example:

    {
      "example_id": "-7643898299150913613",
      "claim": "With voting by mail, you get thousands and thousands of people sitting in somebody's living room, signing ballots all over the place.",
      "label": "false",
      "person": "Donald Trump",
      "venue": "stated on April 7, 2020 in a press briefing:",
      "url": "https://www.politifact.com/factchecks/2020/apr/09/donald-trump/donald-trumps-dubious-claim-thousands-are-conspiri/",
      "justification": "Trump said that with voting by mail, \"you get thousands and thousands of people sitting in somebody's living room, signing ballots all over the place.\" Voting fraud in general is considered to be rare, although voting experts agree that the risks are greater for mail balloting than for in-person voting. Still, Trump didn't produce any evidence for the \"thousands and thousands\" claim, and voting experts said his assertion doesn't square with what is known about the actual cases of voting fraud in the recent past.\nWe rate the statement False.",
      "annotations": [
        {
          "questions": ["Is voting fraud widespread in the US?", "Is there a greater risk of voting fraud with mail-in ballots?", "Is there evidence of thousands of people committing mail-in voting fraud?"],
          "answers": ["no", "yes", "no"],
          "statements": ["Voting fraud is widespread in the US.", "There is a greater risk of voting fraud with mail-in ballots.", "There is evidence of thousands of people committing mail-in voting fraud."],
          "statements-negate": ["Voting fraud is not widespread in the US.", "There is no greater risk of voting fraud with mail-in ballots.", "There is no evidence of thousands of people committing mail-in voting fraud."]
        },
        ...
      ]
    }
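For reference, each split can be loaded with a few lines of Python. This is a minimal sketch (not code from this repo), assuming the files live under `./ClaimDecomp` as produced by the download script:

```python
import json

def load_split(path):
    """Read one ClaimDecomp split (JSON Lines: one example per line)."""
    examples = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                examples.append(json.loads(line))
    return examples

# Assumed location, based on the download script described above.
train = load_split("./ClaimDecomp/train.jsonl")
print(len(train))  # expected: 800 unique claims

example = train[0]
print(example["claim"], "->", example["label"])
for annotation in example["annotations"]:
    for question, answer in zip(annotation["questions"], annotation["answers"]):
        print(question, "->", answer)
```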
Field | Type | Description |
---|---|---|
`example_id` | string | Example ID |
`claim` | string | Claim |
`label` | string | Label: pants-fire, false, barely-true, half-true, mostly-true, true |
`person` | string | Person who made the claim |
`venue` | string | Date and venue of the claim |
`url` | string | PolitiFact URL of the claim |
`justification` | List[string] | Justification paragraph written by the fact-checkers |
`full_article` | List[string] | Full verification article written by the fact-checkers |
`annotations` | List[dict] | Annotations of our decomposed questions |
Each `annotation` is formatted as follows:

Field | Type | Description |
---|---|---|
`questions` | List[string] | Yes-no questions related to checking the veracity of the claim |
`answers` | List[string] | Answers to the questions: yes/no/unknown |
`question_source` | List[string] | Question source: claim or justification |
`statements` | List[string] | Statements converted from the yes-no questions |
`statements_negate` | List[string] | Negated statements |
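As a concrete illustration of how `statements` and `statements_negate` can be consumed, the sketch below pairs the fact-checkers' justification (as an NLI premise) with each annotated statement and its negation (as hypotheses). This pairing scheme is only an example, not the exact setup used by our scripts; it also accepts the `statements-negate` spelling shown in the example above.

```python
def build_nli_pairs(example):
    """Illustrative only: turn one ClaimDecomp example into (premise, hypothesis) pairs."""
    premise = example["justification"]
    pairs = []
    for annotation in example["annotations"]:
        statements = annotation["statements"]
        # The negated-statement key appears as "statements_negate" in the field table
        # and as "statements-negate" in the example above, so accept either.
        negations = annotation.get("statements_negate") or annotation.get("statements-negate", [])
        for statement, negation in zip(statements, negations):
            pairs.append((premise, statement))
            pairs.append((premise, negation))
    return pairs
```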
To run the three NLI models trained in our paper (NQ-NLI, Doc-NLI, and MNLI), simply run `bash scripts/run_nli_models.sh`. You will need to install `allennlp==2.7.0` and `torch==1.9.0`. Check `scripts/run_nli_models.sh` for details about where the models are downloaded and how to switch between models.
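The script takes care of downloading and running the checkpoints from the paper. If you just want to see the kind of premise/hypothesis scoring involved, the sketch below runs an off-the-shelf MNLI model (`roberta-large-mnli` from the Hugging Face hub, which is not one of the paper's checkpoints) over a single pair; it additionally requires the `transformers` package, which is not part of the dependencies listed above.

```python
# Illustration only: score one (premise, hypothesis) pair with a generic MNLI model.
# This is NOT one of the checkpoints downloaded by scripts/run_nli_models.sh.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "Voting fraud in general is considered to be rare."
hypothesis = "Voting fraud is widespread in the US."

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1).squeeze()

# Print the entailment / neutral / contradiction probabilities.
for label_id, label in model.config.id2label.items():
    print(f"{label}: {probs[label_id].item():.3f}")
```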
Coming soon ...
Please contact us at jfchen@cs.utexas.edu if you have any questions.