
License: Creative Commons Zero v1.0 Universal (CC0-1.0)

LIAR-PLUS

The extended LIAR dataset for fact-checking and fake news detection, released with our paper: Where is Your Evidence: Improving Fact-Checking by Justification Modeling. Tariq Alhindi, Savvas Petridis and Smaranda Muresan. In Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), Brussels, Belgium, November 1st, 2018.

This dataset has evidence sentences extracted automatically from the full-text verdict reports written by journalists at PolitiFact. Our objective is to provide a benchmark for evidence retrieval and to show empirically that including evidence information in an automatic fake news detection method (regardless of features or classifier) consistently results in superior performance over methods lacking such information.

Below is the description of the TSV file format, taken as is from the original LIAR dataset, which was published in the paper "Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection (Wang, 2017). We added a new column at the end that holds the extracted justification; a minimal loading sketch follows the column list.

  • Column 1: the ID of the statement ([ID].json).
  • Column 2: the label.
  • Column 3: the statement.
  • Column 4: the subject(s).
  • Column 5: the speaker.
  • Column 6: the speaker's job title.
  • Column 7: the state info.
  • Column 8: the party affiliation.
  • Columns 9-13: the total credit history count, including the current statement.
    • 9: barely true counts.
    • 10: false counts.
    • 11: half true counts.
    • 12: mostly true counts.
    • 13: pants on fire counts.
  • Column 14: the context (venue / location of the speech or statement).
  • Column 15: the extracted justification.
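
For convenience, here is a minimal loading sketch in Python; the file name (train2.tsv) and the use of pandas are illustrative assumptions on our part, not part of the release.

import csv
import pandas as pd

# Column names following the description above; the file name "train2.tsv"
# is an assumption -- point this at whichever split you downloaded.
COLUMNS = [
    "id", "label", "statement", "subjects", "speaker", "speaker_job",
    "state", "party", "barely_true_counts", "false_counts",
    "half_true_counts", "mostly_true_counts", "pants_on_fire_counts",
    "context", "justification",
]

# The TSV files ship without a header row, so supply the names explicitly.
# If your copy carries an extra leading index column, prepend a name for it.
df = pd.read_csv("train2.tsv", sep="\t", names=COLUMNS,
                 quoting=csv.QUOTE_NONE, header=None)

print(df[["label", "statement", "justification"]].head())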

Our justification extraction method works as follows:

  • Get all sentences in the 'Our Ruling' section of the report if it exists; otherwise, get the last five sentences of the report.
  • Remove any sentence that contains the verdict or any verdict-related words. The verdict-related words are provided in the forbidden words file (a sketch of this filtering step follows below).
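
Below is a rough Python sketch of the filtering step, written under stated assumptions: the function name, the candidate sentences, and the example forbidden words are illustrative only; the actual list ships in the forbidden words file.

def extract_justification(candidate_sentences, forbidden_words):
    """Drop every candidate sentence that mentions a verdict-related word."""
    kept = []
    for sentence in candidate_sentences:
        lowered = sentence.lower()
        if not any(word.lower() in lowered for word in forbidden_words):
            kept.append(sentence)
    return " ".join(kept)

# Hypothetical forbidden words for illustration only; the release provides
# the real list in the forbidden words file.
forbidden = ["true", "false", "pants on fire", "rate", "ruling"]
candidates = [
    "The senator cited a 2016 budget report for the figure.",
    "We rate this claim Mostly True.",
]
print(extract_justification(candidates, forbidden))
# -> "The senator cited a 2016 budget report for the figure."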

Please Note:
The dataset in the current commit is the second version, which was updated after the paper was published. We expanded the list of forbidden words in the second version after realizing that we had missed a few in v1. For the results of our experiments on v2 of the dataset, please refer to the poster. For the results on v1 of the dataset, please refer to the paper. V1 of the dataset can be found in this commit.

Note that we do not provide the full-text verdict report in the current version of the dataset, but you can use the following command to access the full verdict report and links to the source documents:

wget http://www.politifact.com/api/v/2/statement/[ID]/?format=json
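
The same request can also be issued from Python; the sketch below is illustrative (the statement ID is hypothetical) and assumes the endpoint is still reachable.

import json
import urllib.request

statement_id = 12345  # hypothetical [ID]; use the ID column from the TSV

url = ("http://www.politifact.com/api/v/2/statement/"
       f"{statement_id}/?format=json")
with urllib.request.urlopen(url) as response:
    report = json.loads(response.read().decode("utf-8"))

# Inspect whatever fields the response carries to locate the full ruling
# text and the links to the source documents.
print(sorted(report.keys()))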

The original sources retain the copyright of the data. Note that there are absolutely no guarantees with this data; we provide this dataset "as is", but you are welcome to report issues with this preliminary version of the data.
You may use this dataset for research purposes only.

Kindly cite our paper if you find this dataset useful.

@inproceedings{alhindi2018your,
title={Where is your Evidence: Improving Fact-checking by Justification Modeling},
author={Alhindi, Tariq and Petridis, Savvas and Muresan, Smaranda},
booktitle={Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)},
pages={85--90},
year={2018}
}

v2.0 10/24/2018