This repository contains the dataset for the CLEF2020-CheckThat! Task 5 on check-worthiness estimation for political debates. It also contains the format checker, scorer, and baselines for the task.
FCPD corpus for the CLEF-2020 LAB on "Automatic Identification and Verification of Claims"
Version 4.0: Jun 8, 2020 (Data, Baseline, and input-Test Release)
The task is part of the CLEF2020-CheckThat Lab on "Automatic Identification and Verification of Claims". The current version includes the training dataset, evaluation scores, baselines and the test files (with gold labels).
Table of contents:
- Evaluation Results
- List of Versions
- Contents of the Repository
- Task Definition
- Data Format
- Results File Format
- Format checker
- Scorer
- Baselines
- Licensing
- Previous Editions
- Credits
- Citation
You can find the results in this spreadsheet: https://tinyurl.com/y9sjooxo.
- 4.0 [2020/06/08] - Official test results and gold labels released.
- 3.0 [2020/05/26] - Input test data released.
- 2.0 [2020/05/11] - Updated some labels in the training data.
- 1.0 [2020/03/16] - Initial data release. The training data for task 5 contains 50 fact-checked documents: debates, speeches, press conferences, etc.
We provide the following files:
- Main folder: data
  - Subfolder: v1
    - Subfolder: /training
      Contains all training data released with version 1.0.
  - Subfolder: v2
    - Subfolder: /training
      Contains all training data released with version 2.0.
- README.md - this file
- working_notes/clef19_checkthat.bib - Bibliography of the 2019 overview and participants' papers.
- working_notes/clef18_checkthat.bib - Bibliography of the 2018 overview and participants' papers.
- Main folder: test-input
  - test-input.zip - File containing the 20 debates used for testing the contestants' models.
  - test-gold.zip - File containing the same 20 debates with gold labels, used for evaluation.
The "Check-worthines for debates" task is defined as "predicting which claim in a political debate should be prioritized for fact-checking". In particular, given a debate, speech or a press conference the goal is to produce a ranked list of its sentences based on their worthiness for fact checking.
NOTE: You can use data from the CLEF-2018 and the CLEF-2019 editions of this task
The input files are TAB-separated CSV files with four fields:
line_number speaker text label
Where:
- line_number: the line number (starting from 1)
- speaker: the person speaking (a candidate, the moderator, or "SYSTEM"; the latter is used for the audience reaction)
- text: a sentence that the speaker said
- label: 1 if this sentence is to be fact-checked, and 0 otherwise
The text encoding is UTF-8.
Example:
...
65 TRUMP So we're losing our good jobs, so many of them. 0
66 TRUMP When you look at what's happening in Mexico, a friend of mine who builds plants said it's the eighth wonder of the world. 0
67 TRUMP They're building some of the biggest plants anywhere in the world, some of the most sophisticated, some of the best plants. 0
68 TRUMP With the United States, as he said, not so much. 0
69 TRUMP So Ford is leaving. 1
70 TRUMP You see that, their small car division leaving. 1
71 TRUMP Thousands of jobs leaving Michigan, leaving Ohio. 1
72 TRUMP They're all leaving. 0
...
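For reference, here is a minimal sketch of how such a file can be loaded in Python. It assumes the four-column, header-less TAB-separated layout described above; the function name and record fields are illustrative and not part of the released tools.

```python
import csv

def read_debate(path):
    """Load a TAB-separated debate file into a list of sentence records.

    Assumes four columns (line_number, speaker, text, label) and no header row.
    """
    rows = []
    with open(path, encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for line_number, speaker, text, label in reader:
            rows.append({
                "line_number": int(line_number),
                "speaker": speaker,
                "text": text,
                "label": int(label),
            })
    return rows
```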
For this task, the expected results file is a list of claims with the estimated score for check-worthiness. Each row contains two tab-separated fields:
line_number score
Where line_number is the number of the claim in the debate and score is a number indicating the priority of the claim for fact-checking. For example:
1 0.9056
2 0.6862
3 0.7665
4 0.9046
5 0.2598
6 0.6357
7 0.9049
8 0.8721
9 0.5729
10 0.1693
11 0.4115
...
Your result file MUST contain scores for all lines of the input file. Otherwise the scorer will return an error and no score will be computed.
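A minimal sketch of producing such a results file in Python (the function name is illustrative; any scoring model can supply the per-line scores):

```python
def write_predictions(path, scores):
    """Write one 'line_number<TAB>score' row per sentence of the input debate.

    `scores` is assumed to be a list of check-worthiness scores ordered by
    line number and covering every line of the input file.
    """
    with open(path, "w", encoding="utf-8") as f:
        for line_number, score in enumerate(scores, start=1):
            f.write(f"{line_number}\t{score:.4f}\n")
```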
The checker for the task is located in the format_checker module of the project. The format checker verifies that your generated results file complies with the expected format. To launch it run:
python3 format_checker/main.py --pred_file_path=<path_to_your_results_file>
run_format_checker.sh includes examples of running the checker on an ill-formed results file; its output can be seen in run_format_checker_out.txt.
Note that the checker cannot verify whether the predictions file you submit contains all lines/claims, because it does not have access to the corresponding gold file.
The script used is adapted from the one for the CLEF2019 Check That! Lab Task 1 (check-worthiness).
Launch the scorer for the task as follows:
python3 scorer/main.py --gold_file_path="<path_gold_file_1, path_to_gold_file_k>" --pred_file_path="<predictions_file_1, predictions_file_k>"
Both --gold_file_path and --pred_file_path take a single string that contains a comma-separated list of file paths. The lists may be of arbitrary positive length (so even a single file path is OK), but their lengths must match.
<path_to_gold_file_n> is the path to the file containing the gold annotations for debate n, and <predictions_file_n> is the path to the corresponding file with participants' predictions for debate n, which must follow the format described in the 'Results File Format' section.
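For example, a run over two debates could look like this (the gold and prediction file names below are hypothetical and only illustrate the comma-separated lists):
python3 scorer/main.py --gold_file_path="gold/debate1.tsv,gold/debate2.tsv" --pred_file_path="predictions/debate1.tsv,predictions/debate2.tsv"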
The scorer invokes the format checker for the task to verify the output is properly shaped. It also handles checking if the provided predictions file contains all lines / claims from the gold one.
run_scorer.sh provides examples of using the scorer; the results can be viewed in the run_scorer_out.txt file.
The script used is adapted from the one for the CLEF2019 Check That! Lab Task 1 (check-worthiness).
The official evaluation measure is Mean Average Precision (MAP). We also report R-Precision, Average Precision, Reciprocal Rank, and Precision@k, all averaged over multiple debates.
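To illustrate the main measure, here is a minimal sketch of Average Precision for a single debate, computed from gold labels and predicted scores; MAP is then the mean of this value over all test debates. This is only an illustration of the metric, not the official scorer code.

```python
def average_precision(gold_labels, scores):
    """Average Precision for one debate.

    gold_labels: 0/1 check-worthiness labels, one per line of the debate.
    scores: predicted scores for the same lines (higher = more check-worthy).
    """
    # Rank all lines by descending predicted score.
    ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(ranking, start=1):
        if gold_labels[i] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Divide by the number of check-worthy lines (guard against no positives).
    return precision_sum / max(hits, 1)
```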
The baselines module contains a random baseline and a simple n-gram baseline for the task. To launch the baseline script, you first need to install the package dependencies listed in requirement.txt:
pip3 install -r requirement.txt
To launch the baseline script run the following:
python3 baselines/baselines.py
Both baselines are trained on all but the last 20% of the debates, which are held out as the development dataset.
The performance of both baselines will be displayed:
Random Baseline AVGP: 0.02098366142405398
Ngram Baseline AVGP: 0.09456735615609717
The scripts used are adapted from the ones for the CLEF2019 Check That! Lab Task 1 (check-worthiness).
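For illustration, here is a hypothetical sketch of an n-gram style baseline (TF-IDF bag-of-words features with a logistic regression ranker). This is an assumption about the general approach; the actual baselines/baselines.py may differ in features, model, and preprocessing.

```python
# Hypothetical n-gram check-worthiness ranker; not the repository's exact baseline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def train_ngram_ranker(train_sentences, train_labels):
    """Fit a TF-IDF (word unigram/bigram) + logistic regression model."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
    model = LogisticRegression(max_iter=1000)
    model.fit(vectorizer.fit_transform(train_sentences), train_labels)
    return vectorizer, model

def score_debate(vectorizer, model, sentences):
    """Return one check-worthiness score per sentence (probability of label 1)."""
    return model.predict_proba(vectorizer.transform(sentences))[:, 1]
```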
These datasets are free for general research use.
- If you want to cite any of the papers from the previous editions of the task, refer to working_notes/clef19_checkthat.bib (proceedings with all papers from 2019) or working_notes/clef18_checkthat.bib (proceedings with all papers from 2018).
For information about the previous editions of the shared task, refer to CLEF2019-CheckThat! and CLEF2018-CheckThat!.
Task 5 Organizers:
- Shaden Shaar, Qatar Computing Research Institute, HBKU
- Giovanni Da San Martino, Qatar Computing Research Institute, HBKU
- Preslav Nakov, Qatar Computing Research Institute, HBKU
Task website: https://sites.google.com/view/clef2020-checkthat/tasks/tasks-1-5-check-worthiness?authuser=0
Contact: clef-factcheck@googlegroups.com
You can find the overview papers on the CLEF2020-CheckThat! Lab, "Overview of CheckThat! 2020 --- Automatic Identification and Verification of Claims in Social Media" and "CheckThat! at CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Social Media" (see the citations below).
You can find the CLEF2020-CheckThat! Task 5 details in the paper "Overview of the CLEF-2020 CheckThat! Lab on Automatic Identification and Verification of Claims in Social Media: English tasks" (see the citation below).
@InProceedings{clef-checkthat:2020,
author = "Barr\'{o}n-Cede{\~n}o, Alberto and
Elsayed, Tamer and
Nakov, Preslav and
{Da San Martino}, Giovanni and
Hasanain, Maram and
Suwaileh, Reem and
Haouari, Fatima and
Babulkov, Nikolay and
Hamdan, Bayan and
Nikolov, Alex and
Shaar, Shaden and
Ali, {Zien Sheikh}",
title = "{Overview of CheckThat! 2020} --- Automatic Identification and
Verification of Claims in Social Media",
year = {2020},
booktitle = "Proceedings of the 11th International Conference of the CLEF Association: Experimental IR Meets Multilinguality, Multimodality, and Interaction",
series = {CLEF~'2020},
address = {Thessaloniki, Greece},
nopages="--",
}
@InProceedings{clef-checkthat-en:2020,
author = "Shaar, Shaden and
Nikolov, Alex and
Babulkov, Nikolay and
Alam, Firoj and
Barr\'{o}n-Cede{\~n}o, Alberto and
Elsayed, Tamer and
Hasanain, Maram and
Suwaileh, Reem and
Haouari, Fatima and
{Da San Martino}, Giovanni and
Nakov, Preslav",
title = "Overview of {CheckThat!} 2020 {E}nglish: Automatic Identification and Verification of Claims in Social Media",
booktitle = "Working Notes of CLEF 2020---Conference and Labs of the Evaluation Forum",
series = {CLEF~'2020},
address = {Thessaloniki, Greece},
year = {2020}
}
@InProceedings{CheckThat:ECIR2020,
author = {Alberto Barr{\'{o}}n{-}Cede{\~{n}}o and
Tamer Elsayed and
Preslav Nakov and
Giovanni Da San Martino and
Maram Hasanain and
Reem Suwaileh and
Fatima Haouari},
title = {CheckThat! at {CLEF} 2020: Enabling the Automatic Identification and Verification of Claims in Social Media},
booktitle = {Proceedings of the 42nd European Conference on Information Retrieval},
series = {ECIR~'20},
pages = {499--507},
address = {Lisbon, Portugal},
month = {April},
year = {2020},
}