CLEF2020-CheckThat! Task 5: Check-worthiness for Political Debates

This repository contains the dataset for the CLEF2020-CheckThat! Task 5 on check-worthiness estimation for political debates. It also contains the format checker, scorer, and baselines for the task.

FCPD corpus for the CLEF-2020 LAB on "Automatic Identification and Verification of Claims"
Version 4.0: Jun 8, 2020 (Data, Baseline, and input-Test Release)

The task is part of the CLEF2020-CheckThat! Lab on "Automatic Identification and Verification of Claims". The current version includes the training dataset, evaluation scores, baselines, and the test files (with gold labels).

Evaluation Results

You can find the results in this spreadsheet: https://tinyurl.com/y9sjooxo.

List of Versions

  • v4.0 [2020/06/08] - Official test results and gold labels released.

  • v3.0 [2020/05/26] - Input test data released.

  • v2.0 [2020/05/11] - Updated some labels in the training data.

  • v1.0 [2020/03/16] - Initial data release. The training data for Task 5 contains 50 fact-checked documents: debates, speeches, press conferences, etc.

Contents of the Repository

We provide the following files:

Task Definition

The "Check-worthines for debates" task is defined as "predicting which claim in a political debate should be prioritized for fact-checking". In particular, given a debate, speech or a press conference the goal is to produce a ranked list of its sentences based on their worthiness for fact checking.

NOTE: You can use data from the CLEF-2018 and the CLEF-2019 editions of this task.

Data Format

The input files are TAB-separated text files with four fields:

line_number <TAB> speaker <TAB> text <TAB> label

Where:

  • line_number: the line number (starting from 1)
  • speaker: the person speaking (a candidate, the moderator, or "SYSTEM"; the latter is used for the audience reaction)
  • text: a sentence that the speaker said
  • label: 1 if this sentence is to be fact-checked, and 0 otherwise

The text encoding is UTF-8.

Example:

...
65 TRUMP So we're losing our good jobs, so many of them. 0
66 TRUMP When you look at what's happening in Mexico, a friend of mine who builds plants said it's the eighth wonder of the world. 0
67 TRUMP They're building some of the biggest plants anywhere in the world, some of the most sophisticated, some of the best plants. 0
68 TRUMP With the United States, as he said, not so much. 0
69 TRUMP So Ford is leaving. 1
70 TRUMP You see that, their small car division leaving. 1
71 TRUMP Thousands of jobs leaving Michigan, leaving Ohio. 1
72 TRUMP They're all leaving. 0
...
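
For reference, here is a minimal sketch of how such a file could be loaded in Python. This is only an illustration of the format described above, not code from this repository, and it assumes the files carry no header row:

import csv

def load_debate(path):
    """Load one tab-separated debate file into a list of dicts.

    Assumes the four fields described above: line_number, speaker, text, label.
    If a file turns out to have a header row, skip it before parsing.
    """
    rows = []
    with open(path, encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for line_number, speaker, text, label in reader:
            rows.append({
                "line_number": int(line_number),
                "speaker": speaker,
                "text": text,
                "label": int(label),
            })
    return rows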

Results File Format

For this task, the expected results file is a list of claims with the estimated score for check-worthiness. Each row contains two tab-separated fields:

line_number <TAB> score

Where line_number is the number of the claim in the debate and score is a number indicating the priority of the claim for fact-checking. For example:

1 0.9056
2 0.6862
3 0.7665
4 0.9046
5 0.2598
6 0.6357
7 0.9049
8 0.8721
9 0.5729
10 0.1693
11 0.4115
...

Your results file MUST contain scores for all lines of the input file; otherwise, the scorer will return an error and no score will be computed.
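
As an illustration, here is a minimal sketch of how such a results file could be written from Python. The scores themselves are assumed to come from your own system; only the output format matters here:

def write_predictions(path, scores):
    """Write one "line_number <TAB> score" row per input sentence.

    `scores` is assumed to be a list of floats, one per line of the
    input file, in the same order as the input file.
    """
    with open(path, "w", encoding="utf-8") as f:
        for line_number, score in enumerate(scores, start=1):
            f.write(f"{line_number}\t{score}\n")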

Format checker

The checker for the task is located in the format_checker module of the project. The format checker verifies that your generated results file complies with the expected format. To launch it run:

python3 format_checker/main.py --pred_file_path=<path_to_your_results_file>

run_format_checker.sh includes examples of the output of the checker when dealing with an ill-formed results file. Its output can be seen in run_format_checker_out.txt. Note that the checker cannot verify whether the prediction file you submit contains all lines/claims, because it does not have access to the corresponding gold file.

The script used is adapted from the one for the CLEF2019 Check That! Lab Task 1 (check-worthiness).

Scorer

Launch the scorer for the task as follows:

python3 scorer/main.py --gold_file_path="<path_gold_file_1, path_to_gold_file_k>" --pred_file_path="<predictions_file_1, predictions_file_k>"

Both --gold_file_path and --pred_file_path take a single string that contains a comma-separated list of file paths. The lists may be of arbitrary positive length (so even a single file path is OK), but their lengths must match.

<path_to_gold_file_n> is the path to the file containing the gold annotations for debate n, and <predictions_file_n> is the path to the corresponding file with the participants' predictions for debate n, which must follow the format described in the 'Results File Format' section.
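
For example, to score predictions for two debates at once (the file names here are hypothetical), the call would look like this:

python3 scorer/main.py --gold_file_path="gold/debate1.tsv,gold/debate2.tsv" --pred_file_path="predictions/debate1.tsv,predictions/debate2.tsv"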

The scorer invokes the format checker for the task to verify that the output is properly formatted. It also checks whether the provided predictions file contains all lines/claims from the gold one.

run_scorer.sh provides examples of using the scorer, and the results can be viewed in the run_scorer_out.txt file.

The script used is adapted from the one for the CLEF2019 Check That! Lab Task 1 (check-worthiness).

Evaluation metrics

The official evaluation measure is Mean Average Precision (MAP). We also report R-Precision, Average Precision, Reciprocal Rank, and Precision@k, all averaged over multiple debates.
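
For intuition, here is a minimal sketch of how Average Precision can be computed for a single debate from the gold labels and the predicted scores. The official numbers always come from the scorer above; this is only illustrative:

def average_precision(gold_labels, scores):
    """Average Precision for one debate.

    gold_labels: 0/1 labels, one per sentence, in input order.
    scores: predicted check-worthiness scores, same order.
    """
    # Rank sentences by descending predicted score.
    ranked = sorted(zip(scores, gold_labels), key=lambda pair: pair[0], reverse=True)
    num_positive = sum(gold_labels)
    if num_positive == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_positive

MAP is then simply the mean of the per-debate Average Precision values.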

Baselines

The baselines module contains a random baseline and a simple n-gram baseline for the task. To launch the baseline script, you first need to install the package dependencies listed in requirement.txt using the following command:

pip3 install -r requirement.txt

To launch the baseline script run the following:

python3 baselines/baselines.py

Both baselines are trained on all but the latest 20% of the debates, which are used as the dev dataset. The performance of both baselines will be displayed:
Random Baseline AVGP: 0.02098366142405398
Ngram Baseline AVGP: 0.09456735615609717
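
For orientation, here is a rough sketch of what an n-gram baseline of this kind might look like (TF-IDF over word n-grams plus a logistic-regression ranker). The reference implementation is baselines/baselines.py; its details may differ:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def train_ngram_baseline(train_texts, train_labels):
    """Fit a TF-IDF + logistic-regression model on the training sentences."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
    features = vectorizer.fit_transform(train_texts)
    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(features, train_labels)
    return vectorizer, classifier

def score_sentences(vectorizer, classifier, texts):
    """Return one check-worthiness score per sentence (probability of label 1)."""
    features = vectorizer.transform(texts)
    return classifier.predict_proba(features)[:, 1]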

The scripts used are adapted from the ones for the CLEF2019 Check That! Lab Task 1 (check-worthiness).

Licensing

These datasets are free for general research use.

Previous Editions

For information about the previous editions of the shared task, refer to CLEF2019-CheckThat! and CLEF2018-CheckThat!.

Credits

Task 5 Organizers:

  • Shaden Shaar, Qatar Computing Research Institute, HBKU

  • Giovanni Da San Martino, Qatar Computing Research Institute, HBKU

  • Preslav Nakov, Qatar Computing Research Institute, HBKU

Task website: https://sites.google.com/view/clef2020-checkthat/tasks/tasks-1-5-check-worthiness?authuser=0

Contact: clef-factcheck@googlegroups.com

Citation

You can find the overview papers on the CLEF2020-CheckThat! Lab, "Overview of CheckThat! 2020 --- Automatic Identification and Verification of Claims in Social Media" and "CheckThat! at CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Social Media" (see the citations below).

You can find the CLEF2020-CheckThat! Task 5 details in the paper "Overview of the CLEF-2020 CheckThat! Lab on Automatic Identification and Verification of Claims in Social Media: English tasks" (see the citation below).

@InProceedings{clef-checkthat:2020,
 author = "Barr\'{o}n-Cede{\~n}o, Alberto and
    Elsayed, Tamer and
    Nakov, Preslav and
    {Da San Martino}, Giovanni and
    Hasanain, Maram and   
    Suwaileh, Reem and
    Haouari, Fatima and
    Babulkov, Nikolay and
    Hamdan, Bayan and
    Nikolov, Alex and   
    Shaar, Shaden and
    Ali, {Zien Sheikh}",
 title  = "{Overview of CheckThat! 2020} --- Automatic Identification and
Verification of Claims in Social Media",
 year = {2020},
 booktitle = "Proceedings of the 11th International Conference of the CLEF Association: Experimental IR Meets Multilinguality, Multimodality, and Interaction",
 series = {CLEF~'2020},
 address = {Thessaloniki, Greece},
 nopages="--",
}

@InProceedings{clef-checkthat-en:2020,
 author = "Shaar, Shaden and
    Nikolov, Alex and
    Babulkov, Nikolay and
    Alam, Firoj and  
    Barr\'{o}n-Cede{\~n}o, Alberto and
    Elsayed, Tamer and
    Hasanain, Maram and    
    Suwaileh, Reem and
    Haouari, Fatima and
    {Da San Martino}, Giovanni and
    Nakov, Preslav",
 title = "Overview of {CheckThat!} 2020 {E}nglish: Automatic Identification and Verification of Claims in Social Media",
  booktitle = "Working Notes of CLEF 2020---Conference and Labs of the Evaluation Forum",
  series = {CLEF~'2020},
  address = {Thessaloniki, Greece},
  year = {2020}
}

@InProceedings{CheckThat:ECIR2020,
  author    = {Alberto Barr{\'{o}}n{-}Cede{\~{n}}o and
               Tamer Elsayed and
               Preslav Nakov and
               Giovanni Da San Martino and
               Maram Hasanain and
               Reem Suwaileh and
               Fatima Haouari},
  title     = {CheckThat! at {CLEF} 2020: Enabling the Automatic Identification and Verification of Claims in Social Media},
    booktitle = {Proceedings of the 42nd European Conference on Information Retrieval},
    series = {ECIR~'20},
    pages = {499--507},
    address   = {Lisbon, Portugal},
    month     = {April},
    year      = {2020},
}