CLEF2019-CheckThat! Task 1

This repository contains the dataset, format checker, scorer, and baselines for CLEF2019-CheckThat! Task 1.

For information about the previous edition of the shared task, refer to CLEF2018-CheckThat!

FCPD corpus for the CLEF-2019 LAB on "Automatic Identification and Verification of Claims"
Version 2.0: May 13, 2019 (TEST GOLD LABELS RELEASED)

This file contains the basic information regarding the CLEF2019-CheckThat! Task 1 dataset on Check-Worthiness estimation, provided for the CLEF2019-CheckThat! Lab on "Automatic Identification and Verification of Claims". The current version (2.0, May 13, 2019) corresponds to the release of the gold labels for the test set, in addition to the training data. All changes and updates to these datasets and tools are reported in the 'List of Versions' section of this document.

Table of contents:

  • Evaluation Results
  • List of Versions
  • Contents of the Distribution v2.0
  • Data Format
  • Results File Format
  • Format checkers
  • Scorers
  • Evaluation metrics
  • Baselines
  • Licensing
  • Citation
  • Credits

Evaluation Results

Note that the main evaluation measure is MAP on the primary submission. The teams are ordered according to this score.

For primary submissions, the rank on each measure is shown in parentheses.

Team Name | Submission | MAP | RR | R-P | P@1 | P@3 | P@5 | P@10 | P@20 | P@50
Copenhagen | primary | .1660 (1) | .4176 (3) | .1387 (4) | .2857 (2) | .2381 (1) | .2571 (1) | .2286 (2) | .1571 (2) | .1229 (2)
Copenhagen | contr.-1 | .1496 | .3098 | .1297 | .1429 | .2381 | .2000 | .2000 | .1429 | .1143
Copenhagen | contr.-2 | .1580 | .2740 | .1622 | .1429 | .1905 | .2286 | .2429 | .1786 | .1200
TheEarthIsFlat | primary | .1597 (2) | .1953 (11) | .2052 (1) | .0000 (4) | .0952 (3) | .2286 (2) | .2143 (3) | .1857 (1) | .1457 (1)
TheEarthIsFlat | contr.-1 | .1453 | .3158 | .1101 | .2857 | .2381 | .1429 | .1429 | .1357 | .1171
TheEarthIsFlat | contr.-2 | .1821 | .4187 | .1937 | .2857 | .2381 | .2286 | .2286 | .2143 | .1400
IPIPAN | primary | .1332 (3) | .2864 (6) | .1481 (2) | .1429 (3) | .0952 (3) | .1429 (5) | .1714 (5) | .1500 (3) | .1171 (3)
Terrier | primary | .1263 (4) | .3253 (5) | .1088 (8) | .2857 (2) | .2381 (1) | .2000 (3) | .2000 (4) | .1286 (6) | .0914 (7)
UAICS | primary | .1234 (5) | .4650 (1) | .1460 (3) | .4286 (1) | .2381 (1) | .2286 (2) | .2429 (1) | .1429 (4) | .0943 (6)
UAICS | contr.-1 | .0649 | .2817 | .0655 | .1429 | .2381 | .1429 | .1143 | .0786 | .0343
UAICS | contr.-2 | .0726 | .4492 | .0547 | .4286 | .2857 | .1714 | .1143 | .0643 | .0257
Factify | primary | .1210 (6) | .2285 (8) | .1292 (5) | .1429 (3) | .0952 (3) | .1143 (6) | .1429 (6) | .1429 (4) | .1086 (4)
JUNLP | primary | .1162 (7) | .4419 (2) | .1128 (7) | .2857 (2) | .1905 (2) | .1714 (4) | .1714 (5) | .1286 (6) | .1000 (5)
JUNLP | contr.-1 | .0976 | .3054 | .0814 | .1429 | .2381 | .1429 | .0857 | .0786 | .0771
JUNLP | contr.-2 | .1226 | .4465 | .1357 | .2857 | .2381 | .2000 | .1571 | .1286 | .0886
nlpir01 | primary | .1000 (8) | .2840 (7) | .1063 (9) | .1429 (3) | .2381 (1) | .1714 (4) | .1000 (8) | .1214 (7) | .0943 (6)
nlpir01 | contr.-1 | .0966 | .3797 | .0849 | .2857 | .1905 | .2286 | .1429 | .1071 | .0886
nlpir01 | contr.-2 | .0965 | .3391 | .1129 | .1429 | .2381 | .2286 | .1571 | .1286 | .0943
TOBB ETU | primary | .0884 (9) | .2028 (10) | .1150 (6) | .0000 (4) | .0952 (3) | .1429 (5) | .1286 (7) | .1357 (5) | .0829 (8)
TOBB ETU | contr.-1 | .0898 | .2013 | .1150 | .0000 | .1429 | .1143 | .1286 | .1429 | .0829
TOBB ETU | contr.-2 | .0913 | .3427 | .1007 | .1429 | .1429 | .1143 | .0714 | .1214 | .0829
IIT (ISM) Dhanbad, India | primary | .0835 (10) | .2238 (9) | .0714 (11) | .0000 (4) | .1905 (2) | .1143 (6) | .0857 (9) | .0857 (9) | .0771 (9)
é proibido cochilar | primary | .0796 (11) | .3514 (4) | .0886 (10) | .1429 (3) | .2381 (1) | .1429 (5) | .1286 (7) | .1071 (8) | .0714 (10)
é proibido cochilar | contr.-1 | .1357 | .5414 | .1595 | .4286 | .2381 | .2571 | .2714 | .1643 | .1200
Fire | primary | .0528 (12) | .1365 (12) | .0570 (12) | .0000 (4) | .0476 (4) | .0571 (7) | .0429 (10) | .0500 (10) | .0543 (11)

List of Versions

  • v2.0 [2019/05/13] - TEST data released with gold labels.
  • v1.0 [2019/03/12] - TRIAL data. The training data for Task 1 contains 19 fact-checked documents (debates, speeches, press conferences, etc.) analysed by factcheck.org.

Contents of the Distribution v2.0

We provide the following files:

Subtask 1: Check-Worthiness.

Predict which claim in a political debate should be prioritized for fact-checking. In particular, given a debate, speech, or press conference, the goal is to produce a ranked list of its sentences, ordered by their worthiness for fact-checking.

Data Format

The datasets are text files with TAB-separated fields. The text encoding is UTF-8.

Task 1:

line_number <TAB> speaker <TAB> text <TAB> label

Where:

  • line_number: the line number (starting from 1)
  • speaker: the person speaking (a candidate, the moderator, or "SYSTEM"; the latter is used for the audience reaction)
  • text: a sentence that the speaker said
  • label: 1 if this sentence is to be fact-checked, and 0 otherwise

Example:

...
65 TRUMP So we're losing our good jobs, so many of them. 0
66 TRUMP When you look at what's happening in Mexico, a friend of mine who builds plants said it's the eighth wonder of the world. 0
67 TRUMP They're building some of the biggest plants anywhere in the world, some of the most sophisticated, some of the best plants. 0
68 TRUMP With the United States, as he said, not so much. 0
69 TRUMP So Ford is leaving. 1
70 TRUMP You see that, their small car division leaving. 1
71 TRUMP Thousands of jobs leaving Michigan, leaving Ohio. 1
72 TRUMP They're all leaving. 0
...
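
To illustrate the format, here is a minimal Python sketch for loading one debate file. It assumes the exact four-column TAB-separated layout described above; the helper name load_debate is illustrative and not part of the repository.

import csv

def load_debate(path):
    """Load a Task 1 file with TAB-separated fields: line_number, speaker, text, label."""
    instances = []
    with open(path, encoding="utf-8") as tsv_file:
        for row in csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE):
            line_number, speaker, text, label = row
            instances.append({
                "line_number": int(line_number),
                "speaker": speaker,
                "text": text,
                "label": int(label),  # 1 = check-worthy, 0 = otherwise
            })
    return instances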

Results File Format:

For this task, the expected results file is a list of claims with their estimated check-worthiness scores. Each line contains two tab-separated fields:

line_number <TAB> score

Where line_number is the number of the claim in the debate and score is a number indicating the priority of the claim for fact-checking. For example:

1 0.9056
2 0.6862
3 0.7665
4 0.9046
5 0.2598
6 0.6357
7 0.9049
8 0.8721
9 0.5729
10 0.1693
11 0.4115
...

Your results file MUST contain scores for all lines of the respective input file; otherwise, the scorer will not score it.
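
As a minimal sketch, a results file in this format could be produced as follows. The scoring heuristic used here (normalized sentence length) is only a placeholder, not one of the task baselines, and write_predictions builds on the illustrative load_debate helper shown earlier.

def write_predictions(debate, out_path):
    """Write one "line_number<TAB>score" row per sentence of a loaded debate."""
    with open(out_path, "w", encoding="utf-8") as out_file:
        for row in debate:
            # Placeholder score: longer sentences get a higher priority.
            score = min(len(row["text"].split()) / 30.0, 1.0)
            out_file.write("{}\t{:.4f}\n".format(row["line_number"], score))

Because it writes one row per input sentence, the resulting file satisfies the completeness requirement above.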

Format checkers

The checker for the subtask is located in the format_checker module of the project. It verifies that your generated results file complies with the expected format. To launch it, run:

python3 format_checker/main.py --pred_file_path=<path_to_your_results_file>

run_format_checker.sh includes examples of running the checker on an ill-formed results file; the corresponding output can be seen in run_format_checker_out.txt. The check for completeness (whether the results file contains all lines / claims) is NOT handled by the format checker, because it only receives the results file and not the gold one.

Scorers

Launch the scorers for the task as follows:

python3 scorer/main.py --gold_file_path="<path_to_gold_file_1>,...,<path_to_gold_file_k>" --pred_file_path="<predictions_file_1>,...,<predictions_file_k>"

Both --gold_file_path and --pred_file_path take a single string containing a comma-separated list of file paths. The lists may be of arbitrary positive length (a single file path is also fine), but their lengths must match.

<path_to_gold_file_n> is the path to the file containing the gold annotations for debate n, and <predictions_file_n> is the path to the corresponding file with the predicted results for debate n, which must follow the format described in the 'Results File Format' section.
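
For example, assuming the gold files live under data/ and your predictions under predictions/ (both paths are purely illustrative), scoring two debates at once would look like:

python3 scorer/main.py --gold_file_path="data/debate_1.tsv,data/debate_2.tsv" --pred_file_path="predictions/debate_1.tsv,predictions/debate_2.tsv"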

The scorers call the format checkers for the task to verify that the output is properly formatted. They also check whether the provided predictions file contains all lines / claims from the gold one.

run_scorer.sh provides examples of using the scorers; the results can be viewed in the run_scorer_out.txt file.

Evaluation metrics

For Task 1 (ranking): R-Precision, Average Precision, Reciprocal Rank, Precision@k, and the means of these over multiple debates. The official metric for Task 1, which is used for the competition ranking, is the Mean Average Precision (MAP).
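
For reference, the per-debate Average Precision can be sketched as follows. This illustrates the standard definition, not the scorer's actual code; gold_labels and scores are assumed to map line numbers to 0/1 labels and predicted scores, respectively.

def average_precision(gold_labels, scores):
    """Average Precision for one debate, ranking lines by descending predicted score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    num_relevant = sum(gold_labels.values())
    hits, ap = 0, 0.0
    for rank, line_number in enumerate(ranked, start=1):
        if gold_labels.get(line_number) == 1:
            hits += 1
            ap += hits / rank  # precision at the rank of this check-worthy line
    return ap / num_relevant if num_relevant else 0.0

MAP over several debates is then simply the mean of the per-debate AP values.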

Baselines

The baselines module contains a random baseline and a simple n-gram baseline for the task.

If you execute main.py, both baselines will be trained on all debates except 20190108_oval_office.tsv and evaluated on that debate. The performance of both baselines will be displayed.
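
For orientation, an n-gram ranking baseline could look roughly like the sketch below. It uses scikit-learn, which is an assumption on our side, and it is not the repository's implementation; the function names are illustrative.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def train_ngram_baseline(train_texts, train_labels):
    """Fit a TF-IDF word uni/bigram representation with a logistic regression ranker."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 2))
    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(vectorizer.fit_transform(train_texts), train_labels)
    return vectorizer, classifier

def score_sentences(vectorizer, classifier, texts):
    """Return one check-worthiness score per sentence (probability of the positive class)."""
    return classifier.predict_proba(vectorizer.transform(texts))[:, 1]

The scores returned this way can be written out in the results-file format described above.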

Licensing

These datasets are free for general research use.

Citation

  • When referring to the 2019 shared task, cite the following paper:
@InProceedings{clef-checkthat:2019,
 author = "Elsayed, Tamer and
    Nakov, Preslav and
    Barr\'{o}n-Cede{\~n}o, Alberto and
    Hasanain, Maram and
    Suwaileh, Reem and
    {Da San Martino}, Giovanni and 
    Atanasova, Pepa",
 title  = "Overview of the CLEF-2019 CheckThat!: Automatic Identification and Verification of Claims",
 booktitle = "Experimental IR Meets Multilinguality, Multimodality, and Interaction",
 series    = "LNCS",
 publisher = "Springer",
 address   = "Lugano, Switzerland",
 month     = "September",
 year      = 2019
}
  • When referring specifically to Task 1, please cite the following:
@InProceedings{clef-checkthat-T1:2019,
    author = "Atanasova, Pepa and
    Nakov, Preslav and
    Karadzhov, Georgi and
    Mohtarami, Mitra and
    Da San Martino, Giovanni",
    title  = "Overview of the CLEF-2019 CheckThat! Lab on Automatic Identification and Verification of Claims. Task 1: Check-Worthiness",
    crossref = "clef-ceur:19"
}

Credits

Lab Organizers:

  • Pepa Atanasova, University of Copenhagen
  • Preslav Nakov, Qatar Computing Research Institute, HBKU
  • Mitra Mohtarami, MIT
  • Georgi Karadzhov, Sofia University
  • Spas Kyuchukov, Sofia University
  • Alberto Barrón-Cedeño, Qatar Computing Research Institute, HBKU
  • Giovanni Da San Martino, Qatar Computing Research Institute, HBKU
  • Tamer Elsayed, Qatar University
  • Maram Hasanain, Qatar University
  • Reem Suwaileh, Qatar University

Task website: https://sites.google.com/view/clef2019-checkthat/

The official rules are published on the website; make sure to check them.

Contact: clef-factcheck@googlegroups.com