CLEF2019-CheckThat! Task 1

This repository contains the dataset, format checker, scorer, and baselines for CLEF2019-CheckThat! Task 1.

For information about the previous edition of the shared task, refer to CLEF2018-CheckThat!

FCPD corpus for the CLEF-2019 LAB on "Automatic Identification and Verification of Claims"
Version 2.0: May 13, 2019 (TEST GOLD LABELS RELEASED)

This file contains the basic information regarding the CLEF2019-CheckThat! Task 1 dataset on Check-Worthiness estimation, provided for the CLEF2019-CheckThat! Lab on "Automatic Identification and Verification of Claims". The current version (2.0, May 13, 2019) corresponds to the release of the gold labels for the test set, in addition to the training data. All changes and updates to these datasets and tools are reported in the 'List of Versions' section of this document.

Table of contents:

  • Evaluation Results
  • List of Versions
  • Contents of the Distribution v2.0
  • Data Format
  • Results File Format
  • Format checkers
  • Scorers
  • Evaluation metrics
  • Baselines
  • Licensing
  • Citation
  • Credits

Evaluation Results

Note that the main evaluation measure is MAP on the primary submission. The teams are ordered according to this score.

For primary submissions, the rank on each measure is shown in parentheses.

Team Name | Submission | MAP | RR | R-P | P@1 | P@3 | P@5 | P@10 | P@20 | P@50
Copenhagen | primary | .1660 (1) | .4176 (3) | .1387 (4) | .2857 (2) | .2381 (1) | .2571 (1) | .2286 (2) | .1571 (2) | .1229 (2)
Copenhagen | contr.-1 | .1496 | .3098 | .1297 | .1429 | .2381 | .2000 | .2000 | .1429 | .1143
Copenhagen | contr.-2 | .1580 | .2740 | .1622 | .1429 | .1905 | .2286 | .2429 | .1786 | .1200
TheEarthIsFlat | primary | .1597 (2) | .1953 (11) | .2052 (1) | .0000 (4) | .0952 (3) | .2286 (2) | .2143 (3) | .1857 (1) | .1457 (1)
TheEarthIsFlat | contr.-1 | .1453 | .3158 | .1101 | .2857 | .2381 | .1429 | .1429 | .1357 | .1171
TheEarthIsFlat | contr.-2 | .1821 | .4187 | .1937 | .2857 | .2381 | .2286 | .2286 | .2143 | .1400
IPIPAN | primary | .1332 (3) | .2864 (6) | .1481 (2) | .1429 (3) | .0952 (3) | .1429 (5) | .1714 (5) | .1500 (3) | .1171 (3)
Terrier | primary | .1263 (4) | .3253 (5) | .1088 (8) | .2857 (2) | .2381 (1) | .2000 (3) | .2000 (4) | .1286 (6) | .0914 (7)
UAICS | primary | .1234 (5) | .4650 (1) | .1460 (3) | .4286 (1) | .2381 (1) | .2286 (2) | .2429 (1) | .1429 (4) | .0943 (6)
UAICS | contr.-1 | .0649 | .2817 | .0655 | .1429 | .2381 | .1429 | .1143 | .0786 | .0343
UAICS | contr.-2 | .0726 | .4492 | .0547 | .4286 | .2857 | .1714 | .1143 | .0643 | .0257
Factify | primary | .1210 (6) | .2285 (8) | .1292 (5) | .1429 (3) | .0952 (3) | .1143 (6) | .1429 (6) | .1429 (4) | .1086 (4)
JUNLP | primary | .1162 (7) | .4419 (2) | .1128 (7) | .2857 (2) | .1905 (2) | .1714 (4) | .1714 (5) | .1286 (6) | .1000 (5)
JUNLP | contr.-1 | .0976 | .3054 | .0814 | .1429 | .2381 | .1429 | .0857 | .0786 | .0771
JUNLP | contr.-2 | .1226 | .4465 | .1357 | .2857 | .2381 | .2000 | .1571 | .1286 | .0886
nlpir01 | primary | .1000 (8) | .2840 (7) | .1063 (9) | .1429 (3) | .2381 (1) | .1714 (4) | .1000 (8) | .1214 (7) | .0943 (6)
nlpir01 | contr.-1 | .0966 | .3797 | .0849 | .2857 | .1905 | .2286 | .1429 | .1071 | .0886
nlpir01 | contr.-2 | .0965 | .3391 | .1129 | .1429 | .2381 | .2286 | .1571 | .1286 | .0943
TOBB ETU | primary | .0884 (9) | .2028 (10) | .1150 (6) | .0000 (4) | .0952 (3) | .1429 (5) | .1286 (7) | .1357 (5) | .0829 (8)
TOBB ETU | contr.-1 | .0898 | .2013 | .1150 | .0000 | .1429 | .1143 | .1286 | .1429 | .0829
TOBB ETU | contr.-2 | .0913 | .3427 | .1007 | .1429 | .1429 | .1143 | .0714 | .1214 | .0829
IIT (ISM) Dhanbad, India | primary | .0835 (10) | .2238 (9) | .0714 (11) | .0000 (4) | .1905 (2) | .1143 (6) | .0857 (9) | .0857 (9) | .0771 (9)
é proibido cochilar | primary | .0796 (11) | .3514 (4) | .0886 (10) | .1429 (3) | .2381 (1) | .1429 (5) | .1286 (7) | .1071 (8) | .0714 (10)
é proibido cochilar | contr.-1 | .1357 | .5414 | .1595 | .4286 | .2381 | .2571 | .2714 | .1643 | .1200
Fire | primary | .0528 (12) | .1365 (12) | .0570 (12) | .0000 (4) | .0476 (4) | .0571 (7) | .0429 (10) | .0500 (10) | .0543 (11)

List of Versions

  • v2.0 [2019/05/13] - TEST data released with gold labels.
  • v1.0 [2019/03/12] - TRIAL data. The training data for Task 1 contains 19 fact-checked documents (debates, speeches, press conferences, etc.) analysed by factcheck.org.

Contents of the Distribution v2.0

We provide the following files:

Subtask 1: Check-Worthiness.

Predict which claim in a political debate should be prioritized for fact-checking. In particular, given a debate, speech, or press conference, the goal is to produce a ranked list of its sentences, ordered by their worthiness for fact-checking.

Data Format

The datasets are text files with TAB-separated fields. The text encoding is UTF-8.

Task 1:

line_number <TAB> speaker <TAB> text <TAB> label

Where:

  • line_number: the line number (starting from 1)
  • speaker: the person speaking (a candidate, the moderator, or "SYSTEM"; the latter is used for the audience reaction)
  • text: a sentence that the speaker said
  • label: 1 if this sentence is to be fact-checked, and 0 otherwise

Example:

...
65 TRUMP So we're losing our good jobs, so many of them. 0
66 TRUMP When you look at what's happening in Mexico, a friend of mine who builds plants said it's the eighth wonder of the world. 0
67 TRUMP They're building some of the biggest plants anywhere in the world, some of the most sophisticated, some of the best plants. 0
68 TRUMP With the United States, as he said, not so much. 0
69 TRUMP So Ford is leaving. 1
70 TRUMP You see that, their small car division leaving. 1
71 TRUMP Thousands of jobs leaving Michigan, leaving Ohio. 1
72 TRUMP They're all leaving. 0
...
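
To illustrate the format, here is a minimal Python sketch for loading one debate file. It assumes the exact four-column TAB-separated layout described above; the helper name load_debate is illustrative and not part of the repository.

import csv

def load_debate(path):
    """Load a Task 1 file with TAB-separated fields: line_number, speaker, text, label."""
    instances = []
    with open(path, encoding="utf-8") as tsv_file:
        for row in csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE):
            line_number, speaker, text, label = row
            instances.append({
                "line_number": int(line_number),
                "speaker": speaker,
                "text": text,
                "label": int(label),  # 1 = check-worthy, 0 = otherwise
            })
    return instances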

Results File Format:

For this task, the expected results file is a list of claims with their estimated check-worthiness scores. Each line contains two tab-separated fields:

line_number <TAB> score

Where line_number is the number of the claim in the debate and score is a number indicating the priority of the claim for fact-checking. For example:

1 0.9056
2 0.6862
3 0.7665
4 0.9046
5 0.2598
6 0.6357
7 0.9049
8 0.8721
9 0.5729
10 0.1693
11 0.4115
...

Your results file MUST contain scores for all lines of the respective input file; otherwise, the scorer will not score it.
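
As a minimal sketch, a results file in this format could be produced as follows. The scoring heuristic used here (normalized sentence length) is only a placeholder, not one of the task baselines, and write_predictions builds on the illustrative load_debate helper shown earlier.

def write_predictions(debate, out_path):
    """Write one "line_number<TAB>score" row per sentence of a loaded debate."""
    with open(out_path, "w", encoding="utf-8") as out_file:
        for row in debate:
            # Placeholder score: longer sentences get a higher priority.
            score = min(len(row["text"].split()) / 30.0, 1.0)
            out_file.write("{}\t{:.4f}\n".format(row["line_number"], score))

Because it writes one row per input sentence, the resulting file satisfies the completeness requirement above.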

Format checkers

The checker for the subtask is located in the format_checker module of the project. It verifies that your generated results file complies with the expected format. To launch it, run:

python3 format_checker/main.py --pred_file_path=<path_to_your_results_file>

run_format_checker.sh includes examples of running the checker on an ill-formed results file; the corresponding output can be seen in run_format_checker_out.txt. The check for completeness (whether the results file contains all lines / claims) is NOT handled by the format checker, because it only receives the results file and not the gold one.

Scorers

Launch the scorers for the task as follows:

python3 scorer/main.py --gold_file_path="<path_to_gold_file_1>,...,<path_to_gold_file_k>" --pred_file_path="<predictions_file_1>,...,<predictions_file_k>"

Both --gold_file_path and --pred_file_path take a single string containing a comma-separated list of file paths. The lists may be of arbitrary positive length (a single file path is also fine), but their lengths must match.

<path_to_gold_file_n> is the path to the file containing the gold annotations for debate n, and <predictions_file_n> is the path to the corresponding file with the predicted results for debate n, which must follow the format described in the 'Results File Format' section.
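
For example, assuming the gold files live under data/ and your predictions under predictions/ (both paths are purely illustrative), scoring two debates at once would look like:

python3 scorer/main.py --gold_file_path="data/debate_1.tsv,data/debate_2.tsv" --pred_file_path="predictions/debate_1.tsv,predictions/debate_2.tsv"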

The scorers call the format checkers for the task to verify that the output is properly formatted. They also check whether the provided predictions file contains all lines / claims from the gold one.

run_scorer.sh provides examples of using the scorers; the results can be viewed in the run_scorer_out.txt file.

Evaluation metrics

For Task 1 (ranking): R-Precision, Average Precision, Reciprocal Rank, Precision@k, and the means of these over multiple debates. The official metric for Task 1, which is used for the competition ranking, is the Mean Average Precision (MAP).
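
For reference, the per-debate Average Precision can be sketched as follows. This illustrates the standard definition, not the scorer's actual code; gold_labels and scores are assumed to map line numbers to 0/1 labels and predicted scores, respectively.

def average_precision(gold_labels, scores):
    """Average Precision for one debate, ranking lines by descending predicted score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    num_relevant = sum(gold_labels.values())
    hits, ap = 0, 0.0
    for rank, line_number in enumerate(ranked, start=1):
        if gold_labels.get(line_number) == 1:
            hits += 1
            ap += hits / rank  # precision at the rank of this check-worthy line
    return ap / num_relevant if num_relevant else 0.0

MAP over several debates is then simply the mean of the per-debate AP values.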

Baselines

The baselines module contains a random baseline and a simple n-gram baseline for the task.

If you execute main.py, both baselines will be trained on all debates except 20190108_oval_office.tsv and evaluated on that debate. The performance of both baselines will be displayed.
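
For orientation, an n-gram ranking baseline could look roughly like the sketch below. It uses scikit-learn, which is an assumption on our side, and it is not the repository's implementation; the function names are illustrative.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def train_ngram_baseline(train_texts, train_labels):
    """Fit a TF-IDF word uni/bigram representation with a logistic regression ranker."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 2))
    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(vectorizer.fit_transform(train_texts), train_labels)
    return vectorizer, classifier

def score_sentences(vectorizer, classifier, texts):
    """Return one check-worthiness score per sentence (probability of the positive class)."""
    return classifier.predict_proba(vectorizer.transform(texts))[:, 1]

The scores returned this way can be written out in the results-file format described above.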

Licensing

These datasets are free for general research use.

Citation

  • When referring to the 2019 shared task, cite the following paper:
@InProceedings{clef-checkthat:2019,
 author = "Elsayed, Tamer and
    Nakov, Preslav and
    Barr\'{o}n-Cede{\~n}o, Alberto and
    Hasanain, Maram and
    Suwaileh, Reem and
    {Da San Martino}, Giovanni and 
    Atanasova, Pepa",
 title  = "Overview of the CLEF-2019 CheckThat!: Automatic Identification and Verification of Claims",
 booktitle = "Experimental IR Meets Multilinguality, Multimodality, and Interaction",
 series    = "LNCS",
 publisher = "Springer",
 address   = "Lugano, Switzerland",
 month     = "September",
 year      = 2019
}
  • When referring specifically to Task 1, please cite the following:
@InProceedings{clef-checkthat-T1:2019,
    author = "Atanasova, Pepa and
    Nakov, Preslav and
    Karadzhov, Georgi and
    Mohtarami, Mitra and
    Da San Martino, Giovanni",
    title  = "Overview of the CLEF-2019 CheckThat! Lab on Automatic Identification and Verification of Claims. Task 1: Check-Worthiness",
    crossref = "clef-ceur:19"
}

Credits

Lab Organizers:

  • Pepa Atanasova, University of Copenhagen
  • Preslav Nakov, Qatar Computing Research Institute, HBKU
  • Mitra Mohtarami, MIT
  • Georgi Karadzhov, Sofia University
  • Spas Kyuchukov, Sofia University
  • Alberto Barrón-Cedeño, Qatar Computing Research Institute, HBKU
  • Giovanni Da San Martino, Qatar Computing Research Institute, HBKU
  • Tamer Elsayed, Qatar University
  • Maram Hasanain, Qatar University
  • Reem Suwaileh, Qatar University

Task website: https://sites.google.com/view/clef2019-checkthat/

The official rules are published on the website; make sure to check them.

Contact: clef-factcheck@googlegroups.com