CyberWallE at SemEval-2020 Task 11: An Analysis of Feature Engineering for Ensemble Models for Propaganda Detection

With the rapid dissemination of news articles through online social media, automatic detection of biased or fake reporting has become more important than ever. This repository contains the code and the article describing our participation in both subtasks of the SemEval-2020 shared task on the Detection of Propaganda Techniques in News Articles.

The Span Identification (SI) subtask is a binary classification problem to discover propaganda at the token level, and the Technique Classification (TC) subtask involves a 14-way classification of propagandistic text fragments. We use a bi-LSTM architecture in the SI subtask and train a complex ensemble model for the TC subtask. Our architectures are built using embeddings from BERT in combination with additional lexical features and extensive label post-processing. Our systems rank 8th out of 35 teams in the SI subtask (F1-score: 43.86%) and 8th out of 31 teams in the TC subtask (F1-score: 57.37%).
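To make the token-level framing of the SI subtask concrete, here is a minimal sketch of how character-level span annotations could be turned into binary token labels. It is not taken from this repository: the function name `label_tokens`, the whitespace tokenization, the half-open character offsets and the example sentence are all assumptions made for illustration.

```python
import re


def label_tokens(text, spans):
    """Return (token, label) pairs for a text.

    Hypothetical helper, not part of this repository.
    `spans` is a list of (start, end) character offsets, end-exclusive.
    A token gets label 1 if it overlaps any span, else 0.
    """
    labelled = []
    # Assumption: tokens are maximal runs of non-whitespace characters.
    for match in re.finditer(r"\S+", text):
        start, end = match.span()
        overlaps = any(start < s_end and s_start < end
                       for s_start, s_end in spans)
        labelled.append((match.group(), int(overlaps)))
    return labelled


# Invented example sentence; the span covers "traitors and liars".
text = "They are a bunch of traitors and liars ."
spans = [(20, 38)]

for token, label in label_tokens(text, spans):
    print(f"{token}\t{label}")
```

A sequence model such as a bi-LSTM would then predict one of these 0/1 labels per token. The tokenizer and label scheme actually used in our models may differ from this sketch.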

Our article provides an extensive exploration of various embedding, feature and classifier combinations. The repository is organized as follows:

baselines (from the organizers, empty in the remote): Baseline code + predictions
data (empty in the remote*): Training/development input files with features, lexica for semantic + rhetorical structures (*Some of the contents can be downloaded from sources given in the folder, the rest can be generated using the files in utils)
datasets (from the organizers, empty in the remote): Articles, training labels
eda: Code for analyzing label distributions, sentence lengths and other features of the given data
models: Our models
tools (from the organizers, empty in the remote): Scripts for evaluating the data
utils: Code for data pre- and post-processing and evaluation

@InProceedings{SemEval2020-11-CyberWallE,
author = "Blaschke, Verena and Korniyenko, Maxim and Tureski, Sam",
title = "{CyberWallE} at {SemEval}-2020 {T}ask 11: An Analysis of Feature Engineering for Ensemble Models for Propaganda Detection",
pages = "",
booktitle = "Proceedings of the 14th International Workshop on Semantic Evaluation",
series = "SemEval 2020",
year = "2020",
address = "Barcelona, Spain",
month = "December",
}

Updated results

After the camera-ready deadline, the task organizers announced that they had found a bug in the evaluation script. Fixing the bug changed the scores on the test data. With the corrected script, we rank 12th of 35 teams in the Span Identification subtask (F1: 43.59%) and 6th of 31 teams in the Technique Classification subtask (F1: 58.99%).

Here is an updated version of Table 3 in our paper:

Technique	Proportion (dev)	Recall (SI dev)	F1-score (TC dev)	F1-score (TC test, bug)	F1-score (TC test, NEW)	TC change (dev->NEW)
Loaded language	30.6	70.6	76.6	74.7	75.8	-0.8
Name calling, labeling	17.2	63.0	81.0	70.9	71.6	-9.4
Repetition	13.6	63.8	73.3	47.7	52.9	-20.4
Flag-waving	8.2	74.4	73.7	54.4	56.2	-17.5
Exaggeration, minimisation	6.4	57.6	52.7	28.3	33.2	-19.5
Doubt	6.2	46.9	53.8	58.7	59.2	+5.4
Appeal to fear/prejudice	4.4	62.9	30.6	39.9	39.8	+9.2
Slogans	3.7	74.6	51.4	39.4	45.5	-5.9
Whataboutism, straw men, red herring	2.7	36.8	0.0	0.0	0.0	0.0
Black-and-white fallacy	2.1	46.9	21.4	23.7	26.3	+4.9
Causal oversimplification	1.7	50.7	21.1	15.4	15.4	-5.7
Thought-terminating clichés	1.6	51.4	17.4	23.8	23.8	+6.4
Appeal to authority	1.3	49.9	18.2	14.7	14.6	-3.6
Bandwagon, reductio ad hitlerum	0.5	8.4	22.2	12.2	12.2	-10.0
All classes	100	63.8	66.4	57.4	58.9	-7.5

Table 3: Technique-level breakdown of model performances for both subtasks. The proportions, recall values and F1-scores are percentages. The change of the F1-score is given in percentage points.
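The last column of the table is the difference between the corrected test F1-score and the development F1-score, in percentage points. The short sketch below recomputes it for a few rows copied from the table. The helper name `tc_change` and the choice of rows are only for illustration.

```python
# (F1 on TC dev, F1 on TC test with the corrected scorer), copied from the table.
rows = {
    "Loaded language": (76.6, 75.8),
    "Repetition": (73.3, 52.9),
    "Doubt": (53.8, 59.2),
    "All classes": (66.4, 58.9),
}


def tc_change(dev_f1, test_f1):
    """Change from dev to corrected test F1, in percentage points."""
    # Round to one decimal to match the precision used in the table.
    return round(test_f1 - dev_f1, 1)


changes = {name: tc_change(dev, test) for name, (dev, test) in rows.items()}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}")
```

A negative value means the technique was classified less accurately on the test set than on the development set.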
