/FactChecking

Second assignment for NLP class at @unibo

Primary LanguageJupyter Notebook

✅ Fact-checking ⁉️

This repository contains a project realized for an assignment of the Natural Language Processing course of the Master's degree in Artificial Intelligence, University of Bologna.

Description

Fact checking is a popular NLP task which consists in verify the reliability of some statement, by comparing it with a given knowledge base. In this experiment, we’ll use the FEVER dataset to train a model able to understand whether a fact is verifiable. The model consists in a neural network, structured in different ways, working on embeddings obtained with GloVE. A voting mechanism is then required to make predictions.

Dataset

The FEVER dataset is about facts taken from Wikipedia documents that have to be verified. In particular, facts could face manual modifications in order to define fake information or to give different formulations of the same concept.

The dataset consists of 185,445 claims manually verified against the introductory sections of Wikipedia pages and classified as Supported, Refuted or NotEnoughInfo. For the first two classes, systems and annotators need to also return the combination of sentences forming the necessary evidence supporting or refuting the claim.

An already pre-precessed version of the dataset is been used, in order to concentrate on the classification pipeline (pre-processing, model definition, evaluation and training).

Request and solution proposed

The task to comply with is described in the assignment description. In order to have a better understanging of our proposed solution, take a look to the notebook and the report.

Model

What the neural network does is to encode two different inputs (claim and evidence), merge them in some way, and output a single value, representing the probability that the claim is correct. The following simplified schema shows the model architecture:

Two different models are been trained:

  • a base one that use the last state of a LSTM layer for the sentence embedding and combine the two sentences through an Add layer
  • the other one is just an extension using the same configuration of the previous and adding the cosine similarity between claim and evidence to the input of the network.

The trained models are available at the following link.

Results

Two evaluation strategies have been used: multi-input evaluation (standard approach in classification) and claim evaluation (majority voting). As seen in the table below, performances were better for the extended model.

Dataset split Accuracy base model Accuracy extension model
Validation set 0.7609 0.7611
Test set (normal) 0.735 0.743
Test set (majority vote) 0.710 0.718

This is an example of the prediction made by the extended model on a test set of pairs claim-evidence:

CLAIM:  Scream has some level of success.
EVIDENCES:
	1. (SUPPORTS) The first series entry , Scream , was released on December 20 , 1996 and is currently the highest-grossing slasher film in the United States 
	2. (SUPPORTS) It received several awards and award nominations 
	3. (SUPPORTS) The film went on to financial and critical acclaim , earning $ 173 million worldwide , and became the highest-grossing slasher film in the US in unadjusted dollars 
PREDICTION:
	1. SUPPORTED (Confidence 98%)
	2. SUPPORTED (Confidence 99%)
	3. SUPPORTED (Confidence 98%)

Resources & Libraries

  • NLTK
  • Tensorflow + Keras

Versioning

We use Git for versioning.

Group members

Reg No. Name Surname Email Username
1005271 Giuseppe Boezio giuseppe.boezio@studio.unibo.it giuseppeboezio
983806 Simone Montali simone.montali@studio.unibo.it montali
997317 Giuseppe Murro giuseppe.murro@studio.unibo.it gmurro

License

This project is licensed under the MIT License - see the LICENSE file for details