Hate Speech Detection

The goal of this project was to build a hate speech detector that, given a tweet, recognizes whether it contains hateful content. The dataset used to build the model comes from Evalita-2020, an evaluation campaign for NLP and speech tools for the Italian language, and more specifically from its haspeede2 task. More information about the project can be found in the report.

Models

We trained and analyzed three machine learning models: a bidirectional LSTM (BiLSTM), Kim's convolutional neural network (CNN), and AlBERTo, a BERT-based model. To evaluate the models more reliably, we performed k-fold cross-validation with k = 5.
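As a rough illustration of the evaluation protocol, the sketch below runs 5-fold cross-validation around an arbitrary model and reports a per-fold F1 score. It is not the code from the notebooks: the names `k_fold_indices`, `f1_score_binary`, `cross_validate`, and the `train_and_predict` callback are hypothetical, the split is a plain shuffled one (the notebooks may well use a stratified splitter), and binary F1 on the hateful class is an assumption about the metric.

```python
import random
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle the indices once and return k (train, validation) index pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    splits = []
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, folds[i]))
    return splits


def f1_score_binary(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 score of the positive (here: hateful) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_validate(
    texts: Sequence[str],
    labels: Sequence[int],
    train_and_predict: Callable[[List[str], List[int], List[str]], List[int]],
    k: int = 5,
) -> Tuple[float, List[float]]:
    """Train on k-1 folds, predict on the held-out fold, and average the F1 scores."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(texts), k):
        train_texts = [texts[i] for i in train_idx]
        train_labels = [labels[i] for i in train_idx]
        val_texts = [texts[i] for i in val_idx]
        val_labels = [labels[i] for i in val_idx]
        predictions = train_and_predict(train_texts, train_labels, val_texts)
        scores.append(f1_score_binary(val_labels, predictions))
    return sum(scores) / len(scores), scores
```

Here `train_and_predict` stands in for any of the three models: it receives the training tweets and labels plus the validation tweets, and returns one predicted label per validation tweet. Averaging over the five folds gives a less noisy estimate than a single train/validation split.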

Running the project

Each model has its own notebook that can be run on Google Colab. To speed up training and inference, we suggest changing the runtime type to GPU.
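Before starting a long training run, it can help to confirm that the runtime actually has a GPU attached. The cell below is a minimal, framework-independent sketch: it assumes an NVIDIA GPU whose driver ships the `nvidia-smi` tool, and the helper name `gpu_available` is hypothetical. The notebooks themselves may check this differently, for example through their deep learning framework.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if nvidia-smi is installed and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        # `nvidia-smi -L` prints one line per detected GPU.
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    if gpu_available():
        print("GPU detected.")
    else:
        print("No GPU detected: switch the runtime type to GPU before training.")
```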

Results

The learning curves and F1 score plots obtained while training the models are shown below (in order: CNN, BiLSTM, and AlBERTo).

Acknowledgments

This project was developed for the Human Language Technologies course at the University of Pisa, under the guidance of Prof. Giuseppe Attardi.

Authors