/cs6471-project

Primary language: Jupyter Notebook. License: MIT.

🍎 Apples-to-Apples: Comparing the Performance of Hate Speech Detection Models in Context

Context: Project for the CS6471 course at Georgia Tech, Spring 2022.

Authors:

  • Seema Baddam
  • Richard Huang
  • Kai McKeever

Installation phase

Please refer to install.md.

Datasets

Datasets used:

  • Offensive Language Identification Dataset
  • Implicit Hate Speech Dataset
  • Racism is a Virus Dataset

Please refer to datasets.md for more details.

Preprocessing phase

Before starting the training phase, preprocess the data with this command:

# Start preprocessing (defaults to all datasets)
python -m src.utils.preprocess_utils --dataset_name all
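
Since the project is notebook-based, it may be convenient to launch the same preprocessing step from Python instead of a shell. The following is a minimal sketch, not part of the repository: the helper names `build_preprocess_command` and `run_preprocessing` are hypothetical, and the only value of `--dataset_name` confirmed above is `all`. It assumes the call is made from the repository root so that `src.utils.preprocess_utils` is importable.

```python
import subprocess
import sys
from typing import List


def build_preprocess_command(dataset_name: str = "all") -> List[str]:
    """Build the argument list for the preprocessing command shown above.

    Hypothetical helper: only dataset_name="all" is documented in this README.
    """
    return [
        sys.executable,  # run with the same interpreter as the notebook/script
        "-m",
        "src.utils.preprocess_utils",
        "--dataset_name",
        dataset_name,
    ]


def run_preprocessing(dataset_name: str = "all") -> None:
    """Run preprocessing as a subprocess; raises CalledProcessError on failure."""
    subprocess.run(build_preprocess_command(dataset_name), check=True)


# Usage (from the repository root):
# run_preprocessing()
```

Passing the command as a list avoids shell quoting issues, and `check=True` makes a failed preprocessing run stop the notebook instead of passing silently.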

Training phase

Please refer to training.md for more details.

We provide the trained models here. To use them, please put them in the saved-models/ folder.
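
Placing the downloaded models can also be scripted. This is a minimal sketch under stated assumptions: the function `install_models` and the idea of a local download directory are hypothetical, and the file names and formats of the trained models are not specified here, so the code simply moves whatever it finds into `saved-models/`.

```python
import shutil
from pathlib import Path


def install_models(download_dir, target_dir="saved-models"):
    """Move every entry in download_dir into target_dir.

    Hypothetical helper: download_dir is wherever the trained models were
    saved locally. Entries that already exist in target_dir are skipped.
    Returns the list of destination paths that were actually moved.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create saved-models/ if missing

    moved = []
    for item in sorted(Path(download_dir).iterdir()):
        destination = target / item.name
        if destination.exists():
            continue  # do not overwrite a model that is already in place
        shutil.move(str(item), str(destination))
        moved.append(destination)
    return moved


# Usage (paths are placeholders):
# install_models("path/to/downloaded-models")
```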

Cross-domain Evaluation phase

Please refer to evaluation.md for more details.

Interpretation with XAI phase (Word cloud + Distribution plots)

⚠️ DISCLAIMER: This part of the study contains words or language that some readers may consider profane, vulgar, or offensive. ⚠️

Please refer to interpret.md for more details.