🍎 Apples-to-Apples: Comparing the Performance of Hate Speech Detection Models in Context
Context: Project for CS6471 course at Georgia Tech, Spring 2022.
Authors:
- Seema Baddam
- Richard Huang
- Kai McKeever
Installation phase
Please refer to install.md.
Datasets
Datasets used:
- Offensive Language Identification Dataset
- Implicit Hate Speech Dataset
- Racism is a Virus Dataset
Please refer to datasets.md for more details.
Preprocessing phase
Before training, preprocess the data with this command:
### Start preprocessing | Defaults to all datasets
python -m src.utils.preprocess_utils --dataset_name all
Training phase
Please refer to training.md for more details.
We provide the trained models here. To use them, please put them in the saved-models/ folder.
Cross-domain Evaluation phase
Please refer to evaluation.md for more details.
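To illustrate what a cross-domain evaluation computes, here is a minimal sketch that scores every model on every test set and collects the results in a train-by-test matrix. It is not the project's evaluation code: the names `cross_domain_matrix`, `accuracy`, `dataset_a`, and `dataset_b` are hypothetical, the "models" are toy keyword rules standing in for trained classifiers, and plain accuracy stands in for whatever metrics evaluation.md defines.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, int]  # (text, label), where 1 = offensive and 0 = not
Model = Callable[[str], int]  # maps a text to a predicted label


def accuracy(model: Model, examples: List[Example]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    if not examples:
        return 0.0
    correct = sum(1 for text, label in examples if model(text) == label)
    return correct / len(examples)


def cross_domain_matrix(
    models: Dict[str, Model], test_sets: Dict[str, List[Example]]
) -> Dict[str, Dict[str, float]]:
    """Score each model (keyed by its training dataset) on each test set."""
    return {
        train_name: {
            test_name: accuracy(model, examples)
            for test_name, examples in test_sets.items()
        }
        for train_name, model in models.items()
    }


# Hypothetical stand-ins: keyword rules instead of trained classifiers.
toy_models: Dict[str, Model] = {
    "dataset_a": lambda text: int("idiot" in text.lower()),
    "dataset_b": lambda text: int("virus" in text.lower()),
}
toy_test_sets: Dict[str, List[Example]] = {
    "dataset_a": [("You are an idiot", 1), ("Nice weather today", 0)],
    "dataset_b": [("They brought the virus here", 1), ("I like tea", 0)],
}

matrix = cross_domain_matrix(toy_models, toy_test_sets)
for train_name, row in matrix.items():
    print(train_name, row)
```

The diagonal of the matrix holds in-domain scores and the off-diagonal cells hold cross-domain scores, so a drop from diagonal to off-diagonal indicates how poorly a model transfers between datasets.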
Interpretation with XAI phase (Word cloud + Distribution plots)
Please refer to interpret.md for more details.
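As a rough illustration of the data behind a word cloud, the sketch below builds a token-frequency table from a list of texts. This is an assumption about one simple ingredient of such a plot, not the project's interpretation pipeline; the function name `top_tokens`, the regex tokenizer, and the sample sentences are all hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def top_tokens(texts: Iterable[str], k: int = 3) -> List[Tuple[str, int]]:
    """Return the k most frequent lowercase tokens across all texts."""
    counts: Counter = Counter()
    for text in texts:
        # Naive tokenizer (an assumption): runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(k)


sample = ["the cat sat", "the cat ran", "the dog ran"]
print(top_tokens(sample, k=2))
```

A word cloud library would then size each token by its count; comparing these tables across datasets or predicted labels is one way to see which words dominate each group.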