SE464 Machine Learning Project

Hate Speech Labeler

Streamlit app for hate speech detection using a fine-tuned BERT-based model. The model is trained on the Jigsaw Toxic Comment Classification Challenge dataset for multi-label classification.

Code and data are available in this notebook.
The app is deployed and can be tested here (also available at this link).
The model is available on Hugging Face.

Local Installation

Clone the repository:

```
git clone https://github.com/berkaysahiin/SE464.git
```

Change into the directory:
```
cd SE464
```
Create and activate a virtual environment:
```
virtualenv venv
# Windows:
.\venv\Scripts\activate
# macOS/Linux: source venv/bin/activate
```

Install the requirements:
```
pip install -r requirements.txt
# If this fails, try running this first: pip install pipreqs && pipreqs
```

Run the Streamlit app:
```
streamlit run main.py
```

Model

Data preprocessing involves cleaning text data, tokenization, and formatting for multi-label classification.
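
The cleaning and label-formatting steps above could look roughly like the sketch below. The exact cleaning rules used in the project are not stated, so lowercasing, URL removal, and whitespace collapsing are assumptions, and `clean_text`, `to_multi_hot`, and `preprocess` are hypothetical helper names. The column names are those of the Jigsaw dataset.

```python
import re

# The six label columns of the Jigsaw Toxic Comment dataset.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

_URL_RE = re.compile(r"https?://\S+")
_WS_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Assumed cleaning: lowercase, drop URLs, collapse whitespace."""
    text = text.lower()
    text = _URL_RE.sub(" ", text)
    return _WS_RE.sub(" ", text).strip()


def to_multi_hot(row: dict) -> list[float]:
    """Turn the per-label 0/1 columns into one float vector."""
    return [float(row[name]) for name in LABELS]


def preprocess(row: dict) -> dict:
    """Return the cleaned comment text together with its multi-hot labels."""
    return {"text": clean_text(row["comment_text"]), "labels": to_multi_hot(row)}
```

The labels are kept as floats because the multi-label loss in Transformers (binary cross-entropy with logits) expects float targets. Tokenization would then be applied to the `text` field with the tokenizer that matches the chosen BERT checkpoint.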
The model is trained with TrainingArguments and Trainer from the Transformers library.
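
A minimal sketch of a `TrainingArguments` and `Trainer` setup for multi-label classification is shown below. The checkpoint name, the output directory, and every hyperparameter value are assumptions chosen for illustration, not the values used in this project; `build_trainer` is a hypothetical helper.

```python
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Hypothetical hyperparameters; the project's actual values are not stated.
HYPERPARAMS = {
    "output_dir": "bert-jigsaw",
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
}


def build_trainer(train_dataset, eval_dataset,
                  checkpoint="bert-base-uncased", compute_metrics=None):
    """Build a Trainer for a multi-label BERT classifier (assumed setup)."""
    # Imported lazily so this file loads even without transformers installed.
    from transformers import (
        AutoModelForSequenceClassification,
        Trainer,
        TrainingArguments,
    )

    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(LABELS),
        # Selects a binary cross-entropy loss over independent labels.
        problem_type="multi_label_classification",
        id2label=dict(enumerate(LABELS)),
        label2id={name: i for i, name in enumerate(LABELS)},
    )
    args = TrainingArguments(**HYPERPARAMS)
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,
    )
```

With tokenized datasets that contain `input_ids`, `attention_mask`, and float `labels`, training would then be started with `build_trainer(train_ds, eval_ds).train()`.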
Metrics such as F1 score, ROC AUC, and accuracy are used to evaluate the model's performance on the test set.
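
To make the three metrics concrete, here is a dependency-free sketch of how they can be computed from raw logits. The micro averaging, the 0.5 decision threshold, and the use of exact-match (subset) accuracy are assumptions, since the project's exact settings are not stated. In practice `f1_score`, `roc_auc_score`, and `accuracy_score` from scikit-learn are the usual route.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def micro_f1(y_true, y_pred) -> float:
    """Micro-averaged F1 over every (example, label) cell."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def subset_accuracy(y_true, y_pred) -> float:
    """Share of examples whose whole label vector is predicted exactly."""
    hits = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def micro_roc_auc(y_true, y_score) -> float:
    """Chance that a positive cell outscores a negative one (ties count half).

    Pairwise and therefore quadratic; fine for a sketch, slow on large sets.
    """
    pos = [s for tr, sr in zip(y_true, y_score) for t, s in zip(tr, sr) if t == 1]
    neg = [s for tr, sr in zip(y_true, y_score) for t, s in zip(tr, sr) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(logits, y_true, threshold: float = 0.5) -> dict:
    """Turn logits into probabilities and predictions, then score them."""
    probs = [[sigmoid(z) for z in row] for row in logits]
    preds = [[int(p >= threshold) for p in row] for row in probs]
    return {
        "f1": micro_f1(y_true, preds),
        "roc_auc": micro_roc_auc(y_true, probs),
        "accuracy": subset_accuracy(y_true, preds),
    }
```

Subset accuracy is strict for multi-label data: one wrong label marks the whole example as incorrect, so it usually reads lower than F1.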