Streamlit app for hate speech detection using a fine-tuned BERT-based model. The model is trained on the Jigsaw Toxic Comment Classification Challenge dataset for multi-label classification.
-
Code and data are available in this notebook
-
The app is deployed and can be tested here (also available at this link)
-
The model is available on Hugging Face
-
Clone the repository:
git clone https://github.com/berkaysahiin/SE464.git
-
Change into the directory:
cd SE464
-
Create and activate a virtual environment (Windows paths shown):
virtualenv venv
.\venv\Scripts\activate
-
Install the requirements:
pip install -r requirements.txt # if this fails, first run: pip install pipreqs && pipreqs
-
Run the Streamlit app:
streamlit run main.py
-
Data preprocessing involves cleaning text data, tokenization, and formatting for multi-label classification.
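The preprocessing steps can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code. The cleaning rules, the helper names (`clean_text`, `encode_labels`, `preprocess`), and `max_length=128` are assumptions. The label names and the `comment_text` column come from the Jigsaw dataset. Tokenization is optional here so the sketch runs without the Transformers library installed.

```python
import re

# The six label columns of the Jigsaw Toxic Comment Classification dataset.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def clean_text(text: str) -> str:
    """Hypothetical cleaning: lowercase, drop URLs, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def encode_labels(row: dict) -> list:
    """Turn the per-label 0/1 columns into one float vector (multi-label target)."""
    return [float(row[name]) for name in LABELS]


def preprocess(row: dict, tokenizer=None, max_length: int = 128) -> dict:
    """Clean one row, attach its label vector, and optionally tokenize it."""
    text = clean_text(row["comment_text"])
    example = {"text": text, "labels": encode_labels(row)}
    if tokenizer is not None:
        # Assumes a Hugging Face tokenizer, e.g. from AutoTokenizer.from_pretrained(...).
        example.update(
            tokenizer(text, truncation=True, padding="max_length", max_length=max_length)
        )
    return example
```

Labels are kept as floats because multi-label heads are usually trained with a binary cross-entropy loss, which expects float targets.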
-
The model is trained with TrainingArguments and Trainer from the Transformers library.
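A training setup along these lines might look like the sketch below. The hyperparameter values, the `bert-base-uncased` checkpoint, the output directory name, and the `build_trainer` helper are all assumptions for illustration, since the README does not state which ones were used. The datasets passed in are assumed to be already tokenized, with float label vectors of length six.

```python
# Hypothetical hyperparameters; the values actually used are not stated in the README.
HYPERPARAMS = {
    "output_dir": "bert-jigsaw",  # assumed output folder name
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
}
NUM_LABELS = 6  # the six Jigsaw toxicity labels


def build_trainer(train_dataset, eval_dataset,
                  checkpoint="bert-base-uncased", compute_metrics=None):
    """Assemble a Trainer for multi-label fine-tuning (needs transformers and torch)."""
    # Imported lazily so this module loads even without the heavy dependencies.
    from transformers import (
        AutoModelForSequenceClassification,
        Trainer,
        TrainingArguments,
    )

    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=NUM_LABELS,
        # Selects a binary cross-entropy loss over independent label logits.
        problem_type="multi_label_classification",
    )
    args = TrainingArguments(**HYPERPARAMS)
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,
    )
```

With datasets in hand, training would then be started with `build_trainer(train_ds, eval_ds).train()`, where `train_ds` and `eval_ds` are placeholder names.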
-
Metrics such as F1 score, ROC AUC, and accuracy are used to evaluate the model's performance on the test set.
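To make the three metrics concrete, here is a dependency-free sketch that computes them from raw logits. The averaging choices are assumptions, since the README does not say which variants were used: micro-averaged F1, micro-averaged ROC AUC over all (sample, label) pairs, exact-match (subset) accuracy, and a 0.5 decision threshold. In practice scikit-learn's `f1_score`, `roc_auc_score`, and `accuracy_score` would usually be used instead.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def micro_f1(y_true, y_pred) -> float:
    """F1 over all (sample, label) pairs pooled together."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def subset_accuracy(y_true, y_pred) -> float:
    """Fraction of samples whose full label vector is predicted exactly."""
    return sum(list(t) == list(p) for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(scores, targets) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, targets) if t == 1]
    neg = [s for s, t in zip(scores, targets) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(logits, y_true, threshold: float = 0.5) -> dict:
    """Convert logits to probabilities, threshold them, and score the result."""
    probs = [[sigmoid(z) for z in row] for row in logits]
    y_pred = [[int(p >= threshold) for p in row] for row in probs]
    flat_scores = [p for row in probs for p in row]
    flat_true = [t for row in y_true for t in row]
    return {
        "f1": micro_f1(y_true, y_pred),
        "roc_auc": roc_auc(flat_scores, flat_true),
        "accuracy": subset_accuracy(y_true, y_pred),
    }
```

Subset accuracy is strict for multi-label problems, because a sample counts as correct only when all of its labels match; this is why F1 and ROC AUC are typically reported alongside it.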