This project provides a Streamlit application to analyze the toxicity of text inputs using a pre-trained BERT model. The application classifies text into multiple categories of toxicity, including `toxic`, `severe_toxic`, `obscene`, `threat`, `insult`, and `identity_hate`.
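A rough sketch of how such multi-label classification can work, assuming a Hugging Face `BertForSequenceClassification` head with six sigmoid outputs (the repository's actual model class, preprocessing, and label order are defined in `app.py` and may differ):

```python
# Illustrative sketch only; the real model wrapper lives in app.py.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(LABELS),
    problem_type="multi_label_classification",
)
# In the app, the fine-tuned weights from model/toxic.pt would be loaded here.
model.eval()

def classify(text: str) -> dict:
    # One forward pass; sigmoid gives independent per-label probabilities
    # (multi-label), rather than a softmax over mutually exclusive classes.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.sigmoid(logits).squeeze(0)
    return {label: float(p) for label, p in zip(LABELS, probs)}

print(classify("I am friendly"))
```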
- Text Analysis: Classifies text input into multiple categories of toxicity.
- Interactive Web Interface: Uses Streamlit for a user-friendly interface.
- Model Training: Includes a Jupyter notebook for training the BERT model on a custom dataset.
- Python 3.8 or higher
- Pip package manager
- Clone the repository:

  ```bash
  git clone https://github.com/kvba1/Text-Toxicity-Analysis
  cd Text-Toxicity-Analysis
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Download the pre-trained model: place your trained model (`toxic.pt`) in the `./model` directory (a hypothetical loading sketch follows these steps).

- Run the Streamlit app:

  ```bash
  streamlit run app.py
  ```
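The loading logic itself is defined in `app.py`; as a minimal sketch of picking up the checkpoint from `model/toxic.pt` (assuming the file is a state dict saved with `torch.save(model.state_dict(), ...)`, which may not match the actual format):

```python
# Hypothetical loading sketch; app.py may store/restore the model differently.
import torch
from transformers import BertForSequenceClassification

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)
# Assumes toxic.pt holds a state dict compatible with this architecture.
model.load_state_dict(torch.load("model/toxic.pt", map_location="cpu"))
model.eval()
```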
- Enter Text: Type or paste the text you want to analyze into the input box.
- Analyze: Click the "Analyze" button to classify the text.
- View Results: The app displays the classification results and maintains a history of analyzed texts (a simplified sketch of this flow is shown below).
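As a simplified, hypothetical sketch of that flow (widget labels and the `classify` stub below are illustrative; the real interface is defined in `app.py`):

```python
# Simplified, hypothetical sketch of the Streamlit flow; app.py is authoritative.
import streamlit as st

def classify(text: str) -> dict:
    # Placeholder: the real app runs the BERT model here (see the sketch above).
    labels = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
    return {label: 0.0 for label in labels}

st.title("Text Toxicity Analysis")
text = st.text_area("Enter the text you want to analyze")

if "history" not in st.session_state:
    st.session_state.history = []

if st.button("Analyze") and text:
    st.session_state.history.append((text, classify(text)))

# Show results, most recent first, as a running history of analyzed texts.
for past_text, results in reversed(st.session_state.history):
    st.write(past_text)
    st.json(results)
```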
To train the model, you can use the provided Jupyter notebook `train.ipynb`:

- Open the notebook:

  ```bash
  jupyter notebook train.ipynb
  ```

- Follow the instructions: the notebook contains detailed steps for training the BERT model on a toxicity dataset (a condensed outline of such a fine-tuning loop is sketched below).
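The notebook is the authoritative recipe; as a condensed, hypothetical outline of what multi-label BERT fine-tuning generally looks like (the dataset, label values, and hyperparameters below are placeholders, not the notebook's actual choices):

```python
# Condensed, hypothetical fine-tuning outline; see train.ipynb for the real steps.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizer, BertForSequenceClassification

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # BCE-with-logits loss per label
)

# Placeholder data; the notebook trains on a full toxicity dataset.
texts = ["you are awful", "have a nice day"]
labels = torch.tensor([[1, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0]], dtype=torch.float)

enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
loader = DataLoader(
    TensorDataset(enc["input_ids"], enc["attention_mask"], labels), batch_size=2
)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(1):  # the notebook will likely train for more epochs
    for input_ids, attention_mask, y in loader:
        loss = model(input_ids=input_ids, attention_mask=attention_mask, labels=y).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Save weights where the app expects them (the ./model directory).
torch.save(model.state_dict(), "model/toxic.pt")
```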
- `app.py`: The main application file for the Streamlit app.
- `train.ipynb`: Jupyter notebook for training the BERT model.
- `requirements.txt`: List of Python dependencies.
- `model/toxic.pt`: Pre-trained model weights (not included; must be downloaded separately).
- `README.md`: Project documentation.
For example, the input `I am friendly` produces the following classification:
| toxic | severe_toxic | obscene | threat | insult | identity_hate |
|-------|--------------|---------|--------|--------|---------------|
| 0 | 0 | 0 | 0 | 0 | 0 |
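If the app reports binary flags like the row above, a natural way to produce them is to threshold the per-label probabilities; a minimal illustration (the 0.5 cutoff is an assumption, not necessarily what `app.py` uses):

```python
# Hypothetical post-processing: turn per-label probabilities into 0/1 flags.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def to_flags(probs: dict, threshold: float = 0.5) -> dict:
    # A benign input such as "I am friendly" should score low on every label,
    # giving the all-zero row shown above.
    return {label: int(probs[label] >= threshold) for label in LABELS}

print(to_flags({label: 0.02 for label in LABELS}))
```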
Contributions are welcome! Please fork the repository and submit a pull request.
This project is licensed under the MIT License.