Batalov Artem, a.batalov@innopolis.university, BS20-AI-01
Text detoxification is the task of transforming text written in a toxic style into text with the same meaning but a neutral style. In this assignment, we are given a dataset of pairs of toxic and non-toxic texts, and the goal is to build a model that transforms toxic text into non-toxic text.
Full task description
TL;DR: I used a few-shot Mistral model and a T5 model fine-tuned on the given dataset. The fine-tuned T5 model showed better results (a score of 0.45; see the Evaluation section of the report) than the Mistral model (a score of 0.2).
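For readers unfamiliar with the few-shot setup, the idea is that the model is not trained at all: a handful of toxic/neutral example pairs are placed in the prompt, followed by the new toxic sentence with an open slot for the model to complete. The sketch below only builds such a prompt string. The instruction wording, the `Toxic:`/`Neutral:` labels, the example pairs, and the function name `build_few_shot_prompt` are all hypothetical and are not taken from this repository. An instruction-tuned Mistral checkpoint would normally also need its chat template applied on top, which this sketch leaves out.

```python
# Hypothetical demonstration pairs (NOT taken from the repository's dataset).
EXAMPLES = [
    ("this is a stupid idea", "this is not a good idea"),
    ("shut up, nobody asked you", "please be quiet, this was not addressed to you"),
]


def build_few_shot_prompt(toxic_text, examples=EXAMPLES):
    """Assemble a plain-text few-shot prompt for a causal LM such as Mistral.

    The prompt format here is an assumption for illustration only.
    """
    lines = ["Rewrite the toxic sentence so it keeps its meaning but sounds neutral.", ""]
    for toxic, neutral in examples:
        # Each demonstration shows one toxic input and its neutral rewrite.
        lines.append(f"Toxic: {toxic}")
        lines.append(f"Neutral: {neutral}")
        lines.append("")
    # The query ends with an open "Neutral:" slot for the model to complete.
    lines.append(f"Toxic: {toxic_text}")
    lines.append("Neutral:")
    return "\n".join(lines)


prompt = build_few_shot_prompt("what a dumb question")
print(prompt)
```

The model's generated continuation after the final `Neutral:` (up to the first newline) would then be taken as the detoxified text; the quality of that output depends heavily on how representative the demonstration pairs are.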
- Clone this repository
git clone https://github.com/bart02/text-detoxification.git
cd text-detoxification
- Install requirements
pip install -r requirements.txt
- Download and process the data
./src/data/download.sh
python src/data/data_preprocessing.py
python src/data/to_hf_datasets.py
- Train the model
python src/models/train_t5.py
- Make predictions
python src/models/predict_t5.py
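The exact logic of `src/data/data_preprocessing.py` and `src/data/to_hf_datasets.py` is not described in this README. As a hedged illustration of the kind of step that typically sits between raw paired data and T5 fine-tuning, the sketch below assumes each raw row holds two paraphrases plus a toxicity score for each (the field names `reference`, `translation`, `ref_tox`, `trn_tox` are assumptions). It orients every pair so the more toxic text is always the model input and prepends a hypothetical T5-style task prefix `"detoxify: "`. None of these names come from the repository.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    reference: str    # first paraphrase (assumed field name)
    translation: str  # second paraphrase (assumed field name)
    ref_tox: float    # toxicity score of `reference` (assumed field name)
    trn_tox: float    # toxicity score of `translation` (assumed field name)


# Hypothetical task prefix; T5 models are commonly trained with a text prefix.
TASK_PREFIX = "detoxify: "


def to_seq2seq_example(pair):
    """Return a {'source', 'target'} dict with the more toxic text as the source."""
    if pair.ref_tox >= pair.trn_tox:
        toxic, neutral = pair.reference, pair.translation
    else:
        # The pair is stored "backwards": swap it so the model learns toxic -> neutral.
        toxic, neutral = pair.translation, pair.reference
    return {"source": TASK_PREFIX + toxic, "target": neutral}


rows = [
    Pair("you are a stupid person", "you are not a smart person", 0.98, 0.02),
    Pair("that is not nice", "that is really crappy", 0.05, 0.91),
]
examples = [to_seq2seq_example(p) for p in rows]
```

Orienting the pairs this way keeps the training direction consistent; without it, some examples would teach the model to make neutral text more toxic.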