Author: Igor Alentev
Group: BS20-RO-01
Email: i.alentev@innopolis.university
Can be found here
Instead of `requirements.txt`, this repository uses a `conda` environment. Read further.
I am the proud owner of an AMD Graphics powered laptop (god bless apple), so it is nearly impossible for me to run or test anything locally. In general everything should be fine, but I was unable to test whether everything runs as expected, so some issues are possible. I did my best to avoid any inconsistencies across the code.
- All predictions preprocessed and saved locally
- All metrics precalculated and saved locally
- All datasets precomputed and saved here and for toxic words here
- Colab notebooks rewritten locally
- Dotenv tuned properly
- Dependencies across the `src` files as well as notebooks should work
- Checkpoints provided
- `conda` environment exported to environment.yml
For instance, I would recommend loading the checkpoints rather than running tuning and training, since loading does indeed work (afaik).
It was a hard decision, but I have decided to store the model checkpoints alongside the project itself. So if you clone the repo, you will have to download 0.5 GB of checkpoints as well. However, this is very handy: they are not that heavy, and they are useful throughout the work.
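As a rough illustration of preferring the stored checkpoints over retraining, here is a minimal sketch. The directory name `checkpoints` and the helper names `pick_checkpoint` and `plan_run` are hypothetical and not taken from this repository; the actual layout may differ, so treat it as a pattern rather than the project's loading code.

```python
from pathlib import Path
from typing import Optional


def pick_checkpoint(checkpoint_root: Path) -> Optional[Path]:
    """Return the last checkpoint directory (sorted by name), or None if there is none."""
    if not checkpoint_root.is_dir():
        return None
    # Assumption: every checkpoint lives in its own subdirectory of checkpoint_root.
    candidates = sorted(p for p in checkpoint_root.iterdir() if p.is_dir())
    return candidates[-1] if candidates else None


def plan_run(checkpoint_root: Path) -> str:
    """Decide whether to load an existing checkpoint or fall back to training."""
    checkpoint = pick_checkpoint(checkpoint_root)
    if checkpoint is None:
        return "train"
    return "load:" + checkpoint.name


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever the checkpoints actually live.
    print(plan_run(Path("checkpoints")))
```

The real loading call depends on the model library in use, so it is deliberately left out; the sketch only covers the decision of whether a checkpoint is available.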
- Data
- Metrics
- Baseline model
- Models
- Models Evaluation
- Results Exploration
Main hypothesis, ideas, and related information. The draft of the project.
Final report, containing all the necessary information about the models, data retrieval and preprocessing, fine-tuning, and evaluation.
Please do not blame me if anything does not work. I did my best to seamlessly integrate everything and spent many hours on this. I am aiming at the flipped class, so I will be very sad if I get a bad mark because of some minor issue. That said, I am all for fair assessment and open to discussing real issues with the work.
- Vladimir Ivanov for informative lectures
- Maxim Evgrafov and Lada Morozova for incredibly useful labs
- Skolkovo for work on detoxification
- Skolkovo for another work on detoxification
- This work for a great showcase of metrics and transformers
- ParaNMT-50M dataset creators
- Detoxify creators
- WordNet creators
- Yeah Yeah Yeahs for great music