
Machine learning model for toxicity determination


English | Русский | Español

This model is designed to determine the level of toxicity of sentences in Russian and English.

Description of files and folders

| File or folder name | Contents |
| --- | --- |
| EnglishToxicModel | Folder with the code for training the English-language model |
| EnglishToxicModel/EnglishModel.bf | Final model for English |
| EnglishToxicModel/EnglishVectorizer.bf | Final vectorizer for English |
| EnglishToxicModel/EnglishToxicModel.ipynb | Model training notebook |
| EnglishToxicModel/labeledEN.csv | Training data |
| RussianToxicModel | Folder with the code for training the Russian-language model |
| RussianToxicModel/RussianModel.bf | Final model for Russian |
| RussianToxicModel/RussianVectorizer.bf | Final vectorizer for Russian |
| RussianToxicModel/RussianToxicModel.ipynb | Model training notebook |
| RussianToxicModel/labeledEN.csv | Training data |
| ModelLibrary | Folder with the ready-to-use model code |
| ModelLibrary/models | Folder with the models and vectorizers |
| ModelLibrary/models/EnglishModel.bf | English model |
| ModelLibrary/models/RussianModel.bf | Russian model |
| ModelLibrary/models/EnglishVectorizer.bf | English vectorizer |
| ModelLibrary/models/RussianVectorizer.bf | Russian vectorizer |
| ModelLibrary/predict.py | Toxicity predictor code |
| requirements.txt | List of required libraries |
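
Each model file is paired with the vectorizer that produced its features. The training notebooks (EnglishToxicModel.ipynb and RussianToxicModel.ipynb) are the reference for how the .bf files are built; the snippet below is only a minimal sketch of how such a model/vectorizer pair could be trained and pickled, assuming scikit-learn's TfidfVectorizer and LogisticRegression and hypothetical column names "comment" and "toxic".

import pickle

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Load the labelled data; the column names "comment" and "toxic" are assumptions.
data = pd.read_csv("EnglishToxicModel/labeledEN.csv")

# Turn raw sentences into numeric features.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(data["comment"])

# Fit a simple classifier on the toxicity labels.
model = LogisticRegression(max_iter=1000)
model.fit(features, data["toxic"])

# Persist the trained pair the same way the repository stores its .bf files.
with open("EnglishModel.bf", "wb") as model_file:
    pickle.dump(model, model_file)
with open("EnglishVectorizer.bf", "wb") as vectorizer_file:
    pickle.dump(vectorizer, vectorizer_file)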

Using the models directly

import pickle
from ModelLibrary.predict import get_toxicity

with open("ModelLibrary/models/EnglishModel.bf", "rb") as EnglishModel,
        open("ModelLibrary/models/RussianModel.bf", "rb") as RussianModel:
    models_ = [pickle.load(RussianModel), pickle.load(EnglishModel)]

with open("ModelLibrary/models/RussianVectorizer.bf", "rb") as RussianVectorizer,
        open("ModelLibrary/models/EnglishVectorizer.bf", "rb") as EnglishVectorizer:
    vectorizers_ = [pickle.load(RussianVectorizer), pickle.load(EnglishVectorizer)]

print(get_toxicity("ПРИВЕТ КАК ДЕЛА&", models=models_, vectorizers=vectorizers_))

An easier way to use the program

I also wrote and published this code as a PyPI module.

Installation

pip install toxicityclassifier

PyPI | Source | Releases

Usage example

from toxicityclassifier import ToxicityClassificator

classifier = ToxicityClassificator()
text = "your message here"  # any Russian or English sentence

print(classifier.predict(text))          # (0 or 1, probability)
print(classifier.get_probability(text))  # probability only
print(classifier.classify(text))         # 0 or 1 only

Weights

Weight for classification, used as a threshold: if the predicted probability is >= weight, the label is 1, otherwise 0.

classifier.weight = 0.5
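
For example, the same message can receive different labels depending on the threshold; the message text below is just a placeholder, and only weight, classify and get_probability from the package API are used.

from toxicityclassifier import ToxicityClassificator

classifier = ToxicityClassificator()
text = "some message"  # placeholder

print(classifier.get_probability(text))  # probability of toxicity

classifier.weight = 0.4                  # label is 1 once probability >= 0.4
print(classifier.classify(text))

classifier.weight = 0.8                  # label is 1 only when probability >= 0.8
print(classifier.classify(text))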


Weight for language detection (English or Russian)

If the share of Russian in the text is >= language_weight, the Russian model is used; otherwise the English model is used.

classifier.language_weight = 0.5
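
For example, with the default threshold a message that is mostly Cyrillic should be routed to the Russian model, while raising language_weight makes that route harder to take. Exactly how the share of Russian is measured is internal to the package, so the comments below are an assumption.

from toxicityclassifier import ToxicityClassificator

classifier = ToxicityClassificator()

# Mixed text that is mostly Cyrillic: with the default language_weight of 0.5
# it is expected to be handled by the Russian model.
print(classifier.predict("привет, hello"))

# Demand a much higher share of Russian before the Russian model is chosen,
# so the same text is now expected to fall through to the English model.
classifier.language_weight = 0.9
print(classifier.predict("привет, hello"))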