
TinyBERT Enhanced Chat System for Mobile Legends

I developed a TinyBERT-powered chat moderation system for Mobile Legends. It classifies in-game comments as positive, neutral, or negative, providing real-time feedback to curb toxic behavior with dynamic responses and efficient sentiment analysis, enhancing the gaming experience.

This project is still under development and is expected to be completed soon. Current work focuses on improving the dataset to produce better classification.

🎬 Demo

(Demo GIF)

🔨 Installation


First, make sure you have installed PyTorch with CUDA support; otherwise, training the BERT model will take forever. You can verify CUDA support with:

  import torch
  print(torch.cuda.is_available())

If the output is False, you need to check your driver version before installing the CUDA-enabled build. For example, to install PyTorch for CUDA 11.8:

  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Next, create a virtual environment for this project; you can copy and paste the following:

  python --version
  python -m venv nama_venv
  nama_venv\Scripts\activate
  pip install pandas
  pip install scikit-learn
  pip install transformers
  pip install seaborn

I ran into a problem with TinyBERT training time: on an RTX 2060 Max-Q Design GPU it takes around 73 minutes, even with CUDA support.
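If training time is a concern, one common mitigation is mixed-precision training. This is only a sketch of an alternative configuration, not part of the original setup; fp16 is a standard TrainingArguments flag that typically speeds up training on RTX-class GPUs:

```python
from transformers import TrainingArguments

# Sketch: same arguments as the training setup in this README, with mixed
# precision enabled. fp16=True is an assumption; the original run did not use it.
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    fp16=True,  # enable mixed-precision training on CUDA GPUs
)
```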

Originally, my plan was to use IndoBERT for this project. However, due to performance limitations, I had to change plans and use TinyBERT instead. This is the setup if you want to use IndoBERT:

  from transformers import BertForSequenceClassification, Trainer, TrainingArguments

  # Load IndoBERT with a 3-class classification head
  model = BertForSequenceClassification.from_pretrained("indobenchmark/indobert-base-p2", num_labels=3)

  # forcing CPU usage
  training_args = TrainingArguments(
      output_dir='./results',
      num_train_epochs=3,
      per_device_train_batch_size=16,
      per_device_eval_batch_size=64,
      warmup_steps=500,
      weight_decay=0.01,
      logging_dir='./logs',
      logging_steps=10,
      evaluation_strategy="epoch",
  )

  trainer = Trainer(
      model=model,
      args=training_args,
      train_dataset=train_dataset,
      eval_dataset=test_dataset,
  )

  trainer.train()

But in the end I used TinyBERT, so I modified the code to load pre-trained TinyBERT for classification, like this:

  from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments

  # Load pre-trained TinyBERT with a 3-class classification head
  model = AutoModelForSequenceClassification.from_pretrained("huawei-noah/TinyBERT_General_4L_312D", num_labels=3)
  tokenizer = AutoTokenizer.from_pretrained("huawei-noah/TinyBERT_General_4L_312D")

  # Tokenize the raw texts (pad/truncate to 128 tokens)
  train_encodings = tokenizer(train_texts.tolist(), truncation=True, padding=True, max_length=128)
  test_encodings = tokenizer(test_texts.tolist(), truncation=True, padding=True, max_length=128)

  # Wrap encodings and labels in the custom SentimentDataset class
  train_dataset = SentimentDataset(train_encodings, train_labels.tolist())
  test_dataset = SentimentDataset(test_encodings, test_labels.tolist())

  training_args = TrainingArguments(
      output_dir='./results',
      num_train_epochs=3,
      per_device_train_batch_size=16,
      per_device_eval_batch_size=64,
      warmup_steps=500,
      weight_decay=0.01,
      logging_dir='./logs',
      logging_steps=10,
      evaluation_strategy="epoch"
  )

  trainer = Trainer(
      model=model,
      args=training_args,
      train_dataset=train_dataset,
      eval_dataset=test_dataset,
  )

  trainer.train()
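The SentimentDataset class used above is not shown in this README. As a minimal sketch (this exact implementation is an assumption on my part, not the repository's code), it can be a standard mapping-style torch Dataset that wraps the tokenizer encodings and labels:

```python
import torch

# Hypothetical minimal implementation of the SentimentDataset wrapper used above.
class SentimentDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings  # dict of lists produced by the tokenizer
        self.labels = labels

    def __getitem__(self, idx):
        # Convert one example's fields to tensors, as Trainer expects
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)
```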

(Screenshot: my laptop's performance during model training.)

Results

The results I have obtained so far are quite satisfying (note that this project is still in development):

  • Accuracy
  • Precision
  • Recall
  • F1-score

Features

  • Accurate classification results
  • A strict system that enforces bans when offensive language is detected
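As a hedged illustration of how the strict-ban feature could sit on top of the classifier's output (the label index order and action names here are assumptions, not the repository's actual code):

```python
# Hypothetical post-classification moderation step.
LABELS = {0: "negative", 1: "neutral", 2: "positive"}  # assumed label index order

def moderate(pred_label_id: int) -> str:
    """Map a predicted sentiment label to a chat moderation action."""
    if LABELS[pred_label_id] == "negative":
        # Strict policy from the feature list: offensive language triggers a ban.
        return "ban"
    return "allow"
```

In the real system, the predicted label id would come from an argmax over the model's logits.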

Authors


License

MIT License