
A real-time, multi-modal system that analyzes both text and emoji to flag high-risk discussions early on social media and messaging applications.


Offensive-Language-Detection-System


Overview

Cyberbullying has become a severe issue, facilitated by the growth of social media, and it can have devastating psychological impacts on victims. This project builds a multi-modal machine learning system that detects offensive language in social media posts and chat messages, the channels where cyberbullying commonly occurs. The system uses natural language processing and deep learning for real-time analysis.

Highlights

  • Data preprocessing: emoji handling 😆 (conversion of emojis into textual form), cleaning, EDA, and data augmentation to address class imbalance
  • Morphological Analysis
  • Sentiment Analysis
  • Stratified Cross Validation
  • Optuna Hyperparameter Tuning
  • Compared SVM, LSTM, and BERT models
  • BERT achieved the best accuracy of 94% on the test set
  • Integrated model into chat interface with real-time analytics
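The emoji-to-text step in the preprocessing highlight can be illustrated with a minimal sketch. It uses only Python's standard `unicodedata` module; the function name `emoji_to_text`, the `:name:` token format, and the use of the Unicode category "So" as an emoji test are assumptions made for illustration, not the project's actual implementation (a dedicated emoji library would cover more cases).

```python
import unicodedata

# Invisible joiners/selectors that often accompany emoji; dropped here for simplicity.
_SKIP = {"\ufe0f", "\u200d"}


def emoji_to_text(message: str) -> str:
    """Replace each emoji-like symbol with a textual token such as :grinning_face:.

    Assumption: any character in Unicode category "So" (Symbol, other) is
    treated as an emoji. This is a rough heuristic, not a full emoji parser.
    """
    parts = []
    for ch in message:
        if ch in _SKIP:
            continue
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            if name:
                # Pad with spaces so the token never fuses with neighboring words.
                parts.append(" :" + name.lower().replace(" ", "_") + ": ")
            # Symbols without a known name are dropped.
        else:
            parts.append(ch)
    # Collapse the extra whitespace introduced by the padding.
    return " ".join("".join(parts).split())


print(emoji_to_text("you are so funny 😀"))
```

Turning emojis into word-like tokens lets a text-only tokenizer, such as the ones used with SVM, LSTM, or BERT models, see the emoji's meaning instead of an unknown symbol.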

Dataset

  • A diverse dataset of 10,000 tweets containing emojis 😃 and spanning various topics, sourced from Twitter and Kaggle

Evaluation

  • Accuracy, precision, recall, F1 score, ROC AUC
  • Confusion matrix, precision-recall curve
  • Learning curves
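As a rough illustration of how the threshold-based metrics above relate to the confusion matrix, the sketch below computes accuracy, precision, recall, and F1 for a binary offensive/non-offensive task. The function name `binary_metrics` and the label encoding (1 = offensive, 0 = not offensive) are assumptions; the notebook may well rely on a library such as scikit-learn instead.

```python
def binary_metrics(y_true, y_pred):
    """Compute basic classification metrics from binary labels (1 = offensive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # offensive, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean, wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # offensive, missed

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    return {
        # Rows are true classes, columns are predicted classes: [[tn, fp], [fn, tp]]
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels purely for demonstration, not project results.
demo = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
print(demo)
```

Reporting precision and recall alongside accuracy matters when classes are imbalanced, since a model can reach high accuracy while missing most offensive messages.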


View the Jupyter notebook here

Deployment

  • Flask backend framework
  • SocketIO for real-time analysis
  • Proactive intervention: real-time analysis raises awareness and deters negative content, fostering positive interactions
  • Chat interface with sentiment chart, timing data, and warnings
  • Chat interface showcases real-world applicability
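The per-message analytics that such a chat interface displays (score, warning, timing data, and a running value for the chart) could be assembled roughly as follows. This is a standard-library sketch under stated assumptions: `ChatMonitor`, `placeholder_score`, the 0.5 threshold, and the payload field names are all hypothetical, and the keyword-based scorer merely stands in for the trained model.

```python
import time
from collections import deque


def placeholder_score(message: str) -> float:
    """Hypothetical stand-in for the trained classifier; returns a score in [0, 1]."""
    flagged = {"idiot", "stupid"}  # toy word list, for illustration only
    words = message.lower().split()
    hits = sum(1 for w in words if w.strip(".,!?") in flagged)
    return min(1.0, 3 * hits / max(len(words), 1))


class ChatMonitor:
    """Builds the analytics payload for each incoming chat message."""

    def __init__(self, score_fn, threshold=0.5, window=5):
        self.score_fn = score_fn
        self.threshold = threshold
        self.recent = deque(maxlen=window)  # last few scores, for the chart

    def analyze(self, message: str) -> dict:
        start = time.perf_counter()
        score = self.score_fn(message)
        latency_ms = (time.perf_counter() - start) * 1000
        self.recent.append(score)
        offensive = score >= self.threshold
        return {
            "message": message,
            "score": score,
            "offensive": offensive,
            "warning": "This message may be offensive." if offensive else None,
            "rolling_mean": sum(self.recent) / len(self.recent),
            "latency_ms": latency_ms,
        }


monitor = ChatMonitor(placeholder_score)
print(monitor.analyze("you are an idiot"))
print(monitor.analyze("have a nice day"))
```

In a Flask-SocketIO backend, a message handler could call `analyze` and send the returned dictionary to the client with `emit`, so the chart and warnings update as each message arrives.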

Conclusion

The system successfully detects cyberbullying instances in text and emojis 😃 using deep learning.