hate-speech-detection-social-media
This repository contains the thesis titled "Hate Speech Detection in Social Media", along with GitHub links to the code for the experiments.
Thesis Abstract
Social media platforms are often abused to spread hateful messages. These not only cause harm to
the individual but also to society in general. The staggering volume of content generated in
social media across so many countries, regions and languages makes it impossible to moderate
manually. This necessitates that moderation efforts be augmented with automated tools. To this
end, the thesis aims to support moderation by developing automated hate speech detection tools.
Owing to the recent success of deep learning across multiple domains, the thesis develops several
deep learning models for detecting hate speech in social media. These range from
DNN architectures such as CNN and BiLSTM to the more recent BERT-based state-of-the-art
pre-trained models. These models are evaluated on datasets not only in English but also in
low-resource languages such as Indian Bengali, Hindi and their code-mixed variants. These datasets are
collected from various sources such as Facebook, Twitter and YouTube. In addition, the thesis
studies the detection of online aggression and the identification of hate speech in internet memes.
Code Links:
- Hate Speech Detection in Indo-European Languages : https://github.com/cozek/hasoc-2019-falsepostive
- Checkpoint Ensemble of Transformers for Hate Speech Detection : https://github.com/cozek/OffensEval2020-code
- Automated Aggression Identification using Transformers : https://github.com/cozek/trac2020_submission
- Using Text and Image Features to Classify Internet Memes : https://github.com/cozek/memotion2020-code