
Sentiment analysis of Covid-19 tweets using XGBoost, LSTM and BERT

Primary language: Jupyter Notebook. License: MIT.

Sentiment-analysis-on-Covid-19-tweets

Objective:

Classify 45k Covid-19 tweets as positive or negative using the following machine learning and deep learning models:

  • Multinomial Naive Bayes Model
  • Random Forests
  • AdaBoost
  • XGBoost
  • Simple RNN
  • LSTM
  • GRU
  • Bidirectional LSTM
  • BERT

For the classical machine learning models, the tweets are turned into features using the following NLP methods:

  • Bag-of-words model
  • Bag-of-POS (part-of-speech tag) model
  • Pre-trained spaCy word embeddings
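To make the bag-of-words representation concrete, here is a minimal, dependency-free sketch of a Multinomial Naive Bayes classifier over word counts. It assumes a simple regex tokenizer and add-one (Laplace) smoothing; the names (`tokenize`, `bag_of_words`, `BagOfWordsNaiveBayes`) and the toy tweets are hypothetical and not taken from the repository's notebooks. In practice, scikit-learn's `CountVectorizer` and `MultinomialNB` cover the same ground.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters, digits and apostrophes (assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(text):
    """Bag-of-words: a sparse word -> count mapping; word order is discarded."""
    return Counter(tokenize(text))


class BagOfWordsNaiveBayes:
    """Multinomial Naive Bayes over word counts with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for text, label in zip(texts, labels):
            bow = bag_of_words(text)
            self.word_counts[label].update(bow)  # add this tweet's counts to its class
            self.vocab.update(bow)               # record every word seen in training
        class_counts = Counter(labels)
        n = len(labels)
        # log P(class): class frequency in the training set
        self.log_prior = {c: math.log(class_counts[c] / n) for c in self.classes}
        # total word tokens per class (denominator of P(word | class))
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, word, c):
        # smoothed log P(word | class c)
        numerator = self.word_counts[c][word] + self.alpha
        denominator = self.totals[c] + self.alpha * len(self.vocab)
        return math.log(numerator / denominator)

    def predict(self, text):
        bow = bag_of_words(text)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for word, count in bow.items():
                if word in self.vocab:  # words never seen in training are ignored
                    score += count * self._log_likelihood(word, c)
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical toy tweets, for illustration only (not from the Kaggle dataset)
train_texts = [
    "stay safe and stay home, we will get through this",
    "grateful for the nurses and doctors, thank you",
    "panic buying again, shelves empty, terrible",
    "prices are rising and people are scared",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = BagOfWordsNaiveBayes().fit(train_texts, train_labels)
print(model.predict("thank you nurses, stay safe"))     # positive
print(model.predict("empty shelves and panic buying"))  # negative
```

Working in log space turns the product of per-word probabilities into a sum, which avoids numerical underflow on longer tweets.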

For neural networks, we use the following preprocessing methods:

  • Pre-trained spaCy word embeddings
  • Keras embedding layers
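The Keras embedding-layer route starts from fixed-length integer sequences rather than word counts. The plain-Python sketch below shows what that input pipeline and the layer's forward pass amount to, under assumed choices: a regex tokenizer, ids 0 and 1 reserved for padding and unknown words, right-padding to a fixed length, and a small random initialisation. All names are hypothetical and this is not the repository's code. In Keras itself, `TextVectorization` (or `pad_sequences`) and `Embedding` play these roles; note that `pad_sequences` pads at the front by default rather than at the end.

```python
import random
import re

PAD_ID = 0  # reserved for padding
OOV_ID = 1  # reserved for out-of-vocabulary words


def tokenize(text):
    """Lowercase and keep runs of letters, digits and apostrophes (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(texts, max_words=None):
    """Give each word an integer id, most frequent first (ties broken alphabetically)."""
    counts = {}
    for text in texts:
        for tok in tokenize(text):
            counts[tok] = counts.get(tok, 0) + 1
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    if max_words is not None:
        ranked = ranked[:max_words]
    # ids start at 2 because 0 and 1 are reserved
    return {word: i + 2 for i, word in enumerate(ranked)}


def encode(text, vocab, max_len):
    """Map tokens to ids, truncate to max_len, then right-pad with PAD_ID."""
    ids = [vocab.get(tok, OOV_ID) for tok in tokenize(text)][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def make_embedding_table(vocab_size, dim, seed=0):
    """One row of `dim` weights per id, initialised to small uniform values.

    In a real embedding layer these rows are trainable parameters updated by backprop.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(vocab_size)]


def embed(ids, table):
    """Forward pass of an embedding layer: a plain row lookup by id."""
    return [table[i] for i in ids]


# Hypothetical toy tweets, for illustration only
texts = ["stay home stay safe", "panic buying everywhere"]
vocab = build_vocab(texts)
ids = encode("stay safe at home", vocab, max_len=6)
print(ids)  # [2, 7, 1, 5, 0, 0]

table = make_embedding_table(vocab_size=len(vocab) + 2, dim=4)
vectors = embed(ids, table)  # 6 vectors of length 4, one per position
```

When pre-trained spaCy vectors are used instead, one common approach is to fill each row of the table with the pre-trained vector for that vocabulary word rather than with random values.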

Results:

  • Among the classical machine learning models, XGBoost on bag-of-words features performs best, with 82% accuracy and a ROC AUC of 0.90
  • Across all models, BERT performs best, with 94% accuracy
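The two metrics reported here measure different things. Accuracy depends on the decision threshold applied to a classifier's scores, while ROC AUC is threshold-free: it equals the probability that a randomly chosen positive tweet is scored above a randomly chosen negative one. The minimal sketch below computes both from hypothetical labels and scores (1 = positive, 0 = negative, an assumed threshold of 0.5). scikit-learn's `accuracy_score` and `roc_auc_score` are the usual library equivalents.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2.

    This pairwise form is O(n_pos * n_neg), fine for a sketch; rank-based versions scale better.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities of the positive class
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(round(accuracy(y_true, y_pred), 3))  # 0.667
print(round(roc_auc(y_true, scores), 3))   # 0.778
```

Moving the 0.5 threshold changes `y_pred`, and therefore the accuracy, but leaves the ROC AUC unchanged, because the AUC depends only on how the scores rank the examples.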

References:

Data source:

https://www.kaggle.com/datatattle/covid-19-nlp-text-classification