Primary language: Jupyter Notebook. License: GNU General Public License v3.0 (GPL-3.0).

Twitter-US-Airline-Sentiment-Analysis

References

  1. Free courses online.me tutorials on Sentiment Analysis
  2. Analytics India Magazine tutorial on Sentiment Analysis Using LSTM: https://analyticsindiamag.com/how-to-implement-lstm-rnn-network-for-sentiment-analysis/

About Dataset

In this repository I have used six different NLP models to predict user sentiment from Twitter reviews of airlines. The dataset is Twitter US Airline Sentiment. The best model from each of the ML and DL groups has been deployed using Flask and the Heroku platform. The dataset was imported from Kaggle via the following link: https://www.kaggle.com/crowdflower/twitter-airline-sentiment/download

The text preprocessing involved removing stopwords, HTML tags and punctuation, followed by lemmatization that takes POS tags into account. I have used six methodologies for this classification.
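The preprocessing steps listed above can be sketched roughly as follows. This is a minimal standard-library illustration, not the notebook's actual code: the stopword list is a tiny made-up sample, and `lemmatize_tokens` is a hypothetical hook that does nothing by default. A POS-aware lemmatizer (for example one built on NLTK's `WordNetLemmatizer.lemmatize(word, pos)`) could be plugged in there.

```python
import html
import re
import string

# Tiny illustrative stopword list (an assumption); a real pipeline would
# use a fuller list from an NLP library.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "to", "and",
             "of", "in", "my", "i", "for", "on"}

TAG_RE = re.compile(r"<[^>]+>")
# Map every punctuation character to a space so words stay separated.
PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def identity_lemmatizer(tokens):
    """Placeholder hook: returns the tokens unchanged."""
    return list(tokens)


def preprocess(text, lemmatize_tokens=identity_lemmatizer):
    """Strip HTML tags, punctuation and stopwords, then lemmatize."""
    text = TAG_RE.sub(" ", text)                # drop HTML tags
    text = html.unescape(text)                  # decode entities such as &amp;
    text = text.lower().translate(PUNCT_TABLE)  # remove punctuation
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return lemmatize_tokens(tokens)


print(preprocess("<b>@united</b> the flight was delayed &amp; my bags are lost!"))
# ['united', 'flight', 'delayed', 'bags', 'lost']
```

The hook takes the whole token list rather than a single word because POS tagging needs the surrounding context of each token.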

Sentiment Analysis Using Machine And Deep Learning

1) Without Any Vectorization
Here I have just used a dictionary of the 2500 most frequent words. After text preprocessing, each training example is a dictionary whose keys are these top 2500 words and whose values are true or false depending on whether the word occurred in the sentence. Here I have used all 3 labels (positive, negative and neutral) and plotted a confusion matrix. The accuracy was around 78%, with precision and recall of 86-87%.
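The feature construction described in approach 1 might look like the sketch below. The function names and the toy documents are assumptions made for illustration; only the idea (top-N vocabulary, True/False presence flags) comes from the description above.

```python
from collections import Counter


def build_vocabulary(tokenized_docs, top_n=2500):
    """Return the top_n most frequent words across all documents."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    return [word for word, _ in counts.most_common(top_n)]


def document_features(tokens, vocabulary):
    """Map every vocabulary word to True/False depending on its presence."""
    present = set(tokens)
    return {word: (word in present) for word in vocabulary}


# Toy, already-preprocessed tweets (hypothetical data).
docs = [
    ["flight", "delayed", "again"],
    ["great", "flight", "crew"],
    ["bags", "lost", "flight"],
]
vocab = build_vocabulary(docs, top_n=3)
features = document_features(["great", "flight"], vocab)
print(features)
```

Every example ends up with the same set of keys, so a classifier that accepts feature dictionaries can be trained on them directly without a separate vectorization step.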
2) Using Machine Learning Algorithms like Naive Bayes, K Nearest Neighbors and Random Forests
Here I used vectorization techniques such as Bag of Words, TF-IDF and Word2Vec to turn the text into features, and applied the above machine learning algorithms along with hyperparameter tuning for sentiment analysis. The Multinomial Naive Bayes model achieved 89% accuracy and an AUC score of 0.95, while the KNN and Random Forest models achieved accuracies of around 85-87% and AUC scores of 0.92.
Here the best results were attained by Multinomial Naive Bayes. Hence I created its pickle file and deployed it on Flask and Heroku. Click the link below and enter the text whose sentiment you want to know.
https://firstnlpdeployedapp.herokuapp.com/
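To make the Multinomial Naive Bayes step in approach 2 more concrete, here is a from-scratch sketch over bag-of-words counts with Laplace smoothing. It is an illustration of the technique only: the deployed model was presumably built with a library implementation, and the function names, the `alpha` parameter and the toy data below are all assumptions.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit class log-priors and Laplace-smoothed word log-likelihoods."""
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocabulary = {tok for tokens in docs for tok in tokens}

    log_prior = {c: math.log(n / len(labels))
                 for c, n in class_doc_counts.items()}
    log_likelihood = {}
    for c in class_doc_counts:
        # Denominator: all word occurrences in class c plus smoothing mass.
        total = sum(word_counts[c].values()) + alpha * len(vocabulary)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total)
            for w in vocabulary
        }
    return log_prior, log_likelihood


def predict(tokens, log_prior, log_likelihood):
    """Return the class with the highest posterior log-score."""
    scores = {}
    for c in log_prior:
        # Words never seen during training are skipped.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in tokens if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Toy training set (hypothetical, already tokenized).
train_docs = [["great", "flight"], ["love", "crew"],
              ["flight", "delayed"], ["lost", "bags"]]
train_labels = ["positive", "positive", "negative", "negative"]
prior, likelihood = train_multinomial_nb(train_docs, train_labels)
print(predict(["great", "crew"], prior, likelihood))
print(predict(["delayed", "bags"], prior, likelihood))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing term keeps a word that never appeared in one class from forcing that class's probability to zero.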
3) Using Deep Learning
Part A) Here I used Artificial Neural Networks with only Dense layers and two Dropout layers. Initially I showed the effect of regularization and dropout on the predictions. After that I did hyperparameter tuning using Keras Tuner and achieved test and validation set accuracies of around 92-93%.
Part B) Here I used Embedding layers and LSTMs alongside Dense and Dropout layers. I further tuned the hyperparameters of each of the Embedding, LSTM and Dense layers using Keras Tuner and achieved 98% accuracy on the test set and around 93% on the validation set in just 4 iterations.
Part C) Here I used Embedding layers and a Bidirectional LSTM alongside Dense and Dropout layers. I further tuned the hyperparameters of each of the Embedding, Bidirectional LSTM and Dense layers using Keras Tuner and achieved 97.23% accuracy on the test set and around 94.26% on the validation set in just 4 iterations.
Here the best results were attained by the LSTM model. Hence I saved it as lstm_model.h5 and deployed it on Flask. Its size was above 25 MB, which made it difficult to upload. So its Jupyter notebook has been added in the folder Deep Learning Models with deployment on flask, which downloads lstm_model.h5 in Google Colab.

Screenshots from the LSTM model, which has been deployed using Flask:

[Screenshot]

As expected, the sentiment of the tweet is positive.

[Screenshot]

Similarly, screenshots for another tweet in the LSTM model deployed using Flask:

[Screenshot]

As expected, the sentiment of the tweet is negative.

[Screenshot]