tweet-sentiment-analysis



Model Selection for Tweet Sentiment Analysis

output_43_0.png

This Repository Contains

Questions

  • Which model will perform best?
  • How well will a model perform?
  • What could Android and iPhone makers improve in their products to reduce negative feedback?

Using the OSEMN Process

  • Obtain the data
  • Scrub the data
  • Explore the data
  • Model the data
  • Interpret the data
  • Reference

Results

Gradient Boosting Classifier

output_75_1.png

From the model:
  • Slightly overfits the training data
  • The True Label (1) row in the plot oddly does not add up to 100%
  • The best model overall for class balance, at 79% accuracy
  • Correctly predicts 64% of negative and 82% of positive tweets (per-class recall)
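The percentages above are per-class recall read off a confusion matrix. When a confusion matrix is normalized over the true labels, each row is divided by its own total, so every row should sum to 100% (up to display rounding) and the diagonal equals per-class recall. The sketch below checks this with scikit-learn on made-up labels (0 = negative, 1 = positive); it is not the notebook's code, and the label arrays are hypothetical.

```python
# Minimal sketch: per-class recall from a row-normalized confusion matrix.
# ASSUMPTION: the labels below are toy values (0 = negative, 1 = positive),
# not the repository's actual predictions.
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1])

# normalize="true" divides each row (one true label) by its row total,
# so each row sums to 1.0, i.e. 100%.
cm = confusion_matrix(y_true, y_pred, normalize="true")
print(np.round(cm, 2))

# The diagonal of the row-normalized matrix is the recall of each class.
per_class_recall = recall_score(y_true, y_pred, average=None)
print(np.round(per_class_recall, 2))
```

Here 3 of 5 negatives and 8 of 10 positives are predicted correctly, so the negative and positive recalls are 0.6 and 0.8.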

Stacking Classifier

output_109_1.png

From the model:
  • Overfits the training data (98% training accuracy)
  • Correctly predicts 56% of negative and 91% of positive tweets (per-class recall)
  • 86% accuracy on the test data
  • Still struggles with negative sentiment: 56% recall on that class is only slightly better than flipping a coin
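This README does not list which base models were stacked. The sketch below shows the general shape of a scikit-learn stacking ensemble for tweets under stated assumptions: TF-IDF features, `MultinomialNB` and `GradientBoostingClassifier` as base estimators, and `LogisticRegression` as the meta-model. The toy tweets are hypothetical, not rows from `data/tweet.csv`. Comparing training accuracy with held-out accuracy is one way to spot the kind of overfitting described above.

```python
# Minimal stacking sketch. ASSUMPTIONS: TF-IDF features, the two base
# estimators and the logistic-regression meta-model are illustrative
# choices, not the repository's confirmed configuration.
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy tweets (1 = positive, 0 = negative).
train_tweets = [
    "love my new iphone", "great app update", "awesome event today",
    "this phone is amazing", "really happy with the battery", "best contest ever",
    "hate the new interface", "battery died again", "worst update ever",
    "app keeps crashing", "so hard to use", "terrible customer service",
]
train_labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
test_tweets = ["love the event", "interface keeps crashing"]
test_labels = [1, 0]

stack = make_pipeline(
    TfidfVectorizer(),
    StackingClassifier(
        estimators=[
            ("nb", MultinomialNB()),
            ("gb", GradientBoostingClassifier(random_state=0)),
        ],
        # The meta-model is fit on out-of-fold predictions of the base models.
        final_estimator=LogisticRegression(),
        cv=3,  # kept small because the toy dataset is tiny
    ),
)
stack.fit(train_tweets, train_labels)

# A large gap between these two scores suggests overfitting.
train_acc = stack.score(train_tweets, train_labels)
test_acc = stack.score(test_tweets, test_labels)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```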

Recommendations

  • Which model will perform best?
    • Gradient Boosting for the most balanced recall across classes, and Stacking for the highest overall accuracy
  • How well will a model perform?
    • Overall test accuracy
      • Gradient Boosting: 76%
      • Stacking Classifier: 86%
  • What could Android and iPhone makers improve in their products to reduce negative feedback?
    • Android
      • Contests
      • Events
    • iPhone
      • Interface
      • Ease of use

Next Steps

  • Build an sklearn pipeline and grid-search over the tokenizer and vectorizer parameters together with each classifier's hyperparameters.
  • Build neural networks for the data, using Oscar for hyperparameter tuning
  • Tune the models above further
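The pipeline-plus-grid-search step listed under Next Steps could look roughly like the sketch below. Assumptions: `TfidfVectorizer` stands in for the tokenizer/vectorizer, its `token_pattern` is the tokenizer knob being searched, `LogisticRegression` and `GradientBoostingClassifier` are the candidate classifiers, and the toy tweets are hypothetical. The `balanced_accuracy` scorer (the mean of per-class recall) is an illustrative choice that rewards improving the weak negative-class recall noted above.

```python
# Sketch of a TF-IDF + classifier pipeline tuned with GridSearchCV.
# ASSUMPTIONS: toy data, candidate classifiers and parameter values are
# illustrative only, not the repository's settings.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy tweets (1 = positive, 0 = negative).
tweets = [
    "love my new iphone", "great app update", "awesome event today",
    "this phone is amazing", "really happy with the battery", "best contest ever",
    "hate the new interface", "battery died again", "worst update ever",
    "app keeps crashing", "so hard to use", "terrible customer service",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),  # placeholder; the grid swaps it
])

# Tokenizer/vectorizer settings shared by every candidate classifier.
vectorizer_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    # Default pattern (tokens of 2+ characters) vs. one that keeps 1-character tokens.
    "tfidf__token_pattern": [r"(?u)\b\w\w+\b", r"(?u)\b\w+\b"],
}
# A list of dicts lets each classifier be searched over its own hyperparameters.
param_grid = [
    {**vectorizer_grid,
     "clf": [LogisticRegression(max_iter=1000)],
     "clf__C": [0.1, 1.0, 10.0]},
    {**vectorizer_grid,
     "clf": [GradientBoostingClassifier(random_state=0)],
     "clf__n_estimators": [25, 50],
     "clf__learning_rate": [0.1, 0.5]},
]

search = GridSearchCV(pipe, param_grid, cv=3, scoring="balanced_accuracy")
search.fit(tweets, labels)
print(search.best_params_)
print(f"best CV balanced accuracy: {search.best_score_:.2f}")
```

Because the vectorizer lives inside the pipeline, it is refit on each training fold, so the cross-validation scores are not inflated by vocabulary learned from the validation tweets.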

Repository Structure

+---data
|       tweet.csv
|
+---images
|       phone.jpg
|
|   presentation.pdf
|   README.md
|   student.ipynb
|   readme.ipynb
|
+---styles
|       custom.css
|
\---md
        student.md
        .png files of all graphs