
Tweet Sentiment Analysis

AlmaBetter Verified Project - AlmaBetter School

We have built a system that checks user sentiment based on tweets and categorizes each tweet as expressing positive or negative sentiment.

💾 Project Files Description

Executable Files:

  • Coronavirus_tweet_sentiment_analysis.ipynb - Includes all functions required for classification operations.

Output:

  • Google Colab - All outputs are visible in the provided Colab notebook.

Data Source:

-----------------------------------------------------

📖 Business Problem And Objective

Sentiment analysis refers to identifying and classifying the sentiments expressed in a text source. Tweets yield a vast amount of sentiment data when analyzed, and this data is useful for understanding people's opinions on a variety of topics.

We therefore need an automated machine learning sentiment analysis model to gauge customer perception. Raw tweets mix useful text with non-useful characters (collectively termed noise), which makes it difficult to train models on them directly.
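As an illustration of the noise removal described above, here is a minimal sketch of a tweet-cleaning function. The name `clean_tweet` and the specific rules (lowercasing, then dropping URLs, mentions, hash signs, and any remaining non-letter characters) are assumptions made for this example; the notebook's actual cleaning steps may differ.

```python
import re

# Assumed noise patterns; adjust them to match the notebook's own rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Return a lowercased tweet with URLs, mentions, and non-letters removed."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # drop links
    text = MENTION_RE.sub(" ", text)     # drop @user mentions
    text = text.replace("#", " ")        # keep the hashtag word, drop the sign
    text = NON_LETTER_RE.sub(" ", text)  # drop digits, punctuation, emoji
    return " ".join(text.split())        # collapse repeated whitespace


print(clean_tweet("Stay SAFE!! @user https://t.co/abc #COVID19"))
# prints: stay safe covid
```

Note that this sketch also strips digits, so "COVID19" becomes "covid". Whether that is desirable depends on the vocabulary the model should see.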

-----------------------------------------------------

📖 About Dataset

  • Location : Location from which the tweet was posted
  • Tweet At : Date and time of the tweet
  • Original Tweet : Text of the tweet
  • Label : Sentiment of the tweet

-----------------------------------------------------

    📙: Findings and Results

    The original dataset contains 6 columns and 41,157 rows. The Location column contains null values, so we dropped the rows where it is missing. We also added a new column, "clean_tweets", holding the cleaned tweet text. After dropping those rows and adding the new column, the dataset has 7 columns and 32,567 rows. The analysis itself requires only two columns, "OriginalTweet" and "Sentiment"; columns such as "UserName" and "ScreenName" do not give any meaningful insight.

    The data has five sentiment labels: Extremely Positive, Positive, Extremely Negative, Negative and Neutral. We merged Extremely Positive into Positive and Extremely Negative into Negative, which leaves three labels: Positive, Negative and Neutral. The pie chart shows the proportion of each sentiment. The bar plot of unique values shows the number of unique values in each column. The plot of the top 10 locations shows that most tweets came from London, followed by the United States.
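The preparation steps described above (dropping rows with a missing location and folding the two "Extremely" labels into their base classes) can be sketched with pandas as follows. The function name `prepare` and the toy rows are hypothetical; the column names "Location", "OriginalTweet" and "Sentiment" are taken from the description above.

```python
import pandas as pd

# Folds the two "Extremely ..." labels into their base classes.
LABEL_MAP = {
    "Extremely Positive": "Positive",
    "Extremely Negative": "Negative",
}


def prepare(df):
    """Drop rows without a location and collapse five labels into three."""
    out = df.dropna(subset=["Location"]).copy()
    out["Sentiment"] = out["Sentiment"].replace(LABEL_MAP)
    return out


# Toy rows for illustration only; they are not taken from the real dataset.
toy = pd.DataFrame(
    {
        "Location": ["London", None, "United States", "London", "Delhi"],
        "OriginalTweet": ["a", "b", "c", "d", "e"],
        "Sentiment": [
            "Extremely Positive",
            "Negative",
            "Extremely Negative",
            "Neutral",
            "Positive",
        ],
    }
)

prepared = prepare(toy)
print(prepared["Sentiment"].value_counts())
```

On the toy frame, the row with the missing location is dropped and the remaining four rows carry only the labels Positive, Negative and Neutral.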

    • For multiclass classification, the best model for this dataset would be Logistic Regression

    • For binary classification, the best model for this dataset would be Stochastic Gradient Descent.
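A comparison like the one summarized above could be set up roughly as follows with scikit-learn. This is only a sketch under assumptions: the TF-IDF features, the default hyperparameters, the 2-fold cross-validation, the helper name `compare_models`, and the toy sentences are all chosen for illustration and are not confirmed to match the notebook. The same function works for binary and multiclass labels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline


def compare_models(texts, labels, cv=2):
    """Return mean cross-validated accuracy for each candidate model."""
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "sgd": SGDClassifier(random_state=0),  # linear model trained with SGD
    }
    scores = {}
    for name, model in candidates.items():
        # Vectorize inside the pipeline so each fold fits its own vocabulary.
        pipeline = make_pipeline(TfidfVectorizer(), model)
        fold_scores = cross_val_score(
            pipeline, texts, labels, cv=cv, scoring="accuracy"
        )
        scores[name] = float(fold_scores.mean())
    return scores


# Toy data for illustration only; real scores require the actual dataset.
texts = [
    "great news stores are open",
    "thank you to all the workers",
    "happy to see people helping",
    "good supply of food today",
    "panic buying is terrible",
    "prices are rising this is awful",
    "empty shelves and angry crowds",
    "worst shopping trip ever",
]
labels = ["Positive"] * 4 + ["Negative"] * 4

scores = compare_models(texts, labels)
print(scores)
```

Scores on eight toy sentences are not meaningful; the point is the structure, where both models share the same features and the same folds so that their accuracies are comparable.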

    -----------------------------------------------------

    📋 Execution Instruction

    The Colab notebook should be executed in the following order:

    1) Coronavirus_tweet_sentiment_analysis.ipynb

    First, click the "Open in Colab" button at the top center of the notebook.

    2) Kaggle Dataset

    Download the dataset from Kaggle through the provided link. Then connect to the runtime and either execute the cell that mounts the drive or upload the data file to the current runtime.

    3) Cell Path

    Finally, delete the path in the dataset loading cell and replace it with the path of your current data file. Run each cell to see the output below it.
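The path replacement in step 3 might look like the sketch below. The file name in `DATA_PATH`, the helper name `load_tweets`, and the `encoding` default are hypothetical; substitute the location of your own data file, and change the encoding if pandas reports a decoding error.

```python
from pathlib import Path

import pandas as pd

# Hypothetical location: replace with the path of your own data file,
# for example a file uploaded to the runtime or stored on a mounted drive.
DATA_PATH = Path("/content/tweets.csv")


def load_tweets(path, encoding="utf-8"):
    """Load the tweet CSV, failing early with a clear message if it is missing."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(
            f"Data file not found: {path}. Update the path to your own copy."
        )
    return pd.read_csv(path, encoding=encoding)
```

Calling `load_tweets(DATA_PATH)` in the dataset loading cell then returns a DataFrame, or raises a readable error when the path has not been updated yet.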

    -----------------------------------------------------

    📜 Credits

    Vivek Pawar | Data Scientist | Machine Learning Engineer

    Contact me for Data Science Project Collaborations

    -----------------------------------------------------