Tweet Sentiment Analysis

AlmaBetter Verified Project - AlmaBetter School

We have built a system that determines user sentiment from tweets, categorizing each tweet as positive or negative (and, in the multiclass setting, neutral).

💾 Project Files Description

Executable Files:

Coronavirus_tweet_sentiment_analysis.ipynb - Includes all the functions required for the classification tasks.

Output:

Google Colab - All outputs are visible in the provided Colab notebook.

Data Source:

Dataset - https://www.kaggle.com/datasets/lopezbec/covid19-tweets-dataset

📖 Business Problem And Objective

Sentiment analysis refers to identifying and classifying the sentiments expressed in a text source. When analyzed at scale, tweets yield a large volume of sentiment data, which helps in understanding people's opinions on a wide variety of topics.

Our objective is therefore to develop an automated machine learning model for sentiment analysis that gauges customer perception. Because tweets mix useful text with non-useful characters (collectively termed noise), the data must be cleaned before models can be applied to it.
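The notebook's exact cleaning rules are not listed here, so the following is only a minimal sketch of what removing noise from a tweet can look like. It assumes that URLs, @mentions, the '#' symbol, digits and punctuation count as noise; the regular expressions and the clean_tweet name are illustrative, not taken from the notebook.

```python
import re

# Assumed noise patterns; the notebook's actual cleaning rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")
SPACES_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @mentions, '#' signs, digits and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # drop links
    text = MENTION_RE.sub(" ", text)     # drop @user handles
    text = text.replace("#", " ")        # keep hashtag words, drop the '#' symbol
    text = NON_LETTER_RE.sub(" ", text)  # drop anything that is not a-z or whitespace
    return SPACES_RE.sub(" ", text).strip()  # collapse repeated whitespace


if __name__ == "__main__":
    raw = "Panic buying @Tesco again!!! #COVID19 https://t.co/abc123"
    print(clean_tweet(raw))  # panic buying again covid
```

Note that the last regex also removes accented letters and emoji; a real pipeline might keep some of these, or add steps such as stop-word removal and lemmatization.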

📖 About Dataset

Location : Location from which the tweet was posted

Tweet At : Date and time of the tweet

Original Tweet : Text of the tweet

Label (Sentiment) : Sentiment class of the tweet

📙 Findings and Results

The original dataset contains 6 columns and 41,157 rows. The Location column contains null values, so we dropped the rows with missing values. We then added a new column, "clean_tweets", holding the cleaned tweet text. After dropping these rows and adding the new column, the dataset has 7 columns and 32,567 rows.

For the analysis we need only two columns, "OriginalTweet" and "Sentiment"; columns such as "UserName" and "ScreenName" do not provide any meaningful insight.

There are five sentiment classes: Extremely Positive, Positive, Extremely Negative, Negative and Neutral. We merged Extremely Positive into Positive and Extremely Negative into Negative, leaving three classes: Positive, Negative and Neutral.

The pie chart shows the proportion of each sentiment, and the bar plot of unique values shows the number of unique values in each column. The plot of the top 10 locations shows that most tweets came from London, followed by the United States.
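The preprocessing steps described above can be sketched with only the Python standard library (the notebook itself may well use pandas). This sketch assumes each row is a dict keyed by the dataset's column names, as csv.DictReader would produce; the function names are hypothetical, and building the clean_tweets column is left out.

```python
from collections import Counter

# Collapse the five original labels into three classes.
LABEL_MAP = {
    "Extremely Positive": "Positive",
    "Positive": "Positive",
    "Neutral": "Neutral",
    "Negative": "Negative",
    "Extremely Negative": "Negative",
}


def prepare_rows(rows):
    """Drop rows with a missing Location and collapse Sentiment to three classes.

    Returns new dicts; the input rows are left unmodified.
    """
    kept = []
    for row in rows:
        location = (row.get("Location") or "").strip()
        if not location:
            continue  # treat an empty or missing Location as null and drop the row
        kept.append({**row, "Sentiment": LABEL_MAP[row["Sentiment"]]})
    return kept


def sentiment_proportions(rows):
    """Fraction of rows in each sentiment class (the data behind a pie chart)."""
    counts = Counter(row["Sentiment"] for row in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def top_locations(rows, k=10):
    """The k most frequent Location values with their tweet counts."""
    return Counter(row["Location"] for row in rows).most_common(k)
```

Counter.most_common(k) returns the k most frequent values in descending order of count, which is the same information a top-10 locations bar chart displays.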

For multiclass classification, Logistic Regression was the best model for this dataset.
For binary classification, the Stochastic Gradient Descent (SGD) classifier was the best model for this dataset.


📋 Execution Instruction

To run the Colab notebook, follow these steps in order:

1) Coronavirus_tweet_sentiment_analysis.ipynb

First, click the "Open in Colab" button at the top center of the notebook.

2) Kaggle Dataset

Download the dataset from Kaggle using the link provided. Then connect to a runtime and either run the cell that mounts Google Drive or upload the data file to the current runtime.

3) Cell Path

Finally, replace the path in the dataset-loading cell with the path of your own data file, then run each cell to see its output below it.
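The loading step can be sketched as follows. google.colab.drive.mount is Colab's own helper for mounting Google Drive; DATA_PATH, the file name in it, and the two helper functions are hypothetical placeholders to adapt to your setup. The file is read as UTF-8 by default; pass a different encoding if your copy does not decode as UTF-8.

```python
import csv
from pathlib import Path

# Hypothetical location: replace with the path of your own copy of the Kaggle file.
DATA_PATH = Path("/content/drive/MyDrive/coronavirus_tweets.csv")


def mount_drive_if_colab() -> bool:
    """Mount Google Drive when running inside Colab; do nothing elsewhere."""
    try:
        from google.colab import drive  # only importable inside a Colab runtime
    except ImportError:
        return False
    drive.mount("/content/drive")
    return True


def load_rows(path, encoding="utf-8"):
    """Read the CSV into a list of dicts, failing early with a clear message."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(
            f"Dataset not found at {path}; update the path to your data file."
        )
    with path.open(newline="", encoding=encoding) as f:
        return list(csv.DictReader(f))
```

Calling mount_drive_if_colab() before load_rows(DATA_PATH) covers the Drive route; if you upload the file directly to the runtime instead, point DATA_PATH at the uploaded file (for example, a path under /content).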

📜 Credits

Vivek Pawar | Data Scientist | Machine Learning Engineer

Contact me for Data Science Project Collaborations

vivek16pawar/Tweet-Sentiment-Analysis
