Social_Media_Analytics_using_Spark_and_big_data_tools

This project builds a sentiment analysis model with PySpark to gain insight into user sentiment from social media data. The collected tweets are analyzed to determine which are positive and which are negative.

Data Collection

The data is the sentiment140 dataset from Kaggle. It contains 1,600,000 tweets extracted using the Twitter API. The tweets have been annotated (0 = negative, 4 = positive) and can be used to detect sentiment. The dataset contains the following 6 fields:

target: the polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
ids: the id of the tweet (2087)
date: the date of the tweet (Sat May 16 23:58:44 UTC 2009)
flag: the query (lyx). If there is no query, this value is NO_QUERY.
user: the user that tweeted (robotickilldozr)
text: the text of the tweet (Lyx is cool)

Dataset "sentiment140" from Kaggle: https://www.kaggle.com/datasets/kazanova/sentiment140
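
The six fields above can be loaded into a Spark DataFrame roughly as follows. This is a minimal sketch, not the project's actual loading code. It assumes the Kaggle CSV has no header row and is encoded as ISO-8859-1. The names `COLUMNS`, `POLARITY`, `load_tweets`, and the derived `label` column are hypothetical, introduced only for illustration.

```python
# Column order of the six fields; assumed to match the CSV, which is
# assumed to ship without a header row.
COLUMNS = ["target", "ids", "date", "flag", "user", "text"]

# Polarity codes as listed in the field description above.
POLARITY = {0: "negative", 2: "neutral", 4: "positive"}


def load_tweets(spark, path):
    """Read the CSV at `path` into a DataFrame with named, typed columns.

    `spark` is an existing SparkSession; `path` is wherever the Kaggle
    file was saved (a hypothetical, user-supplied location).
    """
    # Imported here so the constants above stay usable without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import (
        IntegerType, LongType, StringType, StructField, StructType,
    )

    # target and ids are numeric; every other field is read as a string.
    numeric = {"target": IntegerType(), "ids": LongType()}
    schema = StructType(
        [StructField(name, numeric.get(name, StringType()), True) for name in COLUMNS]
    )

    df = spark.read.csv(path, schema=schema, header=False, encoding="ISO-8859-1")

    # Keep only negative (0) and positive (4) rows, then add a 0/1 label
    # column, which is the form most binary classifiers expect.
    return (
        df.filter(F.col("target").isin(0, 4))
          .withColumn("label", F.when(F.col("target") == 4, 1).otherwise(0))
    )
```

With a session created through `SparkSession.builder.getOrCreate()`, calling `load_tweets(spark, path)` and then `.groupBy("label").count().show()` gives a quick check of the class balance before any model is trained.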

AmalMohamed2001/Social_Media_Analytics_using_Spark_and_big_data_tools
