
This project shows how to use several APIs to extract social media data and then analyze that data for useful insights. It was built in partial fulfillment of my master's degree.

Primary language: Python

Text Mining and Statistical Analysis on Web Social Media Platform (Twitter) using Python

This project was one of the requirements of my postgraduate module, Web Social Media Analytics and Visualization. It focuses on two parts: Part A, a statistical analysis of a popular trend on Twitter, and Part B, text mining on an ongoing event or campaign.

IMPORTANT INFO!!!

Bear in mind that this project uses APIs to get its data. Anyone who wants to use "NewsAPI" or the official Twitter API therefore needs to create an account with each service to obtain the API keys. (Don't worry, it's free!)

NewsAPI: https://newsapi.org/

Twitter API: https://developer.twitter.com/en/docs/twitter-api
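
As a rough illustration of how a key obtained this way is typically used, the sketch below builds a request URL for NewsAPI's /v2/everything endpoint without sending anything over the network. The environment variable name NEWSAPI_KEY and the helper build_newsapi_url are assumptions made for this example, not names taken from the project files.

```python
import os
from urllib.parse import urlencode

NEWSAPI_ENDPOINT = "https://newsapi.org/v2/everything"


def build_newsapi_url(query, api_key, language="en", page_size=20):
    """Return a NewsAPI search URL for the given query (no request is sent)."""
    params = {
        "q": query,
        "language": language,
        "pageSize": page_size,
        "apiKey": api_key,
    }
    return f"{NEWSAPI_ENDPOINT}?{urlencode(params)}"


# NEWSAPI_KEY is a hypothetical variable name; reading the key from the
# environment keeps it out of the source files.
key = os.environ.get("NEWSAPI_KEY", "YOUR_API_KEY")
print(build_newsapi_url("Elon Musk", key))
```

The resulting URL can be passed to any HTTP client. The Twitter API follows the same idea, except that its credentials are normally supplied through the client library rather than in the URL.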

A full explanation of each part is given below:

Part A: Statistical Analysis

In the statistical analysis part of the project, popular trends on Twitter are identified using data extracted through the official Twitter API. A specific trend, the "2022 Spring Statement Tax Plan" issued by the UK Government, was then chosen for an in-depth statistical analysis that answers questions such as "When does the tweet get popular?", "What devices are used to tweet?", and "Which sources can be trusted?".
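
Questions like these mostly reduce to grouped counts. Here is a minimal sketch, assuming each tweet is a dict with a created_at timestamp (ISO 8601) and a source field. The sample records and function names are invented for illustration and are not the project's data or code.

```python
from collections import Counter
from datetime import datetime

# Invented sample records; real tweets would come from the Twitter API.
tweets = [
    {"created_at": "2022-03-23T12:05:00", "source": "Twitter for iPhone"},
    {"created_at": "2022-03-23T12:40:00", "source": "Twitter Web App"},
    {"created_at": "2022-03-23T13:10:00", "source": "Twitter for iPhone"},
    {"created_at": "2022-03-23T12:55:00", "source": "Twitter for Android"},
]


def tweets_per_hour(records):
    """Count tweets by hour of day to see when the topic was most active."""
    return Counter(datetime.fromisoformat(r["created_at"]).hour for r in records)


def tweets_per_source(records):
    """Count tweets by posting client (the device or app used)."""
    return Counter(r["source"] for r in records)


hourly = tweets_per_hour(tweets)
sources = tweets_per_source(tweets)
print("Busiest hour:", hourly.most_common(1))
print("Most used client:", sources.most_common(1))
```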

A graph analysis of the Facebook dataset from Stanford University's SNAP website is also performed, using the "GraphX" library in Python, to evaluate centrality measures and carry out community analysis.
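
To make the idea of a centrality measure concrete, the sketch below computes degree centrality (the fraction of other nodes each node is directly connected to) in plain Python. It assumes an undirected edge list of node-ID pairs. The tiny graph is invented for illustration, and this is not the library or code used in the project.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Invented toy graph: node 0 is connected to every other node.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
centrality = degree_centrality(edges)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central])
```

Other measures, such as betweenness or closeness centrality, need shortest-path computations and are better left to a graph library.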

Part B: Text Mining

In the text mining part of the project, a different interface, "TwythonStreamer" (the streaming class of the Twython library), is used to fetch real-time tweets on a given topic. The topic chosen was "Elon Musk" because, at that time, Elon Musk had just bought the social media platform Twitter. A sentiment analysis was performed to gauge public opinion on the matter, along with a word cloud and a word-frequency analysis. Additionally, "NewsAPI" is used to extract articles about "Elon Musk", which are pre-processed and then passed to Latent Semantic Indexing (LSI) for topic modelling.
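
The word-frequency step can be sketched with the standard library alone. The example below assumes a simple clean-up (lowercasing, then dropping URLs, mentions, hashtags and a small stopword list). The stopword list, sample tweets and function name are invented for illustration and may differ from the pre-processing the project actually applies.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "on", "for", "it", "this"}


def word_frequencies(texts, stopwords=STOPWORDS):
    """Count words across texts after removing URLs, mentions and hashtags."""
    counts = Counter()
    for text in texts:
        cleaned = re.sub(r"https?://\S+|[@#]\w+", " ", text.lower())
        tokens = re.findall(r"[a-z']+", cleaned)
        counts.update(t for t in tokens if t not in stopwords and len(t) > 1)
    return counts


# Invented sample tweets.
sample = [
    "Elon Musk is buying Twitter! https://example.com",
    "Is the Twitter deal good for free speech? #ElonMusk",
    "@someone Musk and Twitter: the deal of the year",
]
freq = word_frequencies(sample)
print(freq.most_common(3))
```

The same counts are what a word cloud visualizes, with font size scaled by frequency.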

Proceeding with the files

The Python files are numbered in order, so for easier readability please go through them in that order.

Dataset

The datasets for the statistical analysis and text mining are provided within the GitHub project.

The data for the graph analysis comes from the SNAP dataset: https://snap.stanford.edu/data/ego-Facebook.html

Misc

This project is coded in Python using the PyCharm IDE.

If anyone wants to use part of the code, please reference it. Thanks.