lsmgeb89/tweets_analysis

Tweets Sentiment Analysis with Streaming Process and Visualization

Python, AGPL-3.0

Tweets Sentiment Analysis

Summary

Built a real-time tweet sentiment analysis application using Python, Spark Streaming, Elasticsearch, and Kibana
Implemented a scraper that collects and pre-processes tagged tweets and serves as a Spark Streaming source
Implemented real-time sentiment analysis with NLTK and Spark Streaming, whose output feeds the data visualization
Created a visualization panel with a data table and heatmaps that shows the geographic distribution of sentiment results, using Elasticsearch and Kibana
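
The sentiment step mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes NLTK's VADER analyzer is the scorer (the README only says "NLTK"), and the function names `label_from_compound` and `analyze` as well as the 0.05 threshold are illustrative choices.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label.

    The threshold is an assumption; 0.05 is a commonly used cut-off.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def analyze(text):
    """Score one tweet text with VADER and return its label.

    NLTK is imported lazily so label_from_compound can be used without it.
    """
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    # Fetch the lexicon on first use (requires network access once).
    nltk.download("vader_lexicon", quiet=True)
    scores = SentimentIntensityAnalyzer().polarity_scores(text)
    return label_from_compound(scores["compound"])
```

Keeping the thresholding separate from the NLTK call makes it easy to swap in a different scorer later.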

Project Information

Course: Big Data Management and Analytics (CS 6350)
Professor: Latifur Khan
Semester: Spring 2018
Programming Language: Python 3

Check Preview

Setup on Ubuntu

Install Elasticsearch and Kibana

echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
sudo apt-get update
sudo apt -y install elasticsearch kibana

Install Elasticsearch-Hadoop

Download and install ES-Hadoop

Install Python Packages

sudo apt install python3-pip
sudo -H pip install --upgrade pip
sudo -H pip3 install --upgrade pip
sudo -H pip3 install tweepy pyspark googlemaps nltk twython elasticsearch numpy

Set Up Twitter API Keys and Tokens

Get Twitter API Keys and tokens from Twitter Apps
export T_ACCESS_TOKEN=<access_token>
export T_ACCESS_SECRET=<access_secret>
export T_CONSUMER_KEY=<consumer_key>
export T_CONSUMER_SECRET=<consumer_secret>
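
A script can read the four exported variables along these lines. This is a hedged sketch: the helper name `load_twitter_credentials` is hypothetical, and how `stream.py` actually reads the variables is not shown in this README.

```python
import os

# Variable names taken from the export commands above.
TWITTER_VARS = (
    "T_ACCESS_TOKEN",
    "T_ACCESS_SECRET",
    "T_CONSUMER_KEY",
    "T_CONSUMER_SECRET",
)


def load_twitter_credentials(env=None):
    """Return the Twitter credentials as a dict, or raise if any are unset."""
    env = os.environ if env is None else env
    missing = [name for name in TWITTER_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in TWITTER_VARS}
```

Failing early with the list of missing names is usually easier to debug than an authentication error from the Twitter API later on.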

Set Up Google Maps Geolocation API Key

Get Google Maps Geolocation API Key from Google
export G_MAPS_API_KEY=<api_key>
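
As an illustration of how the key might be used, the sketch below geocodes a place name with the `googlemaps` client and converts the first result into the `lat`/`lon` shape that an Elasticsearch `geo_point` field accepts. The helper names `to_geo_point` and `geocode_place` are hypothetical, and it is an assumption that the project geocodes a tweet's location text this way.

```python
import os


def to_geo_point(geocode_results):
    """Convert a Geocoding API result list to {"lat", "lon"}, or None if empty."""
    if not geocode_results:
        return None
    location = geocode_results[0]["geometry"]["location"]
    # The Google API uses "lng"; Elasticsearch geo_point objects use "lon".
    return {"lat": location["lat"], "lon": location["lng"]}


def geocode_place(place):
    """Look up a free-text place name using the key exported above."""
    import googlemaps  # imported lazily so to_geo_point works without it

    client = googlemaps.Client(key=os.environ["G_MAPS_API_KEY"])
    return to_geo_point(client.geocode(place))
```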

Start Services

sudo systemctl start elasticsearch
sudo systemctl start kibana

Check Services

sudo systemctl status elasticsearch
sudo systemctl status kibana

Monitor Elasticsearch

chromium-browser http://localhost:9200 &

Monitor Kibana

chromium-browser http://localhost:5601 &

Run Server

python3 stream.py <server_port> <hash_tag>
e.g. python3 stream.py 9001 Trump
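
For context, a Spark Streaming socket source expects a TCP server that writes newline-delimited text. The sketch below shows that pattern with the standard library only. It is not the real `stream.py`: the function name `serve_lines` is hypothetical, and the input is a plain list of strings rather than a live Twitter stream.

```python
import socket
import threading


def serve_lines(lines, host="127.0.0.1", port=0):
    """Send each string in `lines` as one newline-terminated record.

    Serves a single client in a background thread. With port=0 the OS
    picks a free port. Returns (bound_port, thread).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    bound_port = server.getsockname()[1]

    def run():
        conn, _ = server.accept()
        with conn:
            for line in lines:
                # One record per line: embedded newlines would split a tweet.
                record = line.replace("\n", " ") + "\n"
                conn.sendall(record.encode("utf-8"))
        server.close()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return bound_port, thread
```

Flattening embedded newlines matters because the receiving side treats every line as a separate record.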

Run Client

spark-submit --jars <path/to/elasticsearch_hadoop_jar_file> spark.py <es_domain> <es_port> <server_port> <interval>
e.g. spark-submit --jars ${HOME}/frameworks/elasticsearch-hadoop-6.2.3/dist/elasticsearch-hadoop-6.2.3.jar spark.py localhost 9200 9001 5
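
The client side can be pictured roughly as below: read lines from the socket, turn each into a JSON document, and write every micro-batch to Elasticsearch through ES-Hadoop. This is a sketch under assumptions, not the real `spark.py`: the document fields, the helper `to_es_doc`, the application name, and the server host `localhost` are all illustrative, and the sentiment and geocoding steps are omitted.

```python
import json
import sys


def to_es_doc(line):
    """Turn one text line into a (key, json_string) pair for EsOutputFormat."""
    return ("key", json.dumps({"text": line.strip()}))


def main(es_host, es_port, server_port, interval):
    # Imported here so to_es_doc can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="TweetsAnalysisSketch")
    ssc = StreamingContext(sc, interval)  # batch interval in seconds

    es_conf = {
        "es.nodes": es_host,
        "es.port": str(es_port),
        "es.resource": "tweets/tweet",  # index/type from the curl commands
        "es.input.json": "yes",         # values are already JSON strings
    }

    def save(rdd):
        # Skip empty micro-batches to avoid pointless write jobs.
        if not rdd.isEmpty():
            rdd.saveAsNewAPIHadoopFile(
                path="-",
                outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
                keyClass="org.apache.hadoop.io.NullWritable",
                valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
                conf=es_conf,
            )

    # Assumes the tweet server runs on the same machine.
    ssc.socketTextStream("localhost", server_port).map(to_es_doc).foreachRDD(save)
    ssc.start()
    ssc.awaitTermination()


# Only run when the four command-line arguments are supplied.
if __name__ == "__main__" and len(sys.argv) == 5:
    main(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]), int(sys.argv[4]))
```

The `--jars` option in the command above is what makes the `org.elasticsearch.hadoop` classes available to the job.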

Check Documents in the Index (`tweets`) and Type (`tweet`)

curl -XGET 'localhost:9200/tweets/tweet/_search?pretty'
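
The same check can be done from Python with the standard library. This is a small sketch: the helper names are hypothetical, and the defaults simply mirror the host, port, index, and type used in the curl command.

```python
import json
from urllib.request import urlopen


def search_url(host="localhost", port=9200, index="tweets", doc_type="tweet"):
    """Build the _search URL for the given index and type."""
    return "http://{}:{}/{}/{}/_search".format(host, port, index, doc_type)


def extract_sources(response_body):
    """Return the stored documents from a raw _search JSON response."""
    hits = json.loads(response_body)["hits"]["hits"]
    return [hit["_source"] for hit in hits]


def fetch_tweets(**kwargs):
    """Query a running Elasticsearch instance and return the documents."""
    with urlopen(search_url(**kwargs)) as resp:
        return extract_sources(resp.read().decode("utf-8"))
```

`fetch_tweets()` needs Elasticsearch to be running. Without extra query parameters, `_search` returns only the first page of hits.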

Delete Index (`tweets`)

curl -XDELETE 'localhost:9200/tweets'

Run Script

Check run_tweets_analysis.sh in my gist

Reference

NLTK

Sentiment Analysis

Google Maps API

Geocoding API

Tweepy

Spark

Elasticsearch

Kibana

Kibana Document