Tweets Sentiment Analysis
Built a real-time tweet sentiment analysis application using Python, Spark Streaming, Elasticsearch, and Kibana
Implemented a scraper that collects and pre-processes tagged tweets and serves them as a Spark Streaming source
Implemented real-time sentiment analysis with NLTK and Spark Streaming, whose results feed the data visualization
Created a visualization panel with a data table and heatmaps showing the geolocation distribution of the sentiment results, using Elasticsearch and Kibana
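The NLTK sentiment step above is typically driven by VADER's compound score. A minimal sketch of the score-to-label mapping (the function name and the ±0.05 thresholds follow the common VADER convention; how stream.py actually labels tweets is an assumption here):

```python
# Map a VADER-style compound score (-1.0 .. 1.0) to a discrete label.
# With NLTK, the score comes from:
#   from nltk.sentiment.vader import SentimentIntensityAnalyzer
#   compound = SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
# (requires nltk.download("vader_lexicon") once)
def label_sentiment(compound: float) -> str:
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"
```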
Course: Big Data Management and Analytics (CS 6350)
Professor: Latifur Khan
Semester: Spring 2018
Programming Language: Python 3
Install Elasticsearch and Kibana
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list
sudo apt-get update
sudo apt -y install elasticsearch kibana
Install Elasticsearch-Hadoop and Python Dependencies
Download the Elasticsearch-Hadoop connector from the Elastic downloads page; its jar is passed to spark-submit below
sudo apt install python3-pip
sudo -H pip3 install --upgrade pip
sudo -H pip3 install tweepy pyspark googlemaps nltk twython elasticsearch numpy
Set Up Twitter API Keys and Tokens
Get the API keys and tokens from Twitter Apps and export them:
export T_ACCESS_TOKEN=<access_token>
export T_ACCESS_SECRET=<access_secret>
export T_CONSUMER_KEY=<consumer_key>
export T_CONSUMER_SECRET=<consumer_secret>
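stream.py presumably reads these variables at startup; a sketch of loading them defensively (the helper name is mine), so a missing key fails fast instead of surfacing as a cryptic tweepy authentication error:

```python
import os

# Names match the `export` lines above.
REQUIRED_VARS = ("T_CONSUMER_KEY", "T_CONSUMER_SECRET",
                 "T_ACCESS_TOKEN", "T_ACCESS_SECRET")

def load_twitter_credentials() -> dict:
    """Return the four Twitter credentials, raising if any is unset."""
    missing = [v for v in REQUIRED_VARS if not os.environ.get(v)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {v: os.environ[v] for v in REQUIRED_VARS}
```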
Set Up Google Maps Geolocation API Key
Get a Google Maps Geolocation API key from Google and export it:
export G_MAPS_API_KEY=<api_key>
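The heatmaps need coordinates, so tweet locations are resolved through the googlemaps client. A sketch of pulling latitude/longitude out of a `googlemaps.Client.geocode()` response (the helper name is mine; the `geometry.location` shape is how the geocoding results are structured):

```python
# Extract (lat, lng) from a googlemaps.Client.geocode() response.
# Returns None when the location could not be resolved.
def extract_lat_lng(geocode_results: list):
    if not geocode_results:
        return None
    loc = geocode_results[0]["geometry"]["location"]
    return loc["lat"], loc["lng"]
```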
Start Elasticsearch and Kibana
sudo systemctl start elasticsearch
sudo systemctl start kibana
sudo systemctl status elasticsearch
sudo systemctl status kibana
chromium-browser http://localhost:9200 &
chromium-browser http://localhost:5601 &
Run the Tweet Collector
python3 stream.py <server_port> <hash_tag>
e.g. python3 stream.py 9001 Trump
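The collector pre-processes tweets before handing them to Spark Streaming. stream.py's exact cleaning rules are not shown here; this sketch strips the noise that trips up lexicon-based scoring, which is one plausible version of that step:

```python
import re

def preprocess_tweet(text: str) -> str:
    """Clean a raw tweet: drop URLs and @mentions, keep hashtag words."""
    text = re.sub(r"https?://\S+", "", text)  # drop links
    text = re.sub(r"@\w+", "", text)          # drop mentions
    text = text.replace("#", "")              # keep the hashtag word itself
    return " ".join(text.split())             # collapse whitespace
```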
Run the Spark Streaming Job
spark-submit --jars <path/to/elasticsearch_hadoop_jar_file> spark.py <es_domain> <es_port> <server_port> <interval>
e.g. spark-submit --jars ${HOME}/frameworks/elasticsearch-hadoop-6.2.3/dist/elasticsearch-hadoop-6.2.3.jar spark.py localhost 9200 9001 5
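spark.py's actual write path isn't shown; a common pattern with the Elasticsearch-Hadoop jar passed via --jars is to save each RDD through `EsOutputFormat` with a configuration like the one below (the helper name is mine; the `es.*` keys are Elasticsearch-Hadoop settings, and `tweets/tweet` matches the index and type queried later):

```python
def es_write_conf(es_host: str, es_port: int, resource: str = "tweets/tweet") -> dict:
    """Settings consumed by elasticsearch-hadoop's EsOutputFormat."""
    return {
        "es.nodes": es_host,
        "es.port": str(es_port),
        "es.resource": resource,      # index/type to write into
        "es.input.json": "true",      # values are already JSON strings
    }

# Sketch of the per-RDD write inside the streaming job (not run here):
#   rdd.map(lambda doc: (None, json.dumps(doc))).saveAsNewAPIHadoopFile(
#       path="-",
#       outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
#       keyClass="org.apache.hadoop.io.NullWritable",
#       valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
#       conf=es_write_conf("localhost", 9200))
```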
Check Documents in the Index (tweets) and Type (tweet)
curl -XGET 'localhost:9200/tweets/tweet/_search?pretty'
Delete Index (tweets)
curl -XDELETE 'localhost:9200/tweets'
Check run_tweets_analysis.sh in my gist