Tweets Sentiment Analysis with Streaming Process and Visualization

Primary language: Python. License: GNU Affero General Public License v3.0 (AGPL-3.0).

Tweets Sentiment Analysis

Summary

  • Built a real-time tweet sentiment analysis application using Python, Spark Streaming, Elasticsearch, and Kibana
  • Implemented a scraper that collects and pre-processes tagged tweets and serves as the Spark Streaming source
  • Implemented real-time sentiment analysis with NLTK and Spark Streaming; its output feeds the data visualization
  • Created a Kibana dashboard, backed by Elasticsearch, with a data table and heatmaps that show the geographic distribution of the sentiment analysis results
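
As a rough illustration of the sentiment step, the sketch below maps a compound polarity score to a label. It assumes NLTK's VADER analyzer (`SentimentIntensityAnalyzer`), which this README does not name explicitly. The ±0.05 thresholds are the cut-offs commonly used with VADER, and `label_from_compound` and `score_text` are hypothetical helper names, not functions from this project.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound polarity score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_text(text):
    """Label one tweet with NLTK's VADER analyzer (an assumption, see above).

    The lexicon must be downloaded once beforehand:
    nltk.download("vader_lexicon")
    """
    # Imported lazily so the labeling helper works without NLTK installed.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return label_from_compound(analyzer.polarity_scores(text)["compound"])
```

Keeping the thresholding separate from the analyzer call makes it easy to swap in a different scorer or tune the cut-offs.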

Project Information

  • Course: Big Data Management and Analytics (CS 6350)
  • Professor: Latifur Khan
  • Semester: Spring 2018
  • Programming Language: Python 3

Check Preview

Preview

Setup on Ubuntu

Install Elasticsearch and Kibana

  • echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list
  • wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
  • sudo apt-get update
  • sudo apt -y install elasticsearch kibana

Install Elasticsearch-Hadoop

Install Python Packages

  • sudo apt install python3-pip
  • sudo -H pip install --upgrade pip
  • sudo -H pip3 install --upgrade pip
  • sudo -H pip3 install tweepy pyspark googlemaps nltk twython elasticsearch numpy

Set Up Twitter API Keys and Tokens

  • Get Twitter API Keys and tokens from Twitter Apps
  • export T_ACCESS_TOKEN=<access_token>
  • export T_ACCESS_SECRET=<access_secret>
  • export T_CONSUMER_KEY=<consumer_key>
  • export T_CONSUMER_SECRET=<consumer_secret>
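
Once exported, the four variables can be read from the environment in Python. The following is a minimal sketch of one way to do that and fail early when a variable is missing; `load_twitter_credentials` is a hypothetical helper, and the project's own scripts may read the variables differently.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_TWITTER_VARS = (
    "T_ACCESS_TOKEN",
    "T_ACCESS_SECRET",
    "T_CONSUMER_KEY",
    "T_CONSUMER_SECRET",
)


def load_twitter_credentials(env=None):
    """Return the four Twitter credentials, or raise if any is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_TWITTER_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_TWITTER_VARS}
```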

Set Up Google Maps Geolocation API Key

  • Get Google Maps Geolocation API Key from Google
  • export G_MAPS_API_KEY=<api_key>

Start Services

  • sudo systemctl start elasticsearch
  • sudo systemctl start kibana

Check Services

  • sudo systemctl status elasticsearch
  • sudo systemctl status kibana
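
`systemctl status` reports the state of the service unit, which is not the same as the HTTP port already accepting connections. A small check such as the sketch below can confirm that ports 9200 and 5601 (the ports used in the monitoring steps of this README) are reachable; `port_is_open` is a hypothetical helper and not part of this project.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in (("elasticsearch", 9200), ("kibana", 5601)):
        state = "up" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```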

Monitor Elasticsearch

  • chromium-browser http://localhost:9200 &

Monitor Kibana

  • chromium-browser http://localhost:5601 &

Run Server

  • python3 stream.py <server_port> <hash_tag>
  • e.g. python3 stream.py 9001 Trump
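
The internals of `stream.py` are not shown here. Assuming it exposes tweets over a plain TCP socket that Spark Streaming reads line by line, the sending side could look roughly like the sketch below. `send_tweets` and `serve_once` are hypothetical names, and the one-tweet-per-line framing is an assumption.

```python
import socket


def send_tweets(conn, tweets):
    """Write each tweet to the connection as one UTF-8 line.

    Embedded line breaks are collapsed to single spaces so that a
    line-oriented reader sees exactly one record per tweet.
    """
    for text in tweets:
        line = " ".join(text.split())  # collapses newlines, tabs, repeated spaces
        conn.sendall((line + "\n").encode("utf-8"))


def serve_once(port, tweets, host="localhost"):
    """Accept a single client on host:port and send it the given tweets."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen(1)
        conn, _addr = server.accept()  # blocks until the client connects
        with conn:
            send_tweets(conn, tweets)
```

Normalizing whitespace before sending matters because a tweet containing a newline would otherwise be split into two records by the reader.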

Run Client

  • spark-submit --jars <path/to/elasticsearch_hadoop_jar_file> spark.py <es_domain> <es_port> <server_port> <interval>
  • e.g. spark-submit --jars ${HOME}/frameworks/elasticsearch-hadoop-6.2.3/dist/elasticsearch-hadoop-6.2.3.jar spark.py localhost 9200 9001 5
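
The four positional arguments could be parsed and turned into an elasticsearch-hadoop configuration along the lines of the sketch below. `es.nodes`, `es.port`, `es.resource`, and `es.input.json` are elasticsearch-hadoop settings, and the `tweets/tweet` resource matches the index and type named in this README. The helper names are hypothetical, the interval is assumed to be the batch interval in seconds, and how `spark.py` actually does this is not shown in this README.

```python
import argparse


def parse_client_args(argv):
    """Parse <es_domain> <es_port> <server_port> <interval> from the command line."""
    parser = argparse.ArgumentParser(prog="spark.py")
    parser.add_argument("es_domain")
    parser.add_argument("es_port", type=int)
    parser.add_argument("server_port", type=int)
    # Assumption: the interval is the streaming batch interval in seconds.
    parser.add_argument("interval", type=int)
    return parser.parse_args(argv)


def build_es_conf(es_domain, es_port, resource="tweets/tweet"):
    """Build an elasticsearch-hadoop configuration; all values are strings."""
    return {
        "es.nodes": es_domain,
        "es.port": str(es_port),
        "es.resource": resource,
        "es.input.json": "true",  # assumes each record is already a JSON string
    }
```

With the example command above, `parse_client_args(["localhost", "9200", "9001", "5"])` yields the values that `build_es_conf` needs.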

Check Documents in the Index (tweets) and Type (tweet)

  • curl -XGET 'localhost:9200/tweets/tweet/_search?pretty'
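
The same check can be scripted. The sketch below builds the `_search` URL used by the curl command and reads the total hit count from the response; the function names are hypothetical. Elasticsearch 6.x returns `hits.total` as an integer, while 7.x and later return an object with a `value` field, so the helper accepts both shapes.

```python
import json
from urllib.request import urlopen


def search_url(host="localhost", port=9200, index="tweets", doc_type="tweet"):
    """Build the same _search URL that the curl command queries."""
    return f"http://{host}:{port}/{index}/{doc_type}/_search"


def total_hits(response):
    """Extract the total hit count from a parsed _search response."""
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


def fetch_total_hits(url):
    """Query Elasticsearch and return the number of matching documents."""
    with urlopen(url) as resp:
        return total_hits(json.load(resp))
```

Calling `fetch_total_hits(search_url())` requires a running Elasticsearch instance; the other two functions work offline.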

Delete Index (tweets)

  • curl -XDELETE 'localhost:9200/tweets'

Run Script

  • Check run_tweets_analysis.sh in my gist
