Twitter Streaming Data with

Overview

Streams Twitter data with Apache Spark, Apache Kafka, Hive, HBase, Spark SQL, and Tableau.

How to run the project

Getting Twitter API keys
- Create a Twitter account if you do not already have one.
- Go to https://apps.twitter.com/ and log in with your Twitter credentials.
- Click "Create New App".
- Fill out the form, agree to the terms, and click "Create your Twitter application".
- On the next page, click the "API keys" tab and copy your "API key" and "API secret".
- Scroll down, click "Create my access token", and copy your "Access token" and "Access token secret".
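
The four values copied above have to reach the producer somehow, and the source does not say how TweetProducer.java reads them. One option is to keep them out of the code and check them at startup. The sketch below assumes they are supplied as environment variables. The class name `TwitterCredentials` and the four variable names are hypothetical and not part of the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the project: checks that the four
// Twitter credentials are present before the producer starts.
final class TwitterCredentials {
    // Assumed variable names, one per value copied from the Twitter app page.
    static final String[] REQUIRED = {
        "TWITTER_API_KEY",
        "TWITTER_API_SECRET",
        "TWITTER_ACCESS_TOKEN",
        "TWITTER_ACCESS_TOKEN_SECRET"
    };

    private TwitterCredentials() {}

    // Returns the names of required variables that are unset or blank.
    static List<String> missing(Map<String, String> env) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missing(System.getenv());
        if (!missing.isEmpty()) {
            System.err.println("Missing Twitter credentials: " + missing);
            System.exit(1);
        }
        System.out.println("All four Twitter credentials are set.");
    }
}
```

Running the check before starting the producer reports a missing key immediately instead of leaving it to surface later as an authentication error.
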

Open a terminal and start the Kafka server:

cd /opt/kafka_2.13-2.6.2/
bin/kafka-server-start.sh config/server.properties

Run TweetProducer.java in the twitter-kafka project.
Run JavaSparkApp.java in the spark-streaming project.

Start Hive and create a new external table:

CREATE EXTERNAL TABLE tweet_data_table (name STRING, country STRING, followers STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ('separatorChar' = ':', 'quoteChar' = '\')
LOCATION '/user/cloudera/Tweets'
TBLPROPERTIES ('hive.input.dir.recursive'='true', 'hive.supports.subdirectories'='true',
'mapreduce.input.fileinputformat.input.dir.recursive'='true');
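
The table definition above expects one record per line with three colon-separated fields: name, country, followers. The source does not show how JavaSparkApp.java writes its output, so the following is only a minimal sketch of producing and checking a line in that layout. The class `TweetLine` and its methods are hypothetical and not part of the project.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the project: illustrates the
// colon-separated line layout that tweet_data_table reads.
final class TweetLine {
    private TweetLine() {}

    // Builds "name:country:followers" from the three column values.
    static String format(String name, String country, long followers) {
        return clean(name) + ":" + clean(country) + ":" + followers;
    }

    // Splits a line back into exactly three fields.
    // The limit of -1 keeps empty trailing fields instead of dropping them.
    static List<String> parse(String line) {
        String[] parts = line.split(":", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "Expected 3 fields but found " + parts.length + ": " + line);
        }
        return Arrays.asList(parts);
    }

    // Replaces colons and line breaks inside a field with a space so they
    // cannot shift the columns or split the record across lines.
    private static String clean(String field) {
        return field == null ? "" : field.replaceAll("[:\\r\\n]", " ").trim();
    }

    public static void main(String[] args) {
        String line = format("jane_doe", "US", 1200);
        System.out.println(line);
        System.out.println(parse(line));
    }
}
```

Cleaning the separator out of free-text fields matters here because a user name containing a colon would otherwise push the country and follower count into the wrong columns.
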

Create an internal table:

CREATE TABLE report (name STRING, followers STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ":";

Load data into the internal table:

LOAD DATA LOCAL INPATH '/home/cloudera/Tweet' OVERWRITE INTO TABLE report;

Troubleshooting

Sometimes the HBase service dies and must be restarted with these commands:

sudo service hbase-master restart;
sudo service hbase-regionserver restart;

Reference

Cloudera: https://www.cloudera.com/
Apache Spark Streaming: https://spark.apache.org/streaming/
Spark SQL: http://spark.apache.org/sql/
Apache Kafka: https://kafka.apache.org/
Apache Hive: https://hive.apache.org/
Apache HBase: https://hbase.apache.org/
Tableau: https://www.tableau.com/

NSNJRGL/CS523
