Live Topic Analysis is a full-stack spark streaming based analysis engine with real-time charting. Keywords statistics are analyzed in real-time over a streaming interface. It provides live visualizations enabling further predictive analysis on the topics.
This network of 3 docker containers streams and processes live tweets to an in-browser chart.
-
Tweet Generator:
python:latest
image from Docker hub and addedrequests
andrequests_oauthlib
to the requirements.txt. This tweet generator- IN: connects to Twitter via streaming HTTP API (Twitter specs.)
- OUT: listens on 9001 so Spark can connect to it (Nectar specs.)
-
Spark: for this docker image, I used the
jupyter/pyspark-notebook
docker image, which comes ready with pyspark.- IN: connects to 9001 to get data from Tweetgen on port 9001 (Nectar specs.)
- OUT: Post data to Web App via HTTP on 9999 (Webapp/Nodeholder specs.)
-
Web App server: This docker image is built from the
python:latest
image from Docker hub and addedFlask
to the requirements.txt.- IN: Listens to posts to API endpoints handled by Flask (Webapp/Nodeholder)
- OUT: Adds points to Chart.js (Chart.js spec, data to go to local storage)
>source lta.sh
lta> lta-help
To run Live Topic Analysis:
lta-dashboard-start # spark posts to this webserver
lta-tweetgen-start # listens for spark connection
lta-spark-start # connects to tweetgen, receives tweet stream
Type lta-<tab> to see possible commands.
Building:
lta> lta-dashboard-make
lta> lta-spark-make
lta> lta-tweetgen-make
Summary of pipeline:
tweetgen-listens-on-9009 |
spark-connects-to-9009-and-9991 |
dashboard-listens-on-9991
Pipeline example with Node Object Model tools:
nom-find "type=twitter where tag=topic-of-interest" |
tweets-to-lines |
lines-to-sentiments |
sentiments-to-avg |
nom-write-to-lake
$LTA_ROOT is a variable pointing to the local directory
containing lta.sh
.
-
docker network create --driver bridge lta-net
-
Run tweet generator:
docker run -it --rm \ --name tweetgen \ -v $lta_root:/home/ds/data \ --network my-net \ tweetgen //bin/bash
- See Architecture below to create a
simple-server
image
-
create directory 'cps' in the same directory as
spark-streaming.py
-
Run WebApp server dashboard by executing
app.py
, with a port exposed (9991 in this example)docker run -it --rm --name app_server -v $lta_root:/home/app -p 9991:9991 --network my-net simple-flask //bin/bash
- See Architecture below to create a
simple-flask
image
-
Run Spark Stream:
docker run -it --rm --name pyspark -v $lta_root:/home/jovyan --network my-net jupyter/pyspark-notebook //bin/bash