Live Topic Analysis

Live Topic Analysis is a full-stack analysis engine built on Spark Streaming, with real-time charting. Keyword statistics are computed in real time over a streaming interface, and the live visualizations are intended to support further predictive analysis of the topics.

Architecture

This network of three Docker containers streams live tweets, processes them, and plots the results on an in-browser chart.

Tweet Generator: this Docker image is built from the python:latest image on Docker Hub, with requests and requests_oauthlib added to requirements.txt. The tweet generator
- IN: connects to Twitter via the streaming HTTP API (Twitter specs.)
- OUT: listens on 9001 so Spark can connect to it (Nectar specs.)
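The OUT side of the tweet generator can be sketched as a small TCP server that writes one tweet per line to whichever client connects. This is a minimal illustration, not the project's actual code: the function name serve_lines is hypothetical, and the lines argument stands in for the text that would arrive from Twitter's streaming API.

```python
import socket
import threading


def serve_lines(lines, host="127.0.0.1", port=0):
    """Listen on (host, port), accept one client, and send it each line.

    Returns the bound port and the worker thread. port=0 asks the OS for a
    free port; a real deployment would pass the fixed port Spark connects to.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    bound_port = server.getsockname()[1]

    def run():
        conn, _addr = server.accept()  # blocks until a client (e.g. Spark) connects
        with conn:
            for line in lines:
                # A line-oriented reader splits on newlines, so keep one tweet per line.
                payload = line.replace("\n", " ").encode("utf-8") + b"\n"
                conn.sendall(payload)
        server.close()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return bound_port, thread
```

Embedded newlines are flattened to spaces because a newline-delimited consumer would otherwise treat one tweet as several records.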
Spark: this container uses the jupyter/pyspark-notebook Docker image, which comes with pyspark preinstalled.
- IN: connects to Tweetgen on port 9001 to receive data (Nectar specs.)
- OUT: posts data to the Web App via HTTP on 9999 (Webapp/Nodeholder specs.)
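The Spark stage might look roughly like the sketch below, which uses the PySpark DStream API to read lines from a socket, count keywords per batch, and POST each batch to the dashboard. Several things here are assumptions rather than the project's code: hashtags are used as the "keywords", the helper names (extract_hashtags, post_counts, run_stream) are hypothetical, and the dashboard URL and JSON payload shape are illustrative. Counts are per batch, not running totals.

```python
import json
import urllib.request


def extract_hashtags(line):
    """Return the lowercase hashtags found in one line of tweet text."""
    return [w.lower() for w in line.split() if w.startswith("#") and len(w) > 1]


def post_counts(counts, url):
    """POST a {hashtag: count} dict as JSON (the URL and payload shape are assumptions)."""
    body = json.dumps(counts).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req, timeout=5).close()


def run_stream(tweetgen_host, tweetgen_port, dashboard_url, batch_seconds=2):
    """Wire up the streaming job; pyspark is imported lazily because it is a heavy dependency."""
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="lta-sketch")
    ssc = StreamingContext(sc, batch_seconds)

    lines = ssc.socketTextStream(tweetgen_host, tweetgen_port)
    counts = (
        lines.flatMap(extract_hashtags)
        .map(lambda tag: (tag, 1))
        .reduceByKey(lambda a, b: a + b)
    )

    def publish(rdd):
        # Runs on the driver once per batch; collect() is acceptable for small results.
        batch = dict(rdd.collect())
        if batch:
            post_counts(batch, dashboard_url)

    counts.foreachRDD(publish)
    ssc.start()
    ssc.awaitTermination()
```

Keeping extract_hashtags as a plain function makes the keyword logic testable without a Spark cluster.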
Web App server: this Docker image is built from the python:latest image on Docker Hub, with Flask added to requirements.txt.
- IN: listens for POSTs to API endpoints handled by Flask (Webapp/Nodeholder specs.)
- OUT: adds points to Chart.js (Chart.js spec, data to go to local storage)
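As a rough sketch of the Web App side, the code below accumulates posted counts in memory and exposes them in a labels/values shape that a Chart.js page could poll. The endpoint paths (/update, /data), the CountStore class, and the JSON layout are all hypothetical choices for illustration; the project's actual app.py may differ.

```python
from collections import Counter
from threading import Lock


class CountStore:
    """Thread-safe running totals of keyword counts."""

    def __init__(self):
        self._totals = Counter()
        self._lock = Lock()

    def add(self, batch):
        """Add one batch of {keyword: count} to the running totals."""
        with self._lock:
            self._totals.update(batch)

    def top(self, n=10):
        """Return the n most frequent (keyword, count) pairs."""
        with self._lock:
            return self._totals.most_common(n)


def create_app(store):
    """Build the Flask app; Flask is imported here so CountStore works without it."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/update", methods=["POST"])  # hypothetical path that Spark posts to
    def update():
        store.add(request.get_json(force=True))
        return "", 204

    @app.route("/data")  # hypothetical path polled by the chart page
    def data():
        top = store.top()
        return jsonify(labels=[k for k, _ in top], values=[v for _, v in top])

    return app
```

The app could then be started with create_app(CountStore()).run(host="0.0.0.0", port=...) using whichever port the container exposes.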

Usage

> source lta.sh
lta> lta-help

To run Live Topic Analysis:

lta-dashboard-start  # spark posts to this webserver
lta-tweetgen-start   # listens for spark connection
lta-spark-start      # connects to tweetgen, receives tweet stream

Type lta-<tab> to see possible commands.

Building:

lta> lta-dashboard-make 
lta> lta-spark-make
lta> lta-tweetgen-make

Summary of pipeline:
   tweetgen-listens-on-9009 |
   spark-connects-to-9009-and-9991 |
   dashboard-listens-on-9991
   
   
Pipeline example with Node Object Model tools:

nom-find "type=twitter where tag=topic-of-interest"  |
   tweets-to-lines |
   lines-to-sentiments |
   sentiments-to-avg  |
    nom-write-to-lake
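The middle stages of the pipeline above can be pictured as composable generators, each consuming the previous stage's output. The sketch below is purely illustrative: the tweet dict with a "text" key, the tiny word lexicon, and the scoring rule are assumptions made for the example, not the behavior of the actual Node Object Model tools.

```python
# Toy lexicon for illustration only; a real sentiment stage would use a proper model.
TOY_LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}


def tweets_to_lines(tweets):
    """Yield each non-empty line of text from each tweet (assumes a 'text' key)."""
    for tweet in tweets:
        for line in tweet["text"].splitlines():
            if line.strip():
                yield line.strip()


def lines_to_sentiments(lines, lexicon=TOY_LEXICON):
    """Yield one integer score per line: the sum of the lexicon values of its words."""
    for line in lines:
        yield sum(lexicon.get(w.strip(".,!?").lower(), 0) for w in line.split())


def sentiments_to_avg(scores):
    """Return the mean score, or 0.0 if there are no scores."""
    total, n = 0, 0
    for score in scores:
        total += score
        n += 1
    return total / n if n else 0.0
```

Chaining them mirrors the shell pipe: sentiments_to_avg(lines_to_sentiments(tweets_to_lines(tweets))).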

Under the Hood

$LTA_ROOT is a variable pointing to the local directory containing lta.sh.

Create the Docker network:
- docker network create --driver bridge lta-net
Run tweet generator:
- docker run -it --rm --name tweetgen -v $LTA_ROOT:/home/ds/data --network lta-net tweetgen //bin/bash
- See Architecture above to create a simple-server image
Create a directory 'cps' in the same directory as spark-streaming.py.
Run the WebApp server dashboard by executing app.py, with a port exposed (9991 in this example):
- docker run -it --rm --name app_server -v $LTA_ROOT:/home/app -p 9991:9991 --network lta-net simple-flask //bin/bash
- See Architecture above to create a simple-flask image
Run Spark Stream:
- docker run -it --rm --name pyspark -v $LTA_ROOT:/home/jovyan --network lta-net jupyter/pyspark-notebook //bin/bash
