yifzhang/pipeline

End-to-End, Real-time, Advanced Analytics Big Data Reference Pipeline using Spark, Spark SQL, Spark ML, GraphX, Spark Streaming, Kafka, NiFi, Cassandra, ElasticSearch, Redis, Tachyon, HDFS, Zeppelin, iPython/Jupyter Notebook, Tableau, Twitter Algebird. See https://github.com/fluxcapacitor/pipeline/wiki for Setup Instructions.

Docker-based, End-to-End, Big Data Reference Pipeline!

Real-time, Advanced Analytics, Machine Learning, Streaming, Graph Processing, Text/NLP Analytics

Follow the Wiki Sidebar to Set Up the Environment

Apache Zeppelin Notebooks

Jupyter/iPython Notebooks

Apache NiFi Flows

Tableau Integration

Beeline Command-line Hive Client

Log Visualization with Kibana & Logstash

Spark, Spark Streaming, and Spark SQL Admin UIs

Ganglia System and JVM Metrics Monitoring UIs

Architecture Overview

Tools Overview