End-to-End, Real-time, Advanced Analytics Big Data Reference Pipeline using Spark, Spark SQL, Spark ML, GraphX, Spark Streaming, Kafka, NiFi, Cassandra, Elasticsearch, Redis, Tachyon, HDFS, Zeppelin, iPython/Jupyter Notebook, Tableau, and Twitter Algebird. See https://github.com/fluxcapacitor/pipeline/wiki for setup instructions.

Primary language: Shell. License: Other (NOASSERTION).

Docker-based, End-to-End, Big Data Reference Pipeline!

Real-time, Advanced Analytics, Machine Learning, Streaming, Graph Processing, Text/NLP Analytics

Follow the Wiki sidebar (https://github.com/fluxcapacitor/pipeline/wiki) to set up the environment.

Apache Zeppelin Notebooks

Jupyter/iPython Notebooks

Apache NiFi Flows

Tableau Integration

Beeline Command-line Hive Client

Log Visualization with Kibana & Logstash

Spark, Spark Streaming, and Spark SQL Admin UIs

Ganglia System and JVM Metrics Monitoring UIs

Architecture Overview

Big Data Pipeline Overview

Tools Overview

Apache Spark, Redis, Apache Cassandra, Apache Kafka, Apache NiFi, Elasticsearch, Logstash, Kibana, Apache Zeppelin, Ganglia, Hadoop HDFS, iPython Notebook, Docker, Tachyon