Follow the Wiki to Set Up the Docker-based Environment

End-to-End, Real-time ML Reference Data Pipeline

Architecture Overview


Mapped to Code

Powered by the PANCAKE STACK!

Upcoming Workshops

Title

Building an End-to-End Streaming Analytics and Recommendations Pipeline with Spark, Kafka, and TensorFlow

Agenda (Full Day)

Part 1 (Analytics and Visualizations)

Analytics and Visualizations Overview (Live Demo!)
Verify Environment Setup (Docker, Cloud Instance)
Notebooks (Zeppelin, Jupyter/iPython)
Interactive Data Analytics (Spark SQL, Hive, Presto)
Graph Analytics (Spark, Elastic, NetworkX, TitanDB)
Time-series Analytics (Spark, Cassandra)
Visualizations (Kibana, Matplotlib, D3)
Approximate Queries (Spark SQL, Redis, Algebird)
Workflow Management (Airflow)

Part 2 (Streaming and Recommendations)

Streaming and Recommendations (Live Demo!)
Streaming (NiFi, Kafka, Spark Streaming, Flink)
Cluster-based Recommendation (Spark ML, Scikit-Learn)
Graph-based Recommendation (Spark ML, Spark Graph)
Collaborative Filtering-based Recommendation (Spark ML)
NLP-based Recommendation (CoreNLP, NLTK)
Geo-based Recommendation (ElasticSearch)
Hybrid On-Premise+Cloud Auto-scale Deploy (Docker)
Save Workshop Environment for Your Use Cases

Locations and Dates

San Francisco: Saturday, April 23rd (SOLD OUT)
San Francisco: Saturday, June 4th (SOLD OUT)
Washington DC: Saturday, June 18th (SOLD OUT)
Los Angeles: Sunday, July 10th (SOLD OUT)
Seattle: Saturday, July 30th (SOLD OUT)
Santa Clara: Saturday, August 6th (SOLD OUT)
Chicago: Saturday, August 27th (SOLD OUT)
Atlanta: Sunday, September 25th
New York: Saturday, October 1st
Munich: Saturday, October 15th
London: Saturday, October 22nd
Brussels: Saturday, October 29th
Madrid: Saturday, November 19th
Tokyo: Saturday, December 3rd
Shanghai: Saturday, December 10th
Beijing: Saturday, December 17th
Hyderabad: Saturday, December 24th
Bangalore: Saturday, December 31st
Sydney: Saturday, January 7th, 2017
Melbourne: Saturday, January 14th, 2017
Sao Paulo: Saturday, February 11th, 2017
Rio de Janeiro: Saturday, February 18th, 2017

Suggest a City and Date

Description

The goal of this workshop is to build an end-to-end streaming data analytics and recommendations pipeline on your local machine using Docker and the latest streaming analytics tools.

First, we create a data pipeline to interactively analyze, approximate, and visualize streaming data using modern tools such as Apache Spark, Kafka, Zeppelin, iPython, and ElasticSearch.
Next, we extend our pipeline to use streaming data to generate personalized recommendation models using popular machine learning, graph, and natural language processing techniques such as collaborative filtering, clustering, and topic modeling.
Last, we productionize our pipeline and serve live recommendations to our users!
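To make "collaborative filtering" concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. It is not the workshop's implementation (the agenda uses Spark ML, whose collaborative filtering is based on ALS matrix factorization). It swaps in a simpler neighborhood method for illustration: users are compared by cosine similarity over co-rated items, and unseen items are scored by a similarity-weighted average of other users' ratings. The `ratings` data and the `cosine`/`recommend` helpers are hypothetical.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}
ratings = {
    "alice": {"item_a": 5.0, "item_b": 3.0, "item_c": 4.0},
    "bob":   {"item_a": 4.0, "item_b": 2.0, "item_d": 5.0},
    "carol": {"item_b": 5.0, "item_c": 1.0, "item_d": 4.0},
}


def cosine(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted average rating."""
    seen = ratings[user]
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(seen, other_ratings)
        if sim <= 0:
            continue  # ignore users with no positive overlap
        for item, r in other_ratings.items():
            if item in seen:
                continue  # only recommend unseen items
            scores[item] = scores.get(item, 0.0) + sim * r
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [(item, score) for score, item in ranked[:top_n]]


if __name__ == "__main__":
    print(recommend("alice", ratings))
```

Here `recommend("alice", ratings)` returns only `item_d`, the one item Alice has not rated, scored between Bob's and Carol's ratings and weighted toward Bob, whose co-ratings are most similar to Alice's. A production pipeline would instead train a model such as Spark ML's ALS on the streaming ratings and serve its predictions.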