Demo Overview

The IBM Cloud Streaming Retail Demo showcases several data and analytics technologies on the IBM Cloud, including:

IBM Message Hub (Kafka)
IBM Analytics Engine (Spark Structured Streaming)
IBM Cloud Foundry
IBM Compose ScyllaDB (Cassandra)
IBM Compose Elasticsearch
IBM Cloud Object Storage
Machine Learning (Spark ML, Scikit Learn)

All of the demo code lives in this repository's parent GitHub organisation, ibm-cloud-streaming-retail-demo. The organisation contains a number of repositories, each focused on a different aspect of the solution. The repositories are described below:

dataset-generator: Generates the main retail dataset for the demo. Start with this project, because the other projects need the dataset it produces.
kafka-producer-for-simulated-data: Sends the dataset generated by the dataset-generator project to IBM Message Hub (Kafka).
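
As a rough illustration of the hand-off between these two projects, the sketch below turns dataset rows into keyed JSON messages and passes them to a send function. It is not the code from kafka-producer-for-simulated-data: the column names, the `transactions` topic name, the sample rows, and the helpers `read_rows`, `rows_to_messages`, and `publish` are all assumptions made up for this example.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Made-up sample rows for illustration only; the real dataset comes from
# the dataset-generator project and its columns may differ.
SAMPLE_CSV = """InvoiceNo,StockCode,Quantity,UnitPrice,Country
100001,A0001,6,2.50,United Kingdom
100002,B0002,3,1.25,France
"""


def read_rows(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into a list of dicts, one per transaction line."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_to_messages(
    rows: Iterable[Dict[str, str]], key_field: str = "InvoiceNo"
) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (key, value) byte pairs: the key is the invoice number,
    the value is the whole row serialised as JSON."""
    for row in rows:
        key = str(row[key_field]).encode("utf-8")
        value = json.dumps(row, sort_keys=True).encode("utf-8")
        yield key, value


def publish(
    rows: Iterable[Dict[str, str]],
    send: Callable[[str, bytes, bytes], None],
    topic: str = "transactions",  # hypothetical topic name
) -> int:
    """Pass every message to `send` and return how many were sent.
    `send` is injected so the sketch runs without a Kafka broker."""
    count = 0
    for key, value in rows_to_messages(rows):
        send(topic, key, value)
        count += 1
    return count


if __name__ == "__main__":
    # Stand-in sender that just prints; a real one would call a Kafka client.
    publish(read_rows(SAMPLE_CSV), lambda t, k, v: print(t, k, v))
```

With a real client, `send` could wrap a producer call, for example `producer.send(topic, key=key, value=value)` on a kafka-python `KafkaProducer`. Injecting the sender keeps the serialisation logic testable without a broker.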

These two are a work in progress (only the documentation needs updating) ...

spark-structured-streaming-on-iae-to-cos: Saves the Kafka data stream to IBM Cloud Object Storage (COS) using Apache Spark on IBM Analytics Engine.
spark-structured-streaming-on-iae-to-elasticsearch: Saves the Kafka data stream to IBM Compose Elasticsearch using Apache Spark on IBM Analytics Engine.
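
Both projects start by reading the Kafka stream into Spark. The sketch below builds the option map that Spark's Kafka source would need for a SASL_SSL/PLAIN-secured broker such as Message Hub. It is an assumption-laden illustration rather than code from these repositories: `kafka_source_options` is a hypothetical helper, and the broker addresses, topic, and credentials are placeholders.

```python
from typing import Dict, List


def kafka_source_options(
    brokers: List[str], topic: str, username: str, password: str
) -> Dict[str, str]:
    """Build options for Spark's Kafka source. Options prefixed with
    `kafka.` are passed through to the underlying Kafka consumer."""
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{username}" password="{password}";'
    )
    return {
        "kafka.bootstrap.servers": ",".join(brokers),
        "subscribe": topic,
        "startingOffsets": "latest",
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
    }


# Untested outline of how the options might be used on a cluster where
# `spark` is an existing SparkSession; the output paths are placeholders:
#
#   opts = kafka_source_options(brokers, "transactions", user, password)
#   df = spark.readStream.format("kafka").options(**opts).load()
#   query = (df.selectExpr("CAST(value AS STRING) AS json")
#              .writeStream.format("json")
#              .option("path", "<output path>")
#              .option("checkpointLocation", "<checkpoint path>")
#              .start())
```

Keeping the option map in a plain function means the connection settings can be checked without starting a Spark session.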

This one is a work in progress (it works on standalone Spark, but not on IAE) ...

spark-structured-streaming-on-iae-to-scylladb: Saves the Kafka data stream to IBM Compose ScyllaDB using Apache Spark on IBM Analytics Engine.

More coming soon ...

IBM Cloud SQL Query: periodically convert the JSON written to the landing zone by spark-structured-streaming-on-iae-to-cos into partitioned Parquet/ORC to support Hive queries
Looker report on Hive data populated by IBM Cloud SQL Query or read directly from the landing zone
Cognos report on Hive data populated by IBM Cloud SQL Query or read directly from the landing zone
spark-structured-streaming-on-iae-to-hbase https://stackoverflow.com/a/49450254/1033422
spark-structured-streaming-on-iae-to-phoenix jdbc sink? https://stackoverflow.com/q/45373795/1033422
Realtime reporting dashboard using data in Hive
Compose PostgreSQL sink https://stackoverflow.com/q/45373795/1033422
Spark Structured Streaming + Hive streaming https://github.com/jerryshao/spark-hive-streaming-sink

Credits

This project is based on this dataset:

Daqing Chen, Sai Liang Sain, and Kun Guo, Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining, Journal of Database Marketing and Customer Strategy Management, Vol. 19, No. 3, pp. 197–208, 2012 (Published online before print: 27 August 2012. doi: 10.1057/dbm.2012.17).

More information on the dataset can be found in the dataset-generator project.
