
IBM Cloud Streaming Demo

License: Apache License 2.0



Demo Overview

The IBM Cloud Streaming Retail Demo showcases a range of data and analytics technologies on IBM Cloud, including:

  • IBM Message Hub (Kafka)
  • IBM Analytics Engine (Spark Structured Streaming)
  • IBM Cloud Foundry
  • IBM Compose ScyllaDB (Cassandra)
  • IBM Compose Elasticsearch
  • IBM Cloud Object Storage
  • Machine Learning (Spark ML, scikit-learn)

Demo Architecture

All of the demo code lives in this repository's parent GitHub organisation, ibm-cloud-streaming-retail-demo. The organisation contains several repositories, each focused on a different aspect of the solution:

  • dataset-generator: generates the main retail dataset for the demo. Start with this project, because the other projects need the dataset it produces.
  • kafka-producer-for-simulated-data: sends the dataset generated by the dataset-generator project to IBM Message Hub (Kafka).

These two repositories are a work in progress (only their documentation still needs updating) ...

This one is a work in progress (it works on standalone Spark but not on IBM Analytics Engine (IAE)) ...

More coming soon ...
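The kafka-producer-for-simulated-data repository sends the generated dataset to Message Hub (Kafka). The sketch below is a minimal illustration of one way a producer might turn dataset rows into Kafka key/value messages, not the repository's actual code. It assumes the generated data is CSV in the column layout of the UCI Online Retail dataset; the sample rows, the topic name `retail-transactions`, and the choice of `InvoiceNo` as the message key are all hypothetical. The real send step is left as a comment because it needs a live broker and Message Hub credentials.

```python
import csv
import io
import json
from typing import Iterator, Tuple

# Hypothetical, made-up sample rows in the UCI Online Retail column layout.
# The real dataset-generator output format may differ.
SAMPLE_CSV = """InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
100001,A001,EXAMPLE ITEM ONE,6,2010-12-01 08:26:00,2.55,12345,United Kingdom
100001,A002,EXAMPLE ITEM TWO,2,2010-12-01 08:26:00,3.39,12345,United Kingdom
"""


def to_kafka_messages(csv_text: str) -> Iterator[Tuple[bytes, bytes]]:
    """Yield one (key, value) pair of bytes per CSV row.

    The key is the invoice number (an assumption), so with a key-hashing
    partitioner all lines of one invoice go to the same partition.
    The value is the whole row encoded as UTF-8 JSON; every field stays
    a string, exactly as csv.DictReader returns it.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row["InvoiceNo"].encode("utf-8")
        value = json.dumps(row).encode("utf-8")
        yield key, value


if __name__ == "__main__":
    for key, value in to_kafka_messages(SAMPLE_CSV):
        # With a real client such as kafka-python, this is roughly where you
        # would call producer.send("retail-transactions", key=key, value=value).
        # The topic name is hypothetical.
        print(key, value)
```

Keying by invoice is one reasonable design choice: a downstream Spark Structured Streaming job could then see all lines of an invoice in order within one partition. Keying by customer or leaving messages unkeyed are equally valid alternatives depending on the analysis.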

Credits

This project is based on this dataset:

Daqing Chen, Sai Liang Sain, and Kun Guo, Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining, Journal of Database Marketing and Customer Strategy Management, Vol. 19, No. 3, pp. 197–208, 2012 (Published online before print: 27 August 2012. doi: 10.1057/dbm.2012.17).

More information on the dataset can be found in the dataset-generator project.