Primary language: Jupyter Notebook. License: GNU General Public License v3.0 (GPL-3.0).

Jumpstart Library

This repo contains documentation, source code, workshop content, how-tos, and more for different base techniques, aka "patterns", that can be used in Data Science, Data Engineering, AI/ML, Analytics, and related fields.

The goal is to provide concrete elements that people will be able to learn from, reproduce, adapt and modify to fit their own needs and requirements.

The library currently features 15 different patterns, as well as 3 demos that are full implementations of scenarios where several patterns are combined.

Patterns

In this folder, you will find the following patterns:

  • Kafka Producer: Produce Kafka events from your application code

  • Kafka Consumer: Consume Kafka events from your application code

  • Kafka Edge-to-Core: Store-and-forward Kafka events from edge to core

  • Kafka to Serverless: Use Kafka as an event source to trigger OpenShift Serverless functions

  • Bucket notification: Send events to various endpoints when something occurs on your Ceph-based object storage
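
As a rough illustration of the Kafka Producer pattern listed above, the following sketch serializes events as JSON and sends them to a topic. It assumes the kafka-python client; the pattern in this repo may use a different client or configuration, and any broker address or topic name passed in is up to the caller.

```python
import json
from typing import Iterable


def encode_event(event: dict) -> bytes:
    """Serialize an event as compact UTF-8 JSON with sorted keys."""
    return json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")


def send_events(events: Iterable[dict], topic: str, bootstrap_servers: str) -> None:
    """Send each event to a Kafka topic (requires the kafka-python package)."""
    # Assumption: kafka-python is installed. It is imported lazily so that
    # encode_event stays usable without the client library.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=encode_event,
    )
    try:
        for event in events:
            producer.send(topic, value=event)
        # Block until all buffered records have been handed to the broker.
        producer.flush()
    finally:
        producer.close()
```

With a reachable broker, a call such as send_events([{"sensor": "s1", "value": 21.5}], "demo-topic", "localhost:9092") would publish one event; the topic and address here are placeholders, not values taken from this repo.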

Demo 1 - X-ray analysis automated pipeline

In this demo, we show how to implement an automated data pipeline for chest X-ray analysis:

  • Ingest chest X-rays into an object store based on Ceph.

  • The object store sends notifications to a Kafka topic.

  • A Knative Eventing listener on the topic triggers a Knative Serving function.

  • A trained ML model running in a container assesses the risk of pneumonia for incoming images.

  • A Grafana dashboard displays the pipeline in real time, along with incoming, processed, and anonymized images, as well as full metrics.
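
To make the notification step of this pipeline more concrete, here is a minimal sketch of how a triggered function might work out which objects were just uploaded. It assumes the notification payload uses the S3-style event record layout (Records, eventName, s3.bucket.name, s3.object.key); the sample payload, bucket name, and object keys are made up for illustration and are not taken from the demo.

```python
import json
from typing import List, Tuple


def uploaded_objects(payload: str) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for object-creation records in a notification."""
    message = json.loads(payload)
    found = []
    for record in message.get("Records", []):
        # Only react to newly created objects, e.g. "ObjectCreated:Put".
        if "ObjectCreated" not in record.get("eventName", ""):
            continue
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            found.append((bucket, key))
    return found


# Made-up sample payload, trimmed to the fields used above.
SAMPLE = json.dumps({
    "Records": [
        {"eventName": "ObjectCreated:Put",
         "s3": {"bucket": {"name": "xray-in"}, "object": {"key": "img-001.jpg"}}},
        {"eventName": "ObjectRemoved:Delete",
         "s3": {"bucket": {"name": "xray-in"}, "object": {"key": "img-000.jpg"}}},
    ]
})

print(uploaded_objects(SAMPLE))
```

Filtering on the event name keeps the function from trying to analyze objects that were deleted rather than created.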

Demo 2 - Smart City scenario

In this demo, we show how to implement this scenario:

  • Using a trained ML model, licence plates are recognized at toll locations.

  • Data (plate number, location, timestamp) is sent from toll locations (edge) to the core using Kafka mirroring, which handles communication issues and recovery.

  • Incoming data is screened in real time to trigger alerts for wanted vehicles (Amber Alert).

  • Data is aggregated and stored in object storage.

  • A central database contains other information coming from the licence registry system: car model, color, …

  • Data analysis leveraging Presto and Superset is done against stored data (database and object storage).
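
The screening and aggregation steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the event field names (plate, location, timestamp), the sample events, and the wanted list are hypothetical, and the demo itself performs these steps on Kafka streams rather than in-memory lists.

```python
from collections import Counter
from typing import Dict, List, Set


def normalize_plate(plate: str) -> str:
    """Strip spaces and upper-case a plate so that comparisons are consistent."""
    return plate.replace(" ", "").upper()


def screen_events(events: List[Dict[str, str]], wanted: Set[str]) -> List[Dict[str, str]]:
    """Return the events whose plate number is on the wanted list."""
    wanted_normalized = {normalize_plate(p) for p in wanted}
    return [e for e in events if normalize_plate(e["plate"]) in wanted_normalized]


def count_by_location(events: List[Dict[str, str]]) -> Counter:
    """Aggregate the number of recognized plates per toll location."""
    return Counter(e["location"] for e in events)


# Hypothetical sample data, for illustration only.
EVENTS = [
    {"plate": "ABC 123", "location": "toll-1", "timestamp": "2021-01-01T08:00:00Z"},
    {"plate": "xyz789", "location": "toll-2", "timestamp": "2021-01-01T08:01:00Z"},
    {"plate": "DEF456", "location": "toll-1", "timestamp": "2021-01-01T08:02:00Z"},
]
WANTED = {"XYZ 789"}

alerts = screen_events(EVENTS, WANTED)
totals = count_by_location(EVENTS)
```

Normalizing plates before comparison avoids missing a match because of spacing or letter case differences between the recognition output and the wanted list.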

Demo 3 - Industrial Condition Based Monitoring

In this demo, we show how to implement industrial condition-based monitoring using a hybrid cloud ML approach: machine-learning inference-based anomaly detection on time-series sensor metrics at the edge, with a central data lake and ML model retraining. It also shows how hybrid deployments (clusters at the edge and in the cloud) can be managed, and how the CI/CD pipelines and model training/execution flows can be implemented.
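
To give a feel for what anomaly detection on time-series sensor data involves, the sketch below flags readings that deviate strongly from the recent past using a rolling z-score. This is a simple statistical stand-in chosen for illustration, not the trained model used in the demo; the window size, threshold, and sample readings are assumptions.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, List


def rolling_zscore_anomalies(
    values: Iterable[float], window: int = 5, threshold: float = 3.0
) -> List[int]:
    """Return indices of readings far from the mean of the preceding window."""
    history = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(values):
        # Only score a reading once a full window of earlier readings exists.
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # A perfectly flat window has sigma == 0 and is skipped to avoid
            # dividing by zero.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(index)
        history.append(value)
    return anomalies


# Hypothetical sensor readings with one obvious spike at index 6.
READINGS = [10.0, 10.1, 9.9, 10.0, 10.2, 10.1, 25.0, 10.0, 9.9]
print(rolling_zscore_anomalies(READINGS))
```

A check this lightweight could run at the edge next to the sensors, while heavier model retraining stays in the central data lake, which mirrors the split described above.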

Contributing

We recommend using a Python virtual environment to develop for this repository. To create such an environment, use:

python3 -m venv .venv
source .venv/bin/activate

Then install the dependencies with:

pip install -r requirements.txt

Note: requirements.txt is generated with pip3 freeze > requirements.txt

Then install the pre-commit hook with:

pre-commit install

Now you’re ready to contribute code.