This repo contains documentation, source code, workshop content, howtos… for different base techniques, aka "patterns", that can be used in Data Science, Data Engineering, AI/ML, Analytics…
The goal is to provide concrete elements that people will be able to learn from, reproduce, adapt and modify to fit their own needs and requirements.
The library currently features 15 different patterns, as well as 2 demos that are full implementations of scenarios where several patterns are combined.
In this folder, you will find the following patterns:

- Kafka Producer: Produce Kafka events from your application code (see the sketch after this list)
- Kafka Consumer: Consume Kafka events from your application code
- Kafka Edge-to-Core: Store-and-forward Kafka events from edge to core
- Kafka to Serverless: Use Kafka as an event source to trigger OpenShift Serverless functions
- Bucket notification: Send events to various endpoints when something occurs on your Ceph-based object storage
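To make the first two patterns concrete, here is a minimal producer/consumer sketch using the kafka-python library. The broker address, topic name, and payload are illustrative assumptions, not values from the patterns themselves.

```python
# Minimal Kafka producer/consumer sketch using the kafka-python library.
# Broker address and topic name are illustrative assumptions.
import json

from kafka import KafkaConsumer, KafkaProducer

BOOTSTRAP_SERVERS = "localhost:9092"  # assumption: local broker
TOPIC = "demo-events"                 # assumption: example topic

# Produce a JSON-encoded event.
producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP_SERVERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"event": "hello", "source": "pattern-demo"})
producer.flush()

# Consume events from the same topic.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BOOTSTRAP_SERVERS,
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(f"offset={message.offset} value={message.value}")
    break  # stop after the first message in this sketch
```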
In this demo, we show how to implement an automated data pipeline for chest X-ray analysis:

- Chest X-rays are ingested into an object store based on Ceph.
- The object store sends notifications to a Kafka topic.
- A Knative Eventing listener on the topic triggers a Knative Serving function (see the sketch after this list).
- An ML model running in a container assesses the pneumonia risk of each incoming image.
- A Grafana dashboard displays the pipeline in real time, showing incoming, processed and anonymized images, along with full metrics.
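As an illustration of the serving step, here is a minimal sketch of a Knative Serving-style HTTP function. It assumes a Flask app, an S3-compatible notification payload, and a stubbed score_image() helper; all of these are hypothetical placeholders, not the demo's actual code.

```python
# Sketch of a Knative Serving-style HTTP function: it receives a bucket
# notification (forwarded from the Kafka topic by Knative Eventing) and
# scores the referenced image. Payload fields and score_image() are
# hypothetical placeholders.
import os

from flask import Flask, jsonify, request

app = Flask(__name__)

def score_image(bucket: str, key: str) -> float:
    # Placeholder: a real implementation would fetch the image from the
    # object store and run the trained pneumonia model on it.
    return 0.42

@app.route("/", methods=["POST"])
def handle_event():
    event = request.get_json(force=True)
    # Assumed S3-style notification shape: Records[].s3.bucket/object.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    risk = score_image(bucket, key)
    return jsonify({"image": f"{bucket}/{key}", "pneumonia_risk": risk})

if __name__ == "__main__":
    # Knative injects PORT; default to 8080 for local runs.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```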
In this demo, we show how to implement this scenario:

- Using a trained ML model, licence plates are recognized at toll locations.
- Data (plate number, location, timestamp) is sent from the toll locations (edge) to the core using Kafka mirroring to handle communication issues and recovery.
- Incoming data is screened in real time to trigger alerts for wanted vehicles (Amber Alert), as sketched after this list.
- Data is aggregated and stored in object storage.
- A central database contains additional information coming from the licence registry system: car model, color,…
- Data analysis leveraging Presto and Superset is done against the stored data (database and object storage).
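Here is a minimal sketch of the real-time screening step, assuming a kafka-python consumer, a JSON payload with plate/location/timestamp fields, and an in-memory watchlist (all illustrative assumptions):

```python
# Sketch of the Amber Alert screening step: consume toll events and flag
# wanted plates. Topic name, payload shape, and the watchlist are
# illustrative assumptions.
import json

from kafka import KafkaConsumer

WANTED_PLATES = {"ABC-1234", "XYZ-9876"}  # assumption: example watchlist

consumer = KafkaConsumer(
    "toll-events",                       # assumption: example topic
    bootstrap_servers="localhost:9092",  # assumption: local broker
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    event = message.value  # e.g. {"plate": ..., "location": ..., "timestamp": ...}
    if event["plate"] in WANTED_PLATES:
        print(f"ALERT: {event['plate']} seen at {event['location']} "
              f"at {event['timestamp']}")
```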
In this demo, we show how to implement industrial condition-based monitoring using a hybrid cloud ML approach: inference-based anomaly detection runs on metric time-series sensor data at the edge, while a central data lake supports ML model retraining. It also shows how hybrid deployments (clusters at the edge and in the cloud) can be managed, and how CI/CD pipelines and model training/execution flows can be implemented. A minimal anomaly-detection sketch follows.
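To illustrate the kind of edge inference involved, here is a minimal rolling z-score anomaly detector over a stream of sensor readings; the window size, threshold, and synthetic signal are illustrative assumptions, and the demo's actual model may differ.

```python
# Rolling z-score anomaly detection sketch for metric time-series data.
# Window size and threshold are illustrative assumptions.
import random
from collections import deque
from statistics import mean, stdev

def detect_anomalies(readings, window=30, threshold=3.0):
    """Yield (index, value) for readings that deviate strongly
    from the recent rolling window."""
    history = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                yield i, value
        history.append(value)

# Usage: noisy synthetic signal with one injected spike at index 60.
random.seed(0)
signal = [random.gauss(0, 1) for _ in range(100)]
signal[60] = 12.0
print(list(detect_anomalies(signal)))
```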
We recommend using a virtual Python environment to develop for this repository. To create and activate one, run:
python3 -m venv .venv
source .venv/bin/activate
Then install the dependencies with:
pip install -r requirements.txt
Note: this file is generated with pip3 freeze > requirements.txt
Then install the pre-commit hooks with:
pre-commit install
Now you’re ready to contribute code.