
Data Hub INteractive Education (DINE) is demo content that shows how to use the features of SAP Data Hub.


Important Notice

This public repository is read-only and no longer maintained.

DataHub Interactive Education (DINE)


Overview

Data Hub INteractive Education (DINE) is educational content for SAP Data Hub. Its hands-on exercises show you how to use SAP Data Hub features. SAP Data Hub connects to different data sources such as SAP HANA, SAP ERP, SAP BW, Oracle, DB2, SQL Server, and many more, and can process structured, semi-structured, and unstructured data using Kafka, a streaming engine, text and image analysis, and so on. SAP Data Hub brings all your data together so you can work across it seamlessly. You can quickly develop a prototype on SAP Data Hub and turn the result into a production-level system, since SAP Data Hub takes care of execution, orchestration, scheduling, and monitoring. SAP Data Hub is built on Kubernetes and is therefore deployable on premise or in the cloud. It runs on a distributed execution engine and is designed for the Big Data world by providing an understanding of metadata across a Big Data landscape.

Also go through the official documentation of SAP Data Hub.

DINE makes it easy to learn how to build pipelines in SAP Data Hub using its operators. It acts as a reference for application developers and showcases the features of SAP Data Hub in an easy-to-understand business scenario. This demo content comes complete with:

  • Sample data
  • Code snippets
  • Tutorials

Prerequisites

SAP Data Hub Setup: Follow the Installation Guide for SAP Data Hub and set up your SAP Data Hub environment.

You can also use SAP Data Hub Developer Edition or SAP Data Hub Trial Edition.

Scenarios


We will learn SAP Data Hub through the scenarios below, which are based on a fictitious entity called SAP Data Hub Market Place, an e-commerce platform created for demo and learning purposes, where customers across the globe make thousands of purchases every day.

The scenarios are detailed below:

  • Customer Return Prediction: This scenario identifies the products that customers are likely to return frequently, based on different parameters. It is implemented in Python and uses the decision tree classifier from the sklearn library. The scenario reads data from different data sources and uses SAP Analytics Cloud to visualize the result dataset. Follow the tutorial to implement this scenario.
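To give a feel for the modelling step in the Customer Return Prediction scenario, here is a minimal sketch of a decision tree classifier in sklearn. It is not the tutorial's code: the feature names (unit price, review rating, discount), the values, and the labels are all made up for illustration, and the real scenario reads its inputs from the data sources through SAP Data Hub operators.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features per order item: [unit_price, review_rating, discount_pct].
# Label: 1 if the item was returned, 0 otherwise. All values are invented.
X_train = [
    [20.0, 5, 0],
    [25.0, 4, 5],
    [30.0, 5, 10],
    [200.0, 1, 0],
    [180.0, 2, 5],
    [220.0, 1, 10],
]
y_train = [0, 0, 0, 1, 1, 1]

# A shallow tree keeps the learned rules easy to inspect.
model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X_train, y_train)

# Score two unseen (also invented) order items.
new_items = [[22.0, 5, 0], [210.0, 1, 5]]
predictions = model.predict(new_items).tolist()
print(predictions)
```

In a real pipeline the training features would be derived from the sales, review, and return tables rather than typed in by hand, and the predictions would be written to a result dataset for visualization.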

More scenarios can be found in the teched-2018 branch.

Datasets

Our dataset for the above scenarios comprises 6 files, which contain customer, product, and sales information.

  • The CUSTOMER table holds customer details. Its ADDRESSID column maps to the ADDRESS table, where customer addresses are stored.

  • When a customer buys a product, a sales order is generated (SO_HEADER), and each sales order has multiple order items (SO_ITEM).

  • SO_HEADER has PARTNERID, a foreign key that links to the CUSTOMER table.

  • SO_ITEM has SALESORDERID, a foreign key that links to SO_HEADER.

  • Each SO_ITEM has a PRODUCTID, which maps to the PRODUCT table, where product details are stored.

  • Customer reviews of the products are stored in the REVIEW table.

  • Information about returns made by customers is stored in the RETURN table.

  • In total, there are 7 tables: CUSTOMER, ADDRESS, SO_HEADER, SO_ITEM, PRODUCT, REVIEW, and RETURN.

It is a synthetic dataset derived from SHINE and enriched to suit our use cases.
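The key relationships described above can be sketched with an in-memory SQLite database. This is only an illustration, assuming simplified tables: the foreign-key columns (ADDRESSID, PARTNERID, SALESORDERID, PRODUCTID) come from the description, while the remaining columns (NAME, CITY, ITEMNO), the assumption that CUSTOMER is keyed by PARTNERID, and the sample rows are hypothetical. REVIEW and RETURN are left out because their key columns are not described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ADDRESS   (ADDRESSID INTEGER PRIMARY KEY, CITY TEXT);
CREATE TABLE CUSTOMER  (PARTNERID INTEGER PRIMARY KEY, NAME TEXT,
                        ADDRESSID INTEGER REFERENCES ADDRESS(ADDRESSID));
CREATE TABLE PRODUCT   (PRODUCTID TEXT PRIMARY KEY, NAME TEXT);
CREATE TABLE SO_HEADER (SALESORDERID INTEGER PRIMARY KEY,
                        PARTNERID INTEGER REFERENCES CUSTOMER(PARTNERID));
CREATE TABLE SO_ITEM   (SALESORDERID INTEGER REFERENCES SO_HEADER(SALESORDERID),
                        ITEMNO INTEGER,
                        PRODUCTID TEXT REFERENCES PRODUCT(PRODUCTID));

-- Invented sample rows: one customer, one order, two order items.
INSERT INTO ADDRESS VALUES (1, 'Walldorf');
INSERT INTO CUSTOMER VALUES (100, 'Example Customer', 1);
INSERT INTO PRODUCT VALUES ('P-1', 'Notebook'), ('P-2', 'Mouse');
INSERT INTO SO_HEADER VALUES (5000, 100);
INSERT INTO SO_ITEM VALUES (5000, 10, 'P-1'), (5000, 20, 'P-2');
""")

# Walk the foreign keys: item -> order header -> customer -> address, item -> product.
rows = conn.execute("""
SELECT c.NAME, a.CITY, h.SALESORDERID, p.NAME
FROM SO_ITEM i
JOIN SO_HEADER h ON h.SALESORDERID = i.SALESORDERID
JOIN CUSTOMER  c ON c.PARTNERID    = h.PARTNERID
JOIN ADDRESS   a ON a.ADDRESSID    = c.ADDRESSID
JOIN PRODUCT   p ON p.PRODUCTID    = i.PRODUCTID
ORDER BY i.ITEMNO
""").fetchall()

for row in rows:
    print(row)
```

Each result row combines one order item with its customer, the customer's city, and the product name, which is the kind of flattened view a prediction pipeline would typically start from.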

ER Diagram

[Image: ER diagram of the dataset tables]

To access the datasets, explore the data folder in this repository.

Known issues

None

Support

Please report any bugs through GitHub issues.

License

Copyright (c) 2017-2020 SAP SE or an SAP affiliate company. All rights reserved. This project is licensed under the Apache Software License, version 2.0 except as noted otherwise in the LICENSE file.