This document is in pre-alpha stage. Any part of the documentation may be changed without advance notice.
iDaaS, or Incremental Data as a Service, is an open source data platform that integrates and replicates enterprise data silos in real time, providing a complete and unified data layer to serve operational applications and analytical systems.
In short, iDaaS aims to serve operational/transactional workloads, while data lakes and data warehouses are designed exclusively to serve analytical workloads.
The name "Incremental" is inspired Delta Lake. However, unlike the data lake or data warehouses, which typically loads the data in batch mode, data in iDaaS is incrementally inserted/updated/deleted on a row by row level, mirroring the changes in source systems as those changes occur. The freshness of data, and the correctness of the data guaranteed by the industry first Incremental Engine, enables users to build mission critical operational applications, including web, mobile and backend applications.
Read more on the Architecture & Fundamentals page.
iDaaS can be used in two main scenarios:
- Scenario A: As a real time data integration platform, connecting data sources to targets, building pipelines
- Scenario B: As a centralized data platform, similar to a data lake but with operational/transactional capability, creating data models and APIs to be consumed by downstream applications
Create a data pipeline to replicate the Customer table in mysql_crm_db to mysql_analytical_db, naming the target table "CRM_Customer". The table is created automatically in the target database if it does not exist.
Assuming the database connections are already configured, you only need to write statements like the following in the iDaaS DDK/Shell:
> createPipeline({alias: "my_pipeline"})
    .readFrom(mysql_crm_db.Customer)
    .writeTo(mysql_analytical_db.CRM_Customer, {AutoCreate: true})
    .start();
> my_pipeline.status()
Status: running
Input Total: 2400
Output Total: 2300
Throughput: 500 events/second
Last Input: 2022.02.01 15:00:03.203
Last Output: 2022.02.01 15:00:03.829
We would like to build a centralized data platform to hold a copy of the master data that is currently scattered across data silos, and to use this platform to serve the data requirements of different business units and application teams.
> createModel({ db: "daas_db", name: "OmniCustomer" })
    .readFrom(mysql_insurance_db.Customer)
    .startSync();
iDaaS will perform an initial load of the whole Customer table into MongoDB, then enter real-time sync mode.
> createREST({ group: "crm_api",
    name: "OmniCustomer",
    method: "GET",
    path: "/OmniCustomer",
    allowedParameters: ["type", "gender", "zipcode"],
    model: daas_db.OmniCustomer
  }).publish()
You can verify the published API using curl:
# curl -H "auth:xxxx" http://daas_server:3030/daas/crm_api/OmniCustomer?gender=M
> daas_db.OmniCustomer.status()
iModel:
    db: daas_db
    name: OmniCustomer
    count: 1000
    last update: 2022.02.01 12:00:02.039 UTC
    replication delay: 1000 ms
Sync source: mysql_insurance_db.Customer
    state: running
    count: 1002
    last log entry: 2022.02.01 12:00:01.039 UTC
    replication delay: 1000 ms
count diff: -2
Interested yet? Follow the Get Started guide below.
When can iDaaS be used?
- Real-time heterogeneous database replication
- Build a real-time Data as a Service platform
- Data processing / data serving store for BI/reporting applications
- Mainframe offloading
- Implement the CQRS pattern (see the sketch after this list)
- Build a read caching layer in front of an RDBMS
- Real-time / operational dashboards
- Data prep (extract, transform, load) for a data lake or data warehouse
- Building materialized views
- Customer/Product/Service 360
- Transactional MDM data platform
And many more.
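For example, the CQRS and read-caching patterns above can be approximated with the same primitives shown earlier: sync an iModel from the system of record, then publish it as a read-only API. A minimal sketch, where mysql_order_db and the table names are placeholders for your own configured connections:

> createModel({ db: "daas_db", name: "OrderReadView" })
    .readFrom(mysql_order_db.Order)
    .startSync();
> createREST({ group: "order_api",
    name: "OrderReadView",
    method: "GET",
    path: "/OrderReadView",
    allowedParameters: ["status", "customerId"],
    model: daas_db.OrderReadView
  }).publish()

Writes keep going to the source RDBMS; reads are served from the continuously synced copy, which is exactly the read side of CQRS.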
All changes, including insert/update/delete as well as DDL changes, are captured and replicated to iDaaS, ensuring the data platform is incrementally updated and kept in sync with the source systems.
Count, row-level, field-level and incremental verification methods ensure the correctness of replicated data.
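Purely as an illustration of the idea (the verify() call below is hypothetical and not yet part of the documented API), a count-based check on a pipeline might look like:

> my_pipeline.verify({ method: "count" })

Row-level and field-level methods would compare actual row contents rather than totals, trading speed for thoroughness.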
Changes in source systems typically take less than one second to be reflected in the iDaaS platform. The replication delay can be accurately measured so that users are always aware of data freshness.
Provide "Read your writes" as well as "Causal Consistency" guarantees under circumstances where stronger consistency is required to ensure user experience.
The full suite of Pipeline APIs provides the following benefits (see the sketch after this list):
- Easy to develop
- Code versioning
- Quickly build/rebuild entire data platform
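Because pipelines are plain code, the entire topology of a data platform can live in one version-controlled script and be replayed to rebuild the platform from scratch. A sketch using only the calls introduced above (connection and table names are placeholders):

> createPipeline({alias: "crm_customer"})
    .readFrom(mysql_crm_db.Customer)
    .writeTo(mysql_analytical_db.CRM_Customer, {AutoCreate: true})
    .start();
> createPipeline({alias: "crm_order"})
    .readFrom(mysql_crm_db.Order)
    .writeTo(mysql_analytical_db.CRM_Order, {AutoCreate: true})
    .start();

Checking such a script into git gives you review, versioning and reproducibility for the data layer itself.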
Supports most common databases and messaging systems, including but not limited to Oracle, MySQL, SQL Server, PostgreSQL, MongoDB, DB2, Sybase, Kafka and MQ.
See the full list of supported data sources & targets.
iDaaS is designed with extensibility in mind: all major components, including Source, Processor and Target, can be extended. You can easily follow the tutorials or documentation to create custom sources, targets or processors with the help of the Plugin Development Kit (PDK).
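The exact interfaces are defined in the Plugin SPI Reference (see the roadmap below). Purely as an illustration of the shape of a plugin, with every name in this sketch being hypothetical, a custom source boils down to three responsibilities:

// Hypothetical sketch only, not the real PDK SPI:
// a source plugin declares how to connect, how to read an
// initial snapshot, and how to stream incremental changes.
registerSource("my_custom_source", {
  connect(config)            { /* establish connectivity to the source */ },
  batchRead(table, offset)   { /* return an initial full snapshot */ },
  streamRead(tables, offset) { /* emit incremental CDC events */ }
});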
Already set up OGG, Attunity, HVR or Canal? No problem: you can connect your existing CDC tool to iDaaS and enjoy the flow engine and Data API capabilities.
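For example, if Canal already publishes change events to Kafka, a pipeline can read that topic and feed it into the platform. A minimal sketch, where kafka_cdc, the topic and the target table are all placeholder names for your own configured connections:

> createPipeline({alias: "canal_ingest"})
    .readFrom(kafka_cdc.crm_customer_topic)
    .writeTo(daas_db.Customer, {AutoCreate: true})
    .start();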
Explore, search, create and manage data models; create and run data pipelines.
All functionalities can be accessed via Open API for easy integration.
Scalable architecture, Docker compatible, and easily deployed on-prem or on any of the major cloud providers.
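As a rough sketch of a single-node trial deployment (the image name and port mapping here are placeholders; see "Install using Docker" in the roadmap for the real instructions):

# docker run -d --name idaas -p 3030:3030 tapdata/idaas:latest

The port matches the Data API endpoint shown in the curl example above.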
SANY Heavy Industry
Chowsangsang
First Auto
ChangAn Auto
China Eastern Airlines
Roadmap
- Install iDaaS
- Install using Docker
- Install from source
- Install from Tapdata Cloud
- Quick Start
- Set up Connections (Data Sources)
- Create a Table to Table replication
- Create a materialized view (wide table)
- Publish a Data API
- DaaS Data Storage Engine @Berry
- Observability @Aplomb
- iDaaS Consistency Model
- Incremental Engine Architecture
- Open CDC Standard @Berry
- Shared Log Mining
- Incremental Verification
- iDaaS Shell
- Python SDK
- iDaaS Pluggable Architecture
- PDK Introduction
- Tutorial: Create & Test a custom database source
- Tutorial: Create & Test a custom data target
- Tutorial: Create & Test a custom SaaS source
- Tutorial: Create & Test a custom processor
- Process for submitting plugin for certification review
- Plugin SPI Reference
- Working with iDaaS & iModel
  - Create a simple iModel
  - Publish a Data API
  - Create a complex iModel backed by more than one source
- Working with Data Pipelines
How to get involved
[Open Metadata](https://docs.open-metadata.org/openmetadata/schemas/overview)