/odd-platform

The first open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.

Primary language: Java. License: Apache License 2.0 (Apache-2.0).

Next-Gen Data Discovery and Data Observability Platform

Website | LinkedIn | Slack | Documentation | Blog | Demo

Demo

Play with our demo app!

Introduction

ODD is an open-source data discovery and observability tool for data teams. It helps them democratise data efficiently, collaborate more effectively, and spend less time on data discovery through a modern, user-friendly environment.

Key wins

  • Shorten data discovery phase

  • Have transparency on how and by whom the data is used

  • Foster data culture by continuous compliance and data quality monitoring

  • Accelerate data insights

  • Know the sources of your dashboards and ad hoc reports

  • Deprecate outdated objects responsibly by assessing and mitigating the risks

  • 👉 ODD Platform is a reference implementation of the Open Data Discovery Spec.

Features

Data Discovery and Observability

  • Accumulate scattered data insights in Federated Data catalogue
  • Gain observability through E2E Data objects Lineage
  • Benefit from the cutting-edge E2E microservices Lineage feature to track your data flow through the whole data landscape
  • Be warned and alerted by Pipeline Monitoring tools
  • Store your metadata
  • Use ODD-native modern lightweight UI

ML First citizen

  • Save the results of your ML Experiments by automatically logging their parameters

Data Security & Compliance

  • Manage Tags and Labels to prevent any abuse of the data
  • Refer to Tags and Labels to stay compliant with data security standards
  • Have full transparency on how and by whom the data is used

Data Quality

  • Simplify DQ processes by using ODD's compatibility with Great Expectations
  • Integrate ODD with any custom DQ framework

Getting Started

Running as a separate container

Set up the PostgreSQL connection details, for example:

export POSTGRES_HOST=172.17.0.1
export POSTGRES_PORT=5432
export POSTGRES_DATABASE=postgres
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=mysecretpassword

Start a new instance of the platform:

docker run -d \
  --name odd-platform \
  -e SPRING_DATASOURCE_URL="jdbc:postgresql://${POSTGRES_HOST}:${POSTGRES_PORT}/${POSTGRES_DATABASE}" \
  -e SPRING_DATASOURCE_USERNAME="${POSTGRES_USER}" \
  -e SPRING_DATASOURCE_PASSWORD="${POSTGRES_PASSWORD}" \
  -p 8080:8080 \
  ghcr.io/opendatadiscovery/odd-platform:latest
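
The container reads its database location from a JDBC URL assembled from the POSTGRES_* variables. As a minimal sketch, the helpers below build that URL and check that required variables are set before you start the container. The function names build_jdbc_url and require_vars are hypothetical and not part of ODD; the fallback values are simply the example values used above.

```shell
# Hypothetical helper (not part of ODD): build the JDBC URL from the
# POSTGRES_* variables, falling back to the example values used above.
build_jdbc_url() {
  printf 'jdbc:postgresql://%s:%s/%s\n' \
    "${POSTGRES_HOST:-172.17.0.1}" \
    "${POSTGRES_PORT:-5432}" \
    "${POSTGRES_DATABASE:-postgres}"
}

# Hypothetical helper: return non-zero if any named variable is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, running `require_vars POSTGRES_USER POSTGRES_PASSWORD && build_jdbc_url` prints the URL only when both credentials are set, which catches an empty variable before it reaches docker run.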

If you are running in a local environment, open localhost:8080 in your browser.
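
The platform may take a moment to start after the container is created. The sketch below polls a URL until it responds, assuming curl is installed and that the root path on port 8080 answers with a non-error HTTP status once the platform is up (an assumption, not something the ODD documentation states here). The function name wait_for_url is hypothetical.

```shell
# Hypothetical helper (not part of ODD): poll a URL until it answers
# without an HTTP error, or until the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP status 400 and above.
    if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
      return 0
    fi
    i=$((i + 1))
    # Sleep between attempts, but not after the last one.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}
```

A call such as `wait_for_url http://localhost:8080 30 2 && echo "ODD Platform is up"` waits for up to 30 attempts with a two-second pause between them.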

Running Locally with Docker Compose

docker-compose -f docker/demo.yaml up -d odd-platform-enricher

Deploying to Kubernetes with Helm Charts

Example configurations

Various example configurations (using docker-compose) are available in the docker/examples directory.

Contributing

Contributions to ODD Platform are very welcome. For basic contributions, all you need is to be comfortable with GitHub and Git. The best ways to contribute are:

  • Work on new adapters
  • Work on documentation

To ensure equal and positive communication, we adhere to our Code of Conduct. Before interacting with this repository, please read it and make sure to follow it.

Before contributing, please check out our Contributing Guide and the issues labeled "good first issue".

Integrations

OpenDataDiscovery Platform offers comprehensive data source support to meet your needs.

Existing integrations
Proxy Adapter | Airflow | Apache Druid
Cassandra | Clickhouse | Elasticsearch
Hive | Kafka | Feast
MSSQL | MySQL | Microsoft ODBC
MongoDB | Neo4j | MariaDB
Oracle | PostgreSQL | Redshift
Snowflake | Vertica | Tarantool
Athena | DynamoDB | Glue
Kinesis | Quicksight | S3
SageMaker | SageMaker Featurestore | SQS
Delta lake S3 | Tableau | Cube
SuperSet | PowerBi | Trino
Presto | DBT | Redash
Spark | MLflow | Kubeflow
Databricks Unity Catalog | Great Expectations | SQLite
Couchbase | Cockroachdb | Fivetran
Airbyte | Metabase | Mode
BigQuery | Singlestore

ODD Data Model

ODD operates on the following high-level types of entities:

  1. Datasets (collections of data: tables, topics, files, feature groups)
  2. Transformers (transformers of data: ETL or ML training jobs, experiments)
  3. Data Consumers (data consumers: ML models or BI dashboards)
  4. Data Quality Tests (data quality tests for datasets)
  5. Data Inputs (sources of data)
  6. Transformer Runs (executions of ETL or ML training jobs)
  7. Quality Test Runs (executions of data quality tests)

For more information, please check specification.md.

Community Support

Join our community if you need help, want to chat or have any other questions for us:

  • GitHub - Discussion forums and issues
  • Slack - Join the conversation! Get all the latest updates and chat to the devs

Contacts

If you have any questions or ideas, please don't hesitate to drop a line to any of us.

Team Member | LinkedIn | GitHub
German Osin | LinkedIn | germanosin
Nikita Dementev | LinkedIn | DementevNikita
Damir Abdullin | LinkedIn | damirabdul
Alexey Kozyurov | LinkedIn | Leshe4ka
Pavel Makarichev | LinkedIn | vixtir
Roman Zabaluev | LinkedIn | Haarolean

License

ODD Platform uses the Apache 2.0 License.