data-governance
There are 118 repositories under the data-governance topic.
open-metadata/OpenMetadata
OpenMetadata is a unified platform for discovery, observability, and governance powered by a central metadata repository, in-depth lineage, and seamless team collaboration.
sodadata/soda-core
Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
elementary-data/elementary
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
MarquezProject/marquez
Collect, aggregate, and visualize a data ecosystem's metadata
reata/sqllineage
SQL Lineage Analysis Tool powered by Python
opendatadiscovery/odd-platform
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
odpi/egeria
Egeria core
data-drift/data-drift
Metrics Observability & Troubleshooting
Titan-Systems/titan
Snowflake infrastructure-as-code. Provision environments, automate deploys, CI/CD. Manage RBAC, users, roles, and data access. Declarative Python Resource API. Change Management tool for the Snowflake data warehouse.
tokern/data-lineage
Generate and Visualize Data Lineage from query history
tuva-health/tuva
Main repo including core data model, data marts, reference data, terminology, and the clinical concept library
GoogleCloudPlatform/bigquery-data-lineage
Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.
sburn/docker-apache-atlas
This Apache Atlas is built from the latest release source tarball and patched to be run in a Docker container.
waterdipai/datachecks
Open Source Data Quality Monitoring.
opendatadiscovery/opendatadiscovery-specification
ODD Specification is a universal open standard for collecting metadata.
daxa-ai/pebblo
Pebblo enables developers to safely load data and promote their Gen AI app to deployment
odpi/data-governance
Egeria's Guidance on Governance as well as large media files such as presentations and movies
hivemq/hivemq-edge
HiveMQ Edge is an MQTT gateway that enables interoperability between OT devices and IT systems. It translates diverse protocols into MQTT for streamlined communication and helps organize data into a unified namespace, making managing and streaming data across your infrastructure easier.
mara/mara-schema
Mapping of DWH database tables to business entities, attributes & metrics in Python, with automatic creation of flattened tables
aws-samples/document-processing-pipeline-for-regulated-industries
A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.
conduktor/conduktor-poc-kafka-protocol
POC to demonstrate how to alter incoming/outgoing records in Kafka. It's a toy, don't use it in production.
provectus/data-quality-gate
Data Quality Gate based on AWS
Tinkoff/data-detective
Data catalog for everything in your company
opendatadiscovery/odd-collector
Open-source metadata collector based on ODD Specification
WeBankBlockchain/Data-Export
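Data-Export supports exporting on-chain data to storage media such as MySQL and ES that are convenient for big data processing, solving the problems of complex querying, analysis, visualization, and processing of blockchain data.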
GoogleCloudPlatform/auto-data-tokenize
Identify and tokenize sensitive data automatically using Cloud DLP and Dataflow
getstrm/pace
Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you to programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery (or plain ol' Postgres, even!) with definitions imported from Collibra, Datahub, ODD and the like.
ryandawsonuk/data-platforms-tools
Guide to data platforms and tools
WeBankBlockchain/Data-Stash
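Data-Stash is a data warehouse component based on FISCO-BCOS. By parsing a node's binlog, it generates a full backup of that node's state, enabling the node to separate hot and cold data and to prune data.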
WeBankBlockchain/Data-Reconcile
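Data-Reconcile is a blockchain-based reconciliation component. It provides a general-purpose data reconciliation solution based on blockchain smart-contract ledgers, along with a dynamically extensible reconciliation framework that supports custom development.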
tosh2230/stairlight
A data lineage tool that detects table dependencies from rendered SQL statements.
datasphere-oss/datasphere-service
An open source dataworks platform
mesmacosta/datacatalog-util
A Python package that centralizes some Google Cloud Data Catalog scripts. This repo contains commands, such as bulk CSV operations, that help leverage Data Catalog features.
mara/mara-metabase
Configuration and schema sync for Metabase from Python
ricardolsmendes/datacatalog-tag-manager
Python package to manage Google Cloud Data Catalog tags, loading metadata from external sources -- currently supports the CSV file format
yu-iskw/dbt-artifacts-loader
Load dbt artifacts uploaded to GCS to BigQuery in order to track historical dbt results