data-catalog
There are 105 repositories under the data-catalog topic.
datahub-project/datahub
The Metadata Platform for your Data and AI Stack
open-metadata/OpenMetadata
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
amundsen-io/amundsen
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
opendatadiscovery/odd-platform
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
apache/gravitino
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
intake/intake
Intake is a lightweight package for finding, investigating, loading and disseminating data.
opendatadiscovery/awesome-data-catalogs
📙 Awesome Data Catalogs and Observability Platforms.
rsyi/whale
🐳 The stupidly simple CLI workspace for your data warehouse.
gabledata/recap
Work with your web service, database, and streaming schemas in a single format.
tokern/piicatcher
Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and DataHub.
raystack/meteor
Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog.
GoogleCloudPlatform/bigquery-data-lineage
Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.
intake/intake-esm
An intake plugin for parsing an Earth System Model (ESM) catalog and loading assets into xarray datasets.
aws-samples/aws-dbs-refarch-datalake
Reference Architectures for Datalakes on AWS
getmetamapper/metamapper
Metamapper is a data discovery and documentation platform for improving how teams understand and interact with their data.
GoogleCloudPlatform/datacatalog-connectors-rdbms
Sample code with integration between Data Catalog and RDBMS data sources.
google/grizzly
End-to-end DataOps platform deployed by Terraform.
Tinkoff/data-detective
Data catalog for everything in your company
GoogleCloudPlatform/datacatalog-tag-engine
Tag Engine automates the process of creating, updating, deleting, and populating metadata in bulk with the Google Cloud services Data Catalog and Dataplex. Tag Engine is licensed under the Apache 2 license terms. Please make sure to read, understand and agree to the terms of the LICENSE and CONTRIBUTING files before proceeding.
Bayer-Group/COLID-Documentation
The documentation repository is part of the Corporate Linked Data Catalog - short: COLID - application.
opendatadiscovery/odd-collector
Open-source metadata collector based on ODD Specification
ihsn/nada
National Data Archive (NADA) is an open source data cataloging system that serves as a portal for researchers to browse, search, compare, apply for access, and download relevant census or survey information. It was originally developed to support the establishment of national survey data archives.
commondataio/dataportals-registry
Registry of data portals, catalogs, and data repositories, including a dataset of data catalogs and a catalog description standard.
getstrm/pace
Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery (or plain ol' Postgres, even!) with definitions imported from Collibra, DataHub, ODD and the like.
GoogleCloudPlatform/datacatalog-connectors-bi
Sample code with integration between Data Catalog and BI data sources.
tosh2230/stairlight
A data lineage tool that detects table dependencies from rendered SQL statements.
awesome-mlops/awesome-data-management
A curated list of awesome open source tools and commercial products to catalog, version, and manage data 🚀
carte-data/carte
A Python library to generate static data catalog sites. Carte scrapes metadata from your data assets and generates a fully searchable front end that's just HTML.
SciCatProject/frontend
SciCat open data catalogue web client
odpi/egeria-docs
Documentation repository for the Egeria project.
apecs-org/Polar-EO-Database
Polar Earth Observation Database of satellite sensors
datopian/portal.js.bak
🌀 The JS data presentation framework. For a single dataset to a full catalog.
dbt-content/google-datacatalog-dbt-tag
Update a Google Data Catalog tag with dbt Cloud run metadata
aaronspring/remote_climate_data
A collection of remote climate data accessed via intake and cached to disk.
unytics/catalog_builder
Data Catalogs Made Easy
related-sciences/articat
articat: data artifact catalog