data-quality
There are 301 repositories under the data-quality topic.
GokuMohandas/Made-With-ML
Learn how to design, develop, deploy and iterate on production-grade ML applications.
eugeneyan/applied-ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
ydataai/ydata-profiling
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
great-expectations/great_expectations
Always know what to expect from your data.
cleanlab/cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
kestra-io/kestra
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
voxel51/fiftyone
The open-source tool for building high-quality datasets and computer vision models
feast-dev/feast
The Open Source Feature Store for Machine Learning
open-metadata/OpenMetadata
OpenMetadata is a unified platform for discovery, observability, and governance powered by a central metadata repository, in-depth lineage, and seamless team collaboration.
treeverse/lakeFS
lakeFS - Data version control for your data lake | Git for data
datafold/data-diff
Compare tables within or across databases
GokuMohandas/mlops-course
Learn how to design, develop, deploy and iterate on production-grade ML applications.
whylabs/whylogs
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
feathr-ai/feathr
Feathr – A scalable, unified data and AI engineering platform for enterprise
sodadata/soda-core
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
featureform/featureform
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
re-data/re-data
re_data - fix data issues before your users & CEO would discover them 😊
opendatadiscovery/odd-platform
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
daochenzha/data-centric-AI
A curated, but incomplete, list of data-centric AI resources.
cleanlab/cleanvision
Automatically find issues in image datasets and practice data-centric computer vision.
rstudio/pointblank
Data quality assessment and metadata reporting for data frames and database tables
kennethleungty/Failed-ML
Compilation of high-profile real-world examples of failed machine learning projects
WeBankFinTech/Qualitis
Qualitis is a one-stop data quality management platform that supports quality verification, notification, and management for various data sources. It is used to solve data quality problems caused by data processing. https://github.com/WeBankFinTech/Qualitis
opendatadiscovery/awesome-data-catalogs
📙 Awesome Data Catalogs and Observability Platforms.
polyaxon/traceml
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
InfuseAI/piperider
Code review for data in dbt
encord-team/encord-active
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
Data-Centric-AI-Community/awesome-data-centric-ai
Open-Source Software, Tutorials, and Research on Data-Centric AI 🤖
data-drift/data-drift
Metrics Observability & Troubleshooting
alibaba/feathub
FeatHub - A stream-batch unified feature store for real-time machine learning
ubisoft/mobydq
🐳 Tool to automate data quality checks on data pipelines
bitol-io/open-data-contract-standard
Home of the Open Data Contract Standard (ODCS).
frederick0329/TracIn
Implementation of Estimating Training Data Influence by Tracing Gradient Descent (NeurIPS 2020)
adidas/lakehouse-engine
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows, and utilities for Data Products.
Hyhyhyhyhyhyh/Django-Data-quality-system
Data governance and data quality checking/monitoring platform (Django+jQuery+MySQL)
whylabs/whylogs-java
Profile and monitor your ML data pipeline end-to-end