data-quality

There are 449 repositories under the data-quality topic.

GokuMohandas/Made-With-ML
Learn how to design, develop, deploy and iterate on production-grade ML applications.
Language: Jupyter Notebook
43.1k 1.3k 746.7k
eugeneyan/applied-ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
28.3k 956 243.8k
ydataai/ydata-profiling
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
Language: Python
13.1k 150 8481.7k
cleanlab/cleanlab
Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Language: Python
10.9k 88 374850
great-expectations/great_expectations
Always know what to expect from your data.
Language: Python
10.8k 83 2k 1.6k
voxel51/fiftyone
Refine high-quality datasets and visual AI models
Language: Python
9.9k 67 1.7k 665
open-metadata/OpenMetadata
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
Language: TypeScript
7.5k 50 8.5k 1.4k
evidentlyai/evidently
Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
Language: Jupyter Notebook
6.6k 52 458724
feast-dev/feast
The Open Source Feature Store for AI/ML
Language: Python
6.3k 74 1.7k 1.1k
treeverse/lakeFS
lakeFS - Data version control for your data lake | Git for data
Language: Go
4.9k 41 3.7k 395
GokuMohandas/mlops-course
Learn how to design, develop, deploy and iterate on production-grade ML applications.
Language: Jupyter Notebook
3.2k 55 19563
datafold/data-diff
Compare tables within or across databases
Language: Python
3k 20 318295
whylabs/whylogs
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
Language: Jupyter Notebook
2.8k 33 433132
sodadata/soda-core
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
Language: Python
2.2k 13 395242
featureform/featureform
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
Language: Go
1.9k 15 15999
feathr-ai/feathr
Feathr – A scalable, unified data and AI engineering platform for enterprise
Language: Scala
1.9k 61 337234
re-data/re-data
re_data - fix data issues before your users & CEO would discover them 😊
Language: HTML
1.6k 22 196122
opendatadiscovery/odd-platform
First open-source data discovery and observability platform. We make a life for data practitioners easy so you can focus on your business.
Language: Java
1.3k 18 645128
NVIDIA-NeMo/Curator
Scalable data pre processing and curation toolkit for LLMs
Language: Python
1.1k 18 259174
daochenzha/data-centric-AI
A curated, but incomplete, list of data-centric AI resources.
1.1k 20 379
cleanlab/cleanvision
Automatically find issues in image datasets and practice data-centric computer vision.
Language: Python
1.1k 16 8575
rstudio/pointblank
Data quality assessment and metadata reporting for data frames and database tables
Language: R
990 30 34159
opendatadiscovery/awesome-data-catalogs
📙 Awesome Data Catalogs and Observability Platforms.
902 23 864
WeBankFinTech/Qualitis
Qualitis is a one-stop data quality management platform that supports quality verification, notification, and management for various datasource. It is used to solve various data quality problems caused by data processing. https://github.com/WeBankFinTech/Qualitis
Language: Java
752 40 117311
kennethleungty/Failed-ML
Compilation of high-profile real-world examples of failed machine learning projects
737 18 0 49
datavane/datavines
Know your data better！Datavines is Next-gen Data Observability Platform, support metadata manage and data quality.
Language: Java
667 13 201185
bitol-io/open-data-contract-standard
Home of the Open Data Contract Standard (ODCS).
Language: Ruby
550 25 2056
polyaxon/traceml
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
Language: Python
519 12 1446
InfuseAI/piperider
Code review for data in dbt
Language: Python
490 12 7524
MigoXLab/dingo
Dingo: A Comprehensive AI Data Quality Evaluation Tool
Language: JavaScript
463 5 2146
encord-team/encord-active
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
Language: Python
453 10 1427
Data-Centric-AI-Community/awesome-data-centric-ai
Open-Source Software, Tutorials, and Research on Data-Centric AI 🤖
Language: Jupyter Notebook
339 18 247
alibaba/feathub
FeatHub - A stream-batch unified feature store for real-time machine learning
Language: Python
338 10 12755
data-drift/data-drift
Metrics Observability & Troubleshooting
Language: HTML
323 6 4812
databrickslabs/dqx
Databricks framework to validate Data Quality of pySpark DataFrames
Language: Python
313 7 18860
posit-dev/pointblank
Data validation made beautiful and powerful
Language: Python
276 3 3820
