data-quality
There are 449 repositories under the data-quality topic.
GokuMohandas/Made-With-ML
Learn how to design, develop, deploy and iterate on production-grade ML applications.
eugeneyan/applied-ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
ydataai/ydata-profiling
One-line data quality profiling and exploratory data analysis for Pandas and Spark DataFrames.
cleanlab/cleanlab
Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
great-expectations/great_expectations
Always know what to expect from your data.
voxel51/fiftyone
Refine high-quality datasets and visual AI models
open-metadata/OpenMetadata
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
evidentlyai/evidently
Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
feast-dev/feast
The Open Source Feature Store for AI/ML
treeverse/lakeFS
lakeFS - Data version control for your data lake | Git for data
GokuMohandas/mlops-course
Learn how to design, develop, deploy and iterate on production-grade ML applications.
datafold/data-diff
Compare tables within or across databases
whylabs/whylogs
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
sodadata/soda-core
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas). https://www.soda.io
featureform/featureform
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
feathr-ai/feathr
Feathr – A scalable, unified data and AI engineering platform for enterprise
re-data/re-data
re_data - fix data issues before your users and CEO discover them 😊
opendatadiscovery/odd-platform
The first open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
NVIDIA-NeMo/Curator
Scalable data preprocessing and curation toolkit for LLMs
daochenzha/data-centric-AI
A curated, but incomplete, list of data-centric AI resources.
cleanlab/cleanvision
Automatically find issues in image datasets and practice data-centric computer vision.
rstudio/pointblank
Data quality assessment and metadata reporting for data frames and database tables
opendatadiscovery/awesome-data-catalogs
📙 Awesome Data Catalogs and Observability Platforms.
WeBankFinTech/Qualitis
Qualitis is a one-stop data quality management platform that supports quality verification, notification, and management for various data sources. It is used to solve data quality problems caused by data processing. https://github.com/WeBankFinTech/Qualitis
kennethleungty/Failed-ML
Compilation of high-profile real-world examples of failed machine learning projects
datavane/datavines
Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.
bitol-io/open-data-contract-standard
Home of the Open Data Contract Standard (ODCS).
polyaxon/traceml
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
InfuseAI/piperider
Code review for data in dbt
MigoXLab/dingo
Dingo: A Comprehensive AI Data Quality Evaluation Tool
encord-team/encord-active
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
Data-Centric-AI-Community/awesome-data-centric-ai
Open-Source Software, Tutorials, and Research on Data-Centric AI 🤖
alibaba/feathub
FeatHub - A stream-batch unified feature store for real-time machine learning
data-drift/data-drift
Metrics Observability & Troubleshooting
databrickslabs/dqx
Databricks framework to validate the data quality of PySpark DataFrames
posit-dev/pointblank
Data validation made beautiful and powerful