dataquality
There are 81 repositories under the dataquality topic.
great-expectations/great_expectations
Always know what to expect from your data.
cleanlab/cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
open-metadata/OpenMetadata
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
awslabs/deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
datafold/data-diff
Compare tables within or across databases
sodadata/soda-core
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
re-data/re-data
re_data - fix data issues before your users and CEO discover them 😊
zinggAI/zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
chaos-genius/chaos_genius
ML powered analytics engine for outlier detection and root cause analysis.
datacleaner/DataCleaner
The premier open source Data Quality solution
datavane/datavines
Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.
IBM/lale
Library for Semi-Automated Data Science
canimus/cuallee
Possibly the fastest DataFrame-agnostic quality check library in town.
datachecks/dcs-core
Open Source Data Quality Monitoring.
AutoViML/pandas_dq
Find data quality issues and clean your data in a single line of code with a Scikit-Learn compatible Transformer.
OSMCha/osmcha-frontend
Frontend for the osmcha-django REST API
DataKitchen/data-observability-installer
Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.
schic/DQCS
Data Quality Control System
infinitelambda/dq-tools
Makes it simple to store test results and visualise them in a BI dashboard
DataKitchen/dataops-testgen
DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data quality test generation and execution by data profiling, new dataset hygiene review, AI generation of data quality validation tests, ongoing testing of data refreshes, & continuous anomaly monitoring
AltimateAI/datapilot-cli
Datapilot-cli is the command line interface for accessing the AI teammate for engineers to ensure best practices in their SQL and dbt projects.
qizhixinhit/Dirty-dataImpacts
Code and datasets
BirdiD/BirdiDQ
BirdiDQ leverages the power of the Python Great Expectations open-source library and combines it with the simplicity of natural language queries to effortlessly identify and report data quality issues, all at the tip of your fingers.
open-metadata/openmetadata-site
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
bikash/DataQuality
Tutorial and examples of Data Quality in Big Data System
grillazz/fastapi-greatexpectations
Run greatexpectations.io on ANY SQL engine using a REST API. Supported by FastAPI, Pydantic, and SQLAlchemy as the best data quality tool.
HuemulSolutions/huemul-bigdatagovernance
Huemul BigDataGovernance is a framework that runs on Spark, Hive, and HDFS. It enables a corporate single-source-of-data strategy based on Data Governance best practices. It lets you implement tables with Primary Key and Foreign Key control when inserting and updating data through the library, along with validation of nulls, text lengths, minimum/maximum values for numbers and dates, unique values, and default values. It also lets you classify fields by the applicability of ARCO rights to ease the implementation of GDPR-style data protection laws, and to identify security levels and whether any type of encryption is applied. Additionally, it lets you add more complex validation rules on the same table.
sodadata/soda-github-action
⚡ Prevent downstream data quality issues by integrating the Soda Library into your CI/CD pipeline.
josephmachado/data-quality-w-greatexpectations
Code for the blog post on data quality with greatexpectations
Data-Culpa/openclients
Open source clients for working with Data Culpa Validator services from data pipelines
Luzzu/Framework
Luzzu Quality Assessment Framework
ydataai/ydata-talkdatatome
Make your dataset talk to you. The AI assistant for data preparation.
devoteamgcloud/dataform-assertions
Enhance your data testing seamlessly with this Dataform package, featuring robust common assertions to ensure the accuracy and integrity of your warehouse data.
opt-nc/setup-duckdb-action
🦆 Blazing fast and highly customizable GitHub Action to set up a DuckDB runtime
rodrigobaron/qafs
Quality Aware Feature Store
martandsingh/SQL-DQC
SQL-based data profiling and data quality checks for SQL databases, at both the table and database level.