dataquality

There are 79 repositories under the dataquality topic.

  • great-expectations/great_expectations

    Always know what to expect from your data.

    Language: Python
  • cleanlab/cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

    Language: Python
  • open-metadata/OpenMetadata

    OpenMetadata is a unified platform for discovery, observability, and governance powered by a central metadata repository, in-depth lineage, and seamless team collaboration.

    Language: TypeScript
  • awslabs/deequ

    Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

    Language: Scala
  • datafold/data-diff

    Compare tables within or across databases

    Language: Python
  • sodadata/soda-core

    ⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

    Language: Python
  • re-data/re-data

    re_data - fix data issues before your users & CEO would discover them 😊

    Language: HTML
  • zinggAI/zingg

    Scalable identity resolution, entity resolution, data mastering and deduplication using ML

    Language: Java
  • chaos-genius/chaos_genius

    ML powered analytics engine for outlier detection and root cause analysis.

    Language: Python
  • datacleaner/DataCleaner

    The premier open source Data Quality solution

    Language: Java
  • datavane/datavines

    Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.

    Language: Java
  • IBM/lale

    Library for Semi-Automated Data Science

    Language: Python
  • waterdipai/datachecks

    Open Source Data Quality Monitoring.

    Language: Python
  • AutoViML/pandas_dq

    Find data quality issues and clean your data in a single line of code with a Scikit-Learn compatible Transformer.

    Language: Python
  • OSMCha/osmcha-frontend

    Frontend for the osmcha-django REST API

    Language: JavaScript
  • canimus/cuallee

    Possibly the fastest DataFrame-agnostic quality check library in town.

    Language: Python
  • DataKitchen/data-observability-installer

    Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.

    Language: Python
  • mundipagg/amora-data-build-tool

    Amora Data Build Tool enables analysts and engineers to transform data on the data warehouse (BigQuery) by writing Amora Models that describe the data schema using Python's "PEP484 - Type Hints" and select statements with SQLAlchemy. Amora is able to transform Python code into SQL data transformation jobs that run inside the warehouse.

    Language: Python
  • schic/DQCS

    Data quality control system

    Language: Java
  • infinitelambda/dq-tools

    Makes it simple to store test results and visualise them in a BI dashboard

    Language: PLpgSQL
  • DataKitchen/dataops-testgen

    DataOps TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data quality test generation and execution through data profiling, new dataset screening and hygiene review, algorithmic generation of data quality validation tests, ongoing testing of new data refreshes, and continuous data anomaly monitoring.

    Language: Python
  • AltimateAI/datapilot-cli

    Datapilot-cli is the command-line interface for accessing the AI teammate for engineers to ensure best practices in their SQL and dbt projects.

    Language: Python
  • qizhixinhit/Dirty-dataImpacts

    Code and datasets

    Language: C++
  • bikash/DataQuality

    Tutorial and examples of data quality in big data systems

  • grillazz/fastapi-greatexpectations

    Run greatexpectations.io on ANY SQL engine using a REST API. Supported by FastAPI, Pydantic, and SQLAlchemy as a data quality tool.

    Language: Python
  • HuemulSolutions/huemul-bigdatagovernance

    Huemul BigDataGovernance is a framework that runs on Spark, Hive, and HDFS. It supports a corporate "single data" (dato único) strategy based on data governance best practices. It lets you implement tables with primary key and foreign key control when inserting and updating data through the library, along with validation of nulls, text lengths, minimum/maximum values for numbers and dates, unique values, and default values. It also lets you classify fields by the applicability of ARCO rights to ease the implementation of GDPR-style data protection laws, and identify security levels and whether any type of encryption is being applied. In addition, it lets you add more complex validation rules on the same table.

    Language: Scala
  • open-metadata/openmetadata-site

    Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.

    Language: CSS
  • sodadata/soda-github-action

    ⚡ Prevent downstream data quality issues by integrating the Soda Library into your CI/CD pipeline.

    Language: Python
  • BirdiD/BirdiDQ

    BirdiDQ leverages the power of the Python Great Expectations open-source library and combines it with the simplicity of natural language queries to effortlessly identify and report data quality issues, all at the tip of your fingers.

    Language: Jupyter Notebook
  • Data-Culpa/openclients

    Open source clients for working with Data Culpa Validator services from data pipelines

    Language: Python
  • Luzzu/Framework

    Luzzu Quality Assessment Framework

    Language: Java
  • rodrigobaron/qafs

    Quality Aware Feature Store

    Language: Python
  • martandsingh/SQL-DQC

    SQL-based data profiling and data quality checks for SQL databases, at both the table and the database level.

    Language: TSQL
  • opt-nc/setup-duckdb-action

    🦆 Blazing fast and highly customizable GitHub Action to set up a DuckDB runtime

    Language: JavaScript
  • ydataai/ydata-talkdatatome

    Make your dataset talk to you. The AI assistant for data preparation.

    Language: Python
  • arrahtech/osdq-core

    The core library of osDQ

    Language: Java