data-quality-checks

There are 76 repositories under the data-quality-checks topic.

  • OpenMetadata

    open-metadata/OpenMetadata

    OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

    Language: TypeScript
  • soda-core

    sodadata/soda-core

    ⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

    Language: Python
  • re-data/re-data

    re_data - fix data issues before your users & CEO discover them 😊

    Language: HTML
  • polyaxon/traceml

    Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.

    Language: Python
  • ubisoft/mobydq

    🐳 Tool to automate data quality checks on data pipelines

    Language: Vue
  • Hyhyhyhyhyhyh/Django-Data-quality-system

    Data governance and data quality checking/monitoring platform (Django + jQuery + MySQL)

    Language: Python
  • AKSW/RDFUnit

    An RDF Unit Testing Suite

    Language: Java
  • canimus/cuallee

    Possibly the fastest DataFrame-agnostic quality check library in town.

    Language: Python
  • NBi

    Seddryck/NBi

    NBi is a testing framework (add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an XML syntax. With NBi, you don't need to develop C# or Java code to specify your tests, nor do you need Visual Studio or Eclipse to compile your test suite. Just create an XML file and let the framework interpret it and run your tests. The framework is designed as an add-on to NUnit, but it can be ported easily to other testing frameworks.

    Language: C#
  • swiple

    Swiple/swiple

    Swiple enables you to easily observe, understand, validate and improve the quality of your data

    Language: Python
  • PovertyAction/high-frequency-checks

    A Stata template for running high frequency checks of incoming research data at Innovations for Poverty Action

    Language: Stata
  • socialpoint-labs/sqlbucket

    Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.

    Language: Python
  • dqo

    dqops/dqo

    Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, and let DQOps run them daily to detect data quality issues.

    Language: Java
  • evidentlyai/ml_observability_course

    Free open-source ML observability course for data scientists and ML engineers. Learn how to monitor and debug your ML models in production.

    Language: Jupyter Notebook
  • google/data-quality-monitor

    Data Quality Monitor (DQM) - Continuously validate your data with easy, customizable rules.

    Language: TypeScript
  • mfcabrera/hooqu

    hooqu is a library built on top of Pandas-like DataFrames for defining "unit tests for data". It is a spiritual port of Apache Deequ to Python.

    Language: Python
  • josephmachado/python_essentials_for_data_engineers

    Code for blog at https://www.startdataengineering.com/post/python-for-de/

    Language: Python
  • PEDSnet/Data-Quality-Analysis

    The PEDSnet Data Quality Assessment Toolkit (OMOP CDM)

    Language: R
  • redflag

    scienxlab/redflag

    Safety net for machine learning pipelines. Plays nice with sklearn and pandas.

    Language: Python
  • baligoyem/dataqtor

    🔍Your Data Quality Detector / Gain insight into your data and get it ready for use before you start working with it 💡📊🛠💎

    Language: Python
  • sleepepi/slice

    A clinical research interface geared at collecting robust and consistent data by providing a strong framework for designing data dictionaries and collection forms.

    Language: Ruby
  • sodadata/soda-github-action

    ⚡ Prevent downstream data quality issues by integrating the Soda Library into your CI/CD pipeline.

    Language: Python
  • christianbors/OpenRefineQualityMetrics

    MetricDoc is an interactive visual exploration environment for assessing data quality

    Language: JavaScript
  • penguin-datalayer

    DP6/penguin-datalayer

    Assisted crawler for validating objects sent to the data layer.

    Language: JavaScript
  • penguin-datalayer-core

    DP6/penguin-datalayer-core

    Validation core engine for the data layer of the Raft Suite ecosystem.

    Language: JavaScript
  • zqtzt/CODCQC

    An open source Python interface to the quality control of ocean in-situ observations

    Language: Python
  • Garett601/data-quality-reports

    A function that automatically generates a Data Quality Report for your data

    Language: Jupyter Notebook
  • JoanyMarino/RPackages4DQA

    Collection of R scripts to test packages in conducting data quality assessments

    Language: HTML
  • Unicorninha

    tmilitino/Unicorninha

    Final course project at CESAR School focused on evaluating data quality tools.

    Language: Python
  • anilkulkarni87/databricks_notebooks

    A collection of Databricks notebooks for testing and learning

    Language: HTML
  • flaviaouyang/molly

    Data quality monitoring library designed for time series data, made for the modern data stack

    Language: Python
  • grandelli/clouddq-samples

    Repo that contains data quality sample tasks for Google CloudDQ and Dataplex DQ Tasks

  • mathewsrc/ETL-Chicago-Cafe-Permits

    This ETL (Extract, Transform, Load) project employs several Python libraries, including Airflow, Soda, Polars, YData Profiling, DuckDB, Requests, Loguru, and Google Cloud, to streamline the extraction, transformation, and loading of CSV datasets from the U.S. government's data repository at https://catalog.data.gov.

    Language: HTML
  • casualcomputer/sql.mechanic

    Functions that generate SQL queries that summarize high-dimensional tables stored in various databases (e.g. Microsoft SQL Server, Netezza, DB2, Postgres, Oracle, MySQL).

    Language: R
  • LouisdeBruijn/waterfall-logging

    A Python package to log (distinct) column counts in a DataFrame, export them as a Markdown table, and plot a Waterfall statistics figure.

    Language: Python
  • sumanthprabhu/DQC-Toolkit

    Quality Checks for Training Data in Machine Learning

    Language: Jupyter Notebook