Examples of data quality processes implemented in Databricks

This repository contains a collection of Databricks notebooks that demonstrate configurable data quality processes implemented in Databricks using Python and SQL.

The processes detailed in this repository cover data quality and data product management. They include methods for automating the maintenance of a data dictionary, refining a data model (column comments and column positions), executing data quality tests, blocking bad-quality data, and mapping values.
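To make the idea of configurable tests and blocking concrete, here is a minimal sketch in plain Python. It is not taken from the notebooks: the `QualityTest` class, the `run_tests` function, and the sample rows are hypothetical names chosen for illustration, and the notebooks themselves presumably operate on Databricks tables rather than on in-memory dictionaries.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class QualityTest:
    """One configured test (hypothetical structure, not the repository's schema)."""
    name: str
    column: str
    check: Callable[[Any], bool]  # returns True when the value is acceptable
    blocking: bool = False        # a failed blocking test keeps the row out of the target


def run_tests(
    rows: List[Row], tests: List[QualityTest]
) -> Tuple[List[Row], List[Row], List[Tuple[int, str]]]:
    """Apply every test to every row.

    Returns (passed_rows, blocked_rows, failures), where failures holds
    (row_index, test_name) pairs for reporting.
    """
    passed: List[Row] = []
    blocked: List[Row] = []
    failures: List[Tuple[int, str]] = []
    for index, row in enumerate(rows):
        block_row = False
        for test in tests:
            if not test.check(row.get(test.column)):
                failures.append((index, test.name))
                block_row = block_row or test.blocking
        (blocked if block_row else passed).append(row)
    return passed, blocked, failures


# Illustrative sample data and test configuration.
rows = [
    {"id": 1, "country": "DE", "amount": 10.0},
    {"id": 2, "country": None, "amount": 5.0},
    {"id": 3, "country": "FR", "amount": -1.0},
]
tests = [
    QualityTest("country_not_null", "country", lambda v: v is not None, blocking=True),
    QualityTest("amount_non_negative", "amount", lambda v: v is not None and v >= 0),
]

passed, blocked, failures = run_tests(rows, tests)
```

In this sketch a failed non-blocking test is only reported, while a failed blocking test diverts the row to a separate collection. That is one possible way to separate "report" from "block" behaviour; the notebooks may draw the line differently.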

The repository contains an HTML version of each notebook, which can be viewed in a browser, and a DBC archive that can be imported into a Databricks workspace. To reproduce the demo in your own workspace, execute Run All on each notebook in numbered order.

Notebooks

1. Create sample data using Databricks data sets.
2. Create data dictionary tables.
3. Update data dictionaries using metastore data.
4. Refine data model: comment and reorder columns.
5. Configure data quality tests.
6. Execute data quality tests.
7. Block bad-quality data.
8. Map local values to global ones.
9. Clean up (drop all tables created during demo).
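
The step that maps local values to global ones can be sketched as a lookup keyed by source system and local value. This is an illustrative assumption, not the notebook's implementation: `COUNTRY_MAP`, `to_global`, `map_rows`, the `source_system` column, and the sample values are all hypothetical.

```python
from typing import Any, Dict, List, Optional, Set, Tuple

MappingKey = Tuple[str, str]  # (source_system, local_value)

# Hypothetical mapping table: each source system's local code -> one global code.
COUNTRY_MAP: Dict[MappingKey, str] = {
    ("crm", "Germany"): "DE",
    ("crm", "Deutschland"): "DE",
    ("erp", "D"): "DE",
    ("erp", "F"): "FR",
}


def to_global(
    source_system: str, local_value: Optional[str], mapping: Dict[MappingKey, str]
) -> Optional[str]:
    """Return the global value for a local one, or None if it is missing or unmapped."""
    if local_value is None:
        return None
    return mapping.get((source_system, local_value.strip()))


def map_rows(
    rows: List[Dict[str, Any]], column: str, mapping: Dict[MappingKey, str]
) -> Tuple[List[Dict[str, Any]], Set[MappingKey]]:
    """Add a '<column>_global' field to each row and collect unmapped keys."""
    mapped: List[Dict[str, Any]] = []
    unmapped: Set[MappingKey] = set()
    for row in rows:
        local_value = row[column]
        global_value = to_global(row["source_system"], local_value, mapping)
        if global_value is None and local_value is not None:
            # Keep track of values that still need a mapping entry.
            unmapped.add((row["source_system"], local_value))
        mapped.append({**row, column + "_global": global_value})
    return mapped, unmapped


# Illustrative input rows from two hypothetical source systems.
source_rows = [
    {"source_system": "crm", "country": "Deutschland"},
    {"source_system": "erp", "country": "F"},
    {"source_system": "erp", "country": "X"},
]

mapped_rows, unmapped_keys = map_rows(source_rows, "country", COUNTRY_MAP)
```

Returning the unmapped keys alongside the mapped rows makes gaps in the mapping table visible instead of silently producing empty values. In a Databricks setting the same lookup would more likely be expressed as a join against a mapping table.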

mariuspc/data_quality_databricks
