Great Expectations Project

This project uses Great Expectations, an open-source data validation and documentation framework, to validate and monitor data quality in the TRN MSSQL database.

Project Structure

  • great_expectations/: Contains all the configurations, expectations, checkpoints, and data docs generated by Great Expectations.
  • README.md: This file, providing an overview and instructions for the project.

Prerequisites

  • Python 3.11
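  • Great Expectations, in a release that still includes the great_expectations command-line interface used below (1.x releases do not ship it)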
  • Access to the TRN MSSQL database, plus an ODBC driver for SQL Server (required by pyodbc)
  • Jupyter Notebook (for editing expectation suites interactively)

Setup Instructions

Ensure you have the necessary dependencies installed:

pip install great_expectations jupyter pyodbc

Then change to the project root, the directory that contains the great_expectations/ folder, and run the commands in this README from there.
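
To confirm that the packages from the pip command are importable before running any Great Expectations command, a small check along these lines may help. It is a minimal sketch: the function name missing_modules is hypothetical, and jupyter_core is used as the import name on the assumption that it is installed as part of jupyter.

```python
from importlib.util import find_spec

# Top-level import names for the packages installed by the pip command.
# jupyter_core is assumed to be present whenever jupyter is installed.
REQUIRED_MODULES = ["great_expectations", "jupyter_core", "pyodbc"]


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only looks the modules up; it does not import them or contact the database.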

Usage

Creating and Editing Expectation Suites

  • Edit an Existing Expectation Suite: To modify or enhance an existing expectation suite, use the following command:
    great_expectations suite edit employees.warning
    This command opens a Jupyter notebook that allows you to add or modify expectations interactively.
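
After editing, it can be useful to see what the suite contains without opening the notebook again. The sketch below reads the suite's JSON file and lists each expectation with the column it targets. It rests on two assumptions: that the employees.warning suite is stored at great_expectations/expectations/employees/warning.json, and that the file uses the "expectation_type" and "kwargs" keys of the file-based suite format. The function name summarize_suite is hypothetical.

```python
import json
from pathlib import Path

# Assumed location: suite names containing dots are stored as nested paths
# under great_expectations/expectations/ in file-based projects.
SUITE_PATH = Path("great_expectations/expectations/employees/warning.json")


def summarize_suite(path):
    """Return (expectation_type, column) pairs from an expectation suite JSON file."""
    suite = json.loads(Path(path).read_text(encoding="utf-8"))
    return [
        # Table-level expectations have no "column" kwarg, so column is None.
        (exp["expectation_type"], exp.get("kwargs", {}).get("column"))
        for exp in suite.get("expectations", [])
    ]


if __name__ == "__main__":
    if SUITE_PATH.exists():
        for expectation_type, column in summarize_suite(SUITE_PATH):
            print(f"{expectation_type}: {column or '(table-level)'}")
    else:
        print(f"Suite file not found at {SUITE_PATH}")
```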

Building and Viewing Data Docs

  • Build Data Docs: Generate documentation for your expectations and validations:
    great_expectations docs build
  • View Data Docs: Open the index.html file located in great_expectations/uncommitted/data_docs/local_site/ in any web browser to view a detailed report of your data quality validations.
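
As an alternative to locating index.html by hand, the page can be opened from Python with the standard-library webbrowser module. This is a minimal sketch that assumes it is run from the project root; the helper name docs_uri is hypothetical.

```python
import webbrowser
from pathlib import Path

# Path to the Data Docs index page, relative to the project root.
DOCS_INDEX = Path("great_expectations/uncommitted/data_docs/local_site/index.html")


def docs_uri(index_path=DOCS_INDEX):
    """Return a file:// URI for the Data Docs index page."""
    return Path(index_path).resolve().as_uri()


if __name__ == "__main__":
    if DOCS_INDEX.exists():
        webbrowser.open(docs_uri())
    else:
        print("Data Docs not found; run 'great_expectations docs build' first.")
```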

Running Checkpoints

  • Run a Checkpoint: To validate your data against the defined expectations, run a checkpoint:
    great_expectations checkpoint run my_checkpoint
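
To run the checkpoint from a scheduled job or a CI step, the same command can be wrapped in Python. The sketch below assumes the great_expectations executable is on the PATH and that the CLI reports a failed validation through a non-zero exit code. The function names checkpoint_command and run_checkpoint are hypothetical.

```python
import subprocess


def checkpoint_command(checkpoint_name):
    """Build the 'checkpoint run' CLI command as an argument list."""
    return ["great_expectations", "checkpoint", "run", checkpoint_name]


def run_checkpoint(checkpoint_name):
    """Run the checkpoint and return the CLI's exit code (0 is assumed to mean success)."""
    completed = subprocess.run(checkpoint_command(checkpoint_name), check=False)
    return completed.returncode
```

Calling run_checkpoint("my_checkpoint") from the project root and passing the result to sys.exit lets the calling job fail whenever the validation does.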

Modifying Data

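To confirm that the data quality checks catch bad data, manually introduce errors into the hr.employees table (for example, a NULL in a column the suite expects to be populated), then rerun the checkpoint. The failed expectations in the results show which errors Great Expectations caught.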
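
The idea can be tried without touching the real database. The sketch below uses an in-memory SQLite table as a stand-in for hr.employees (it is not MSSQL, and the employee_id and email columns are hypothetical) and a hand-written NULL count in place of a not-null expectation such as expect_column_values_to_not_be_null. It only illustrates the before-and-after effect of a deliberately introduced error.

```python
import sqlite3

# In-memory SQLite table standing in for hr.employees; the columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "c@example.com")],
)


def null_count(connection, column):
    """Count NULLs in a column: the rows a not-null expectation would report."""
    # The column name is interpolated directly, so pass only trusted identifiers.
    row = connection.execute(
        f"SELECT COUNT(*) FROM employees WHERE {column} IS NULL"
    ).fetchone()
    return row[0]


before = null_count(conn, "email")

# Introduce an error on purpose: blank out one email address.
conn.execute("UPDATE employees SET email = NULL WHERE employee_id = 2")

after = null_count(conn, "email")
print(f"NULL emails before: {before}, after: {after}")
```

Against the real table, the equivalent step is an UPDATE on hr.employees followed by rerunning the checkpoint.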