This project uses Great Expectations, an open-source data validation and documentation framework, to validate and monitor the data quality of the TRN MSSQL database.
great_expectations/
: Contains all the configurations, expectations, checkpoints, and data docs generated by Great Expectations.
README.md
: This file, providing an overview and instructions for the project.
- Python 3.11
- Great Expectations
- Access to an MSSQL database.
- Jupyter Notebook (for editing expectation suites interactively)
Ensure you have the necessary dependencies installed:
pip install great_expectations jupyter pyodbc
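To confirm the installation worked, a quick check like the one below can help. It is a minimal sketch: the helper name `missing_modules` is hypothetical, and the import names in `REQUIRED` are assumed to match the packages installed above.

```python
"""Sketch: report which required packages cannot be imported."""
from importlib.util import find_spec

# Assumption: these import names correspond to the pip packages listed above.
REQUIRED = ["great_expectations", "jupyter", "pyodbc"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that the current interpreter cannot locate."""
    return [name for name in names if find_spec(name) is None]
```

Calling `missing_modules(REQUIRED)` in the project's Python environment returns an empty list when every package is found.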
Navigate to the project directory where great_expectations/ is already set up.
- Edit an Existing Expectation Suite:
To modify or extend an existing expectation suite, use the following command:
great_expectations suite edit employees.warning
This command opens a Jupyter notebook where you can add or modify expectations interactively.
- Build Data Docs:
Generate documentation for your expectations and validations:
great_expectations docs build
- View Data Docs:
Open the index.html file located in great_expectations/uncommitted/data_docs/local_site/ in any web browser to view a detailed report of your data quality validations.
- Run a Checkpoint:
To validate your data against the defined expectations, run a checkpoint:
great_expectations checkpoint run my_checkpoint
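If you run the checkpoint and rebuild the Data Docs often, the two commands above can be wrapped in a small script. The following is a sketch, not part of the project: the function names are hypothetical, and it assumes the great_expectations CLI is on your PATH, that you run it from the project directory, and that the CLI returns a non-zero exit code when validation fails.

```python
"""Sketch: run a checkpoint, rebuild Data Docs, and print the report location."""
import subprocess
from pathlib import Path

# Location given in this README, relative to the project directory.
DATA_DOCS_INDEX = Path("great_expectations/uncommitted/data_docs/local_site/index.html")


def build_commands(checkpoint: str) -> list[list[str]]:
    """Return the CLI invocations from this README as argument lists."""
    return [
        ["great_expectations", "checkpoint", "run", checkpoint],
        ["great_expectations", "docs", "build"],
    ]


def main(checkpoint: str = "my_checkpoint") -> int:
    """Run both commands and return the first non-zero exit code, if any."""
    exit_code = 0
    for cmd in build_commands(checkpoint):
        # The docs are rebuilt even if the checkpoint fails, so the report
        # shows the failing expectations.
        result = subprocess.run(cmd)
        exit_code = exit_code or result.returncode
    print(f"Report: {DATA_DOCS_INDEX.resolve().as_uri()}")
    return exit_code
```

Call `main()` for the default checkpoint, or pass another checkpoint name as the argument.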
To test the robustness of your data quality checks, manually introduce errors into the hr.employees
table and rerun the checkpoint to see how Great Expectations catches these errors.
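One way to do this without leaving bad rows behind is to null out a single value, run the checkpoint, and then restore the original. The sketch below makes several assumptions: the function names, the connection string, and any column names you pass in (for example `email` and `employee_id`) are hypothetical, and the chosen column must allow NULLs in the database while being covered by a not-null expectation in your suite.

```python
"""Sketch: temporarily null out one value in hr.employees, then restore it."""

TABLE = "hr.employees"  # table named in this README


def build_statements(column: str, key_column: str) -> tuple[str, str, str]:
    """Return (select, corrupt, restore) parameterized T-SQL statements.

    Identifiers cannot be bound as parameters, so only pass trusted names.
    """
    where = f"WHERE {key_column} = ?"
    select_sql = f"SELECT {column} FROM {TABLE} {where}"
    corrupt_sql = f"UPDATE {TABLE} SET {column} = NULL {where}"
    restore_sql = f"UPDATE {TABLE} SET {column} = ? {where}"
    return select_sql, corrupt_sql, restore_sql


def corrupt_then_restore(conn_str, column, key_column, key_value, run_check):
    """Null one value, call run_check(), then put the original value back."""
    import pyodbc  # imported lazily so build_statements works without a driver

    select_sql, corrupt_sql, restore_sql = build_statements(column, key_column)
    conn = pyodbc.connect(conn_str)
    try:
        cur = conn.cursor()
        row = cur.execute(select_sql, key_value).fetchone()
        if row is None:
            raise LookupError(f"no row with {key_column} = {key_value!r}")
        original = row[0]
        cur.execute(corrupt_sql, key_value)
        conn.commit()  # commit so the checkpoint's own connection sees the change
        try:
            run_check()
        finally:
            # Restore the original value even if the check raises.
            cur.execute(restore_sql, original, key_value)
            conn.commit()
    finally:
        conn.close()
```

Here `run_check` is any callable that reruns the checkpoint, such as a function that invokes the checkpoint command shown above. Run this only against a test copy of the database.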