This repository contains a data quality library designed to be used within Jupyter notebooks or Databricks environments. The library provides tools for generating reports about data quality.
To install the data quality library, follow these steps:
- Import the _dq-library.ipynb notebook into your Databricks workspace.
- Run the notebook to make the library functions available in your environment.
To utilize the library, execute the _dq-library.ipynb notebook in your Databricks workspace:
%run "/path/to/_dq-library"
Replace "/path/to/_dq-library" with the actual path to the _dq-library.ipynb notebook.
Here is a basic example of how to use a function from the library after loading it:
df_processed, dq_summary = calculate_data_quality(dataframe, config_json)
Substitute calculate_data_quality with the actual function you wish to use, dataframe with your DataFrame object, and config_json with your DQ business rules config.
A complete example can be found in Main.ipynb.
Below are the functions provided by the data quality library.
- calculate_data_quality(df_policies, config_json): Takes the DataFrame to be assessed and the DQ business rules config [[to be further documented]] as input, and generates two DataFrames, df_processed and dq_summary.
  - df_processed: Includes additional columns logging the DQ result.
  - dq_summary: Includes DQ KPIs for each field.
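To make the two outputs more concrete, here is a minimal sketch of the idea in plain Python. It is not the library's implementation. The function name sketch_data_quality, the config schema ({"field": {"required": bool}}), the "_dq_pass" column suffix, and the KPI names are all assumptions made for illustration, since the real rule format is still to be documented. Lists of dicts stand in for DataFrames.

```python
# Hypothetical stand-in, NOT the library's implementation: it only mirrors the
# documented input/output shape using plain Python lists of dicts.

def sketch_data_quality(rows, config):
    """Return (processed_rows, summary) for a list of dict rows.

    `config` uses an assumed schema: {"field": {"required": bool}}.
    """
    processed_rows = []
    failures = {field: 0 for field in config}
    for row in rows:
        out = dict(row)  # copy so the input rows are left untouched
        for field, rule in config.items():
            value = row.get(field)
            # The only rule sketched here: a required field must not be empty.
            passed = not (rule.get("required", False) and value in (None, ""))
            out[f"{field}_dq_pass"] = passed  # extra column logging the result
            if not passed:
                failures[field] += 1
        processed_rows.append(out)

    total = len(rows)
    # One summary record per configured field, holding simple KPIs.
    summary = [
        {
            "field": field,
            "rows_checked": total,
            "rows_failed": failed,
            "pass_rate": (total - failed) / total if total else None,
        }
        for field, failed in failures.items()
    ]
    return processed_rows, summary


# Made-up sample data; the field names are illustrative only.
rows = [
    {"policy_id": "P1", "premium": 100.0},
    {"policy_id": None, "premium": 250.0},
]
config = {"policy_id": {"required": True}, "premium": {"required": True}}

processed, summary = sketch_data_quality(rows, config)
```

In this sketch, processed plays the role of df_processed (the original columns plus one result column per checked field) and summary plays the role of dq_summary (one record of KPIs per field). The actual columns and KPIs produced by calculate_data_quality may differ.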
No additional dependencies need to be installed.
To test the functions of the library, follow these steps:
- Navigate to the test notebook or scripts.
- Execute the tests and review the results.
If there are specific commands or scripts to run the tests, provide them here.
Contributions to the library are welcome! Here's how you can contribute:
- Fork the repository.
- Create a new branch for your feature (git checkout -b feature/YourFeature).
- Commit your changes (git commit -am 'Add some feature').
- Push to the branch (git push origin feature/YourFeature).
- Create a new Pull Request.
This library is licensed under the [License Name]. See the LICENSE file for details.
For questions or feedback regarding this library, please reach out to the maintainers.
- Name: Mahmal Sami & xxx
- Email: [mahmalsami@gmail.com]