Language: Python. License: Apache-2.0.
Great Assertions
This library is inspired by the Great Expectations library. It makes
many of the expectations found in Great Expectations available as
assertions for Python's built-in unittest framework.
Running with xml-runner works exactly as it does today.
However, you will not be able to use methods such as to_results_table, because they rely on a different result class.
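To illustrate why such helper methods disappear under a different runner, here is a minimal sketch that uses only the standard library. `TabulatingResult`, `to_rows`, `make_example_case` and `run_with` are hypothetical names made up for this example; they are not GA's actual result class or its `to_results_table` method. The only point is that helpers live on whichever result class the runner is configured with.

```python
import io
import unittest


class TabulatingResult(unittest.TextTestResult):
    """Hypothetical result class that records one row per test (not GA's class)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.rows = []

    def addSuccess(self, test):
        super().addSuccess(test)
        self.rows.append((test.id().rsplit(".", 1)[-1], "pass"))

    def addFailure(self, test, err):
        super().addFailure(test, err)
        self.rows.append((test.id().rsplit(".", 1)[-1], "fail"))

    def to_rows(self):
        """Return the recorded (test name, outcome) pairs."""
        return list(self.rows)


def make_example_case():
    """Build a small TestCase locally so test discovery tools do not pick it up."""

    class ExampleTest(unittest.TestCase):
        def test_ok(self):
            self.assertEqual(1 + 1, 2)

        def test_bad(self):
            self.assertEqual(1 + 1, 3)  # fails on purpose

    return ExampleTest


def run_with(resultclass):
    """Run the example tests with the given result class and return the result."""
    suite = unittest.TestLoader().loadTestsFromTestCase(make_example_case())
    runner = unittest.TextTestRunner(stream=io.StringIO(), resultclass=resultclass)
    return runner.run(suite)


custom = run_with(TabulatingResult)           # has to_rows()
default = run_with(unittest.TextTestResult)   # plain result, no to_rows()
```

The same test class runs under both configurations; only the object returned by `run` differs, which is why a runner that supplies its own result class cannot offer the extra methods.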
The assertions provided by GA also allow you to validate any environment, including production.
Currently GA only supports saving the results to Spark, for example on Databricks.
Once the run has completed, there is a save method, as seen below.
The image below shows a simple graph of the accumulation of tests over a test run.
However, much more complex analysis can be performed with the extended data that GA generates.
The extended table of results contains the following:
From the extended column you can get further details about the type of test that was executed and its results.
For example, the test expect_table_row_count_to_be_less_than asserts that the maximum row count is not breached.
In the code below, the expected value was 100 and the actual value was 205, which caused the test to fail.
Analysts can therefore query the extended data to get a picture of the size of the breach.
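As a rough illustration of that kind of query, the sketch below computes the size of a breach from a single extended record. The record layout (a dict with `expected` and `actual` keys) and the `breach_size` helper are assumptions made for this example, not GA's actual extended schema; the values 100 and 205 come from the failing row-count example.

```python
def breach_size(record):
    """Return how far `actual` exceeds `expected`, or 0 if it does not exceed it."""
    return max(0, record["actual"] - record["expected"])


# Hypothetical extended record for the failing row-count test.
record = {
    "test": "expect_table_row_count_to_be_less_than",
    "expected": 100,
    "actual": 205,
}

overshoot = breach_size(record)                 # rows above the expected maximum
ratio = record["actual"] / record["expected"]   # actual as a multiple of the limit
```

Here the table holds 105 more rows than the limit, a little over twice the expected maximum. A report built on many such records could track how the overshoot changes from run to run.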
In production monitoring, these types of results can help prevent skewed outcomes.
For example, suppose the expected values are within a range of 0-100
and you get an exceptionally large value.
That value could skew business functionality, and the resulting defect could cause
damage, loss of income, or incorrect reporting to a downstream system.
GA therefore lets you provide benchmarks for production validation, and an
experienced analyst can create reports on top of the data.
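The 0-100 scenario can be sketched as a simple range check. `assert_values_between` is a hypothetical helper written for this example and is not part of GA's API; the readings are made-up data containing one exceptionally large value.

```python
def assert_values_between(values, min_value, max_value):
    """Raise AssertionError listing every value outside [min_value, max_value]."""
    outliers = [v for v in values if not min_value <= v <= max_value]
    if outliers:
        raise AssertionError(
            f"{len(outliers)} value(s) outside [{min_value}, {max_value}]: {outliers}"
        )


# Made-up production readings; 25000 is the exceptionally large value.
readings = [12, 47, 99, 25000, 3]

try:
    assert_values_between(readings, 0, 100)
    message = None
except AssertionError as exc:
    message = str(exc)  # captured so it could be stored alongside the result
```

Reporting the offending values, rather than only a pass or fail flag, is what lets an analyst judge how far production data has drifted from the benchmark.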
An example of the extended dataset:
Notes
If you get an Arrow function warning when running in Databricks, it is
because the toPandas() method is used for many of the assertions. The plan is
to replace the Pandas conversion with pure PySpark code. If this is a problem for you, please raise
an issue so this work can be prioritised. For now, it is advisable to make sure the
datasets are not too big, as large datasets can cause the driver to crash.
Development
To create a development environment, create a virtualenv and make a
development installation.