Allow multiple row_filters and params to row_filters
bogo96 opened this issue · 4 comments
Is there a reason that multiple row_filters, and params for row_filters (like those for rule_ids), are not allowed?
While using clouddq, I have needed this in several situations.
If params were allowed, row_filters could be written generically: one row_filter could be reused across many entities.
Sometimes we also want to filter by a column other than the one used in the rule_binding.
I suggest a change like the one below, so that the email row_filter can be used with any entity and any given column.
row_filters:
  NONE:
    filter_sql_expr: |-
      True
  DATA_TYPE_EMAIL:
    params: |-
      - column
    filter_sql_expr: |-
      $column = 'email'
rule_bindings:
  T2_DQ_1_EMAIL:
    entity_id: TEST_TABLE
    column_id: VALUE
    row_filter_ids:
      - DATA_TYPE_EMAIL:
          column: contact_type
    rule_ids:
      - NOT_NULL_SIMPLE
      - REGEX_VALID_EMAIL
      - CUSTOM_SQL_LENGTH:
          upper_bound: 30
      - NOT_BLANK
    metadata:
      brand: one
Is there currently any way to do this? If not, may I add it?
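To make the proposal concrete, here is a rough Python sketch of how a mixed `row_filter_ids` list (plain IDs plus IDs with arguments) might be turned into a single filter expression. Everything in it is an assumption for illustration: `combine_row_filters` is a hypothetical helper and not part of clouddq, the configs are assumed to be already loaded into plain dicts, and joining the filters with `AND` is only one possible semantics.

```python
from string import Template
from typing import Dict, List, Union

# A reference is either a bare filter ID or {filter_id: {param: value}}.
RowFilterRef = Union[str, Dict[str, Dict[str, str]]]


def combine_row_filters(
    row_filter_ids: List[RowFilterRef],
    row_filters: Dict[str, Dict[str, str]],
) -> str:
    """Hypothetical helper: resolve each referenced filter and AND them together."""
    clauses = []
    for ref in row_filter_ids:
        if isinstance(ref, str):
            # Plain ID, e.g. "NONE": no arguments to substitute.
            filter_id, arguments = ref, {}
        else:
            if len(ref) != 1:
                raise ValueError(f"Expected exactly one filter per entry, got: {ref}")
            filter_id, arguments = next(iter(ref.items()))
        sql_expr = row_filters[filter_id]["filter_sql_expr"].strip()
        # $name placeholders are filled in from the binding's arguments.
        resolved = Template(sql_expr).substitute(arguments or {})
        clauses.append(f"({resolved})")
    # Assumption: a row must pass every filter to be included.
    return " AND ".join(clauses)


row_filters = {
    "NONE": {"filter_sql_expr": "True"},
    "DATA_TYPE_EMAIL": {"filter_sql_expr": "$column = 'email'"},
}
binding_filters = ["NONE", {"DATA_TYPE_EMAIL": {"column": "contact_type"}}]
combined = combine_row_filters(binding_filters, row_filters)
```

With the example configs above, `combined` would filter on `contact_type` rather than on the binding's `column_id`, which is the use case described in this issue.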
Hey @bogo96, thanks for opening this issue. I agree that we should 1) allow multiple row_filters in a rule_binding and 2) allow parametrization of row_filters.
This would be a fairly involved PR but happy to support you in this if you are up to it!
You would need to update the DqRowFilter class to accept parameters and add a method that resolves these parameters into a valid SQL string. See below for example code where we've added parametrization to a CUSTOM_SQL_EXPR rule_type:
https://github.com/GoogleCloudPlatform/cloud-data-quality/blob/main/clouddq/classes/dq_rule_binding.py#L246
https://github.com/GoogleCloudPlatform/cloud-data-quality/pull/124/files#diff-53a624ae4d4f1930d8966d1ddd72980a9e347567cd8b9f05ca3018eb739ebae8
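As a rough illustration of the shape such a change could take, here is a minimal sketch. It is not the actual DqRowFilter implementation: the class name `ParametrizedRowFilter`, its fields, and the `resolve_sql_expr` method are hypothetical, and the word-character check on argument values is an assumed safeguard rather than something taken from the linked code.

```python
import re
from dataclasses import dataclass, field
from string import Template
from typing import Dict, List


@dataclass
class ParametrizedRowFilter:
    """Hypothetical stand-in for a row filter that declares parameters."""

    row_filter_id: str
    filter_sql_expr: str
    params: List[str] = field(default_factory=list)

    def resolve_sql_expr(self, arguments: Dict[str, str]) -> str:
        """Substitute $param placeholders and return the resulting SQL string."""
        declared, supplied = set(self.params), set(arguments)
        if declared != supplied:
            raise ValueError(
                f"Row filter {self.row_filter_id} expects params "
                f"{sorted(declared)} but got {sorted(supplied)}"
            )
        for name, value in arguments.items():
            # Assumed safeguard: only allow word characters so that an
            # argument cannot inject arbitrary SQL into the expression.
            if not re.fullmatch(r"\w+", str(value)):
                raise ValueError(f"Invalid value for param {name}: {value!r}")
        return Template(self.filter_sql_expr).substitute(arguments)


email_filter = ParametrizedRowFilter(
    row_filter_id="DATA_TYPE_EMAIL",
    filter_sql_expr="$column = 'email'",
    params=["column"],
)
resolved = email_filter.resolve_sql_expr({"column": "contact_type"})
```

Validating the supplied arguments against the declared `params` up front keeps configuration mistakes from surfacing later as malformed SQL.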
Hi @yankisimo, thanks for checking in. We've started adding test cases to @bogo96's original PR in
https://github.com/GoogleCloudPlatform/cloud-data-quality/pull/205/files but this work was blocked on a few pending decisions on the API interface for defining multiple row filters in a rule binding, as well as a few higher-priority bugfixes.
Apologies for the delay. We will update the ticket once the feature is closer to completion.
Hi @thinhha, Sure and thanks for the update. If there's any way I can help or contribute, please let me know. On our end, we are thrilled about the significant contributions of this library and its usage in Dataplex. We have a few production projects relying on these new features and we are eagerly awaiting their availability.