column_id is NULL for CUSTOM_SQL_STATEMENT rules in summary table
Closed this issue · 5 comments
Hi there, thanks for trying CloudDQ and leaving feedback.
TL;DR it is intended.
Custom SQL statement rules can technically validate multiple columns (as is the case with the duplicate check right) or any column not referenced in the rule binding, hence why column id is set to null.
Let us know if this doesn't work for you and if so, the use-case you're trying to achieve.
Hi @thinhha, thanks for the quick clarification.
The use-case we want to archive is to have more generic custom tests, especially with [multiple] input parameters while still be able to see the corresponding column in the summary results.
One possible test for checking if values are between are specific range is e.g:
CUSTOM_SQL_LENGTH_BETWEEN:
rule_type: CUSTOM_SQL_STATEMENT
dimension: correctness
params:
custom_sql_arguments:
- min
- max
custom_sql_statement: |-
SELECT $column
FROM data
WHERE LENGTH($column) >= $min AND LENGTH($column) <= $max
This is currently not possible with the CUSTOM_SQL_EXPR rule type, as it does not allow for any custom_sql_arguments.
However, with the CUSTOM_SQL_EXPR the corresponding column is referenced in the summary result.
Would one possible solution be, to add arguments to the CUSTOM_SQL_EXPR rule type as well ?
Thanks for the detailed use-case description!
Yes you're right. It makes sense to add parameterization to custom SQL expression rules.
I'll prioritize this FR in the short-term and will update this ticket once it is released.
@tooobsias FYI this change has been implemented here: #124
This will be released in version 0.5.1 which is planned for next week.
Release 0.5.1 is now live: https://github.com/GoogleCloudPlatform/cloud-data-quality/releases/tag/v0.5.1
Apologies for the delay.