dbt-labs/dbt-utils

Allow multiple partition columns in mutually_exclusive_ranges.sql without concatenation

Caitlin-Syntax opened this issue · 0 comments

Describe the feature

An alternative title for this request could be "Update or eliminate column selection that reduces test functionality."

The mutually_exclusive_ranges test currently only allows partitioning by one column at a time unless you are "sneaky" and concatenate values together in the partition_by config, but this is not well-documented in the README, and concatenation introduces failure risk if not done carefully (this is likely negligible risk, but it's non-zero).

There are use plenty of use cases where it is meaningful to partition by more than one column. For example, in the US healthcare industry, a patient may have two active insurance plans at one time, but not two active plans from the same insurance company, so to create a table listing the active time ranges for these coverages and test that they do not overlap, you need to be able to partition by both patient and insurance company. Or as seen in Issue 423, you may need to break up customer account records across projects. (This issue is actually introduced by the fix for that issue because prior to the column alias being added, a comma delimited list of columns would work as desired, so I'm not sure if there's a reason that wasn't the recommended solution.)

This can be achieved with the current test by eliminating or making a small adjustment to the line in the window_functions CTE that effectively forces a single partition column today: {{ partition_by }} as partition_by_col. The partition_by_col value is not used anywhere in downstream logic, so it could either be deleted entirely, or it could be updated to wrap the {{ partition_by }} value in quotes. Either option would allow a comma-delimited list or a concatenated list to work as a partition_by config statement.

At minimum, the README could be updated to indicate that the concatenation technique is an available option.

Describe alternatives you've considered

  • Use the concatenation option
  • Duplicate this test locally and remove the line as described above

Additional context

No major context to consider as far as I'm aware. This adjustment should be agnostic of any particular database/platform unless there's some sort of problem with the "wrap it in quotes" option in a db I'm unfamiliar with.

Who will this benefit?

Anyone who needs to partition by more than one column! Examples provided above.

Are you interested in contributing this feature?

Happy to help, but it would probably take more time and effort to get me set up than for someone "plugged in" already to simply make the fix.