datafold/data-diff

[Feature] Add support for detecting primary keys based on DBT contracted constraints or custom DBT PK tests

ttusing opened this issue · 2 comments

Is your feature request related to a problem? Please describe.
To set up data-diff with my DBT installation, data-diff needs to know the primary keys. I have already defined my primary keys in my project using custom tests (in my case, dbt-constraints.primary_key from Snowflake Labs). I would like to use this definition instead of having to add the meta property to my YML files.

For my contracted models, I'd like to use the constraints property.

Describe the solution you'd like
I would like to modify this section of code to check for constraints. I would also like it to check dbt_project.yml for a config that lists the manually configured custom tests to treat as primary key definitions.
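A rough sketch of the lookup order I have in mind, not the actual data-diff code. It works on a parsed `manifest.json` as a plain dict, and assumes the field names as I understand the dbt manifest (`constraints`, `columns`, `test_metadata`, `depends_on`). The function name `detect_primary_keys` and the `pk_test_names` parameter are hypothetical; `pk_test_names` stands in for whatever would be read from dbt_project.yml.

```python
from typing import Any, Dict, List, Sequence


def detect_primary_keys(
    manifest: Dict[str, Any],
    model_id: str,
    pk_test_names: Sequence[str] = ("primary_key",),  # hypothetical config value
) -> List[str]:
    """Return the primary key columns for a model, or [] if none are found."""
    nodes = manifest.get("nodes") or {}
    node = nodes[model_id]

    # 1. Model-level constraint on a contracted model.
    for constraint in node.get("constraints") or []:
        if constraint.get("type") == "primary_key" and constraint.get("columns"):
            return list(constraint["columns"])

    # 2. Column-level constraints.
    columns = [
        name
        for name, column in (node.get("columns") or {}).items()
        if any(
            c.get("type") == "primary_key"
            for c in column.get("constraints") or []
        )
    ]
    if columns:
        return columns

    # 3. Custom tests attached to the model whose name is in pk_test_names.
    for test in nodes.values():
        if test.get("resource_type") != "test":
            continue
        metadata = test.get("test_metadata") or {}
        if metadata.get("name") not in pk_test_names:
            continue
        if model_id not in (test.get("depends_on") or {}).get("nodes", []):
            continue
        kwargs = metadata.get("kwargs") or {}
        if kwargs.get("column_names"):
            return list(kwargs["column_names"])
        if kwargs.get("column_name"):
            return [kwargs["column_name"]]

    return []
```

Constraints are checked first because they are part of the model contract. The custom test lookup is only a fallback for models without a contract.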

Describe alternatives you've considered
Add the meta element to all of my tables and keep them in sync.

I have this on a fork and am testing it locally: https://github.com/ttusing/data-diff

Hi @ttusing,

Thank you for trying out data-diff and for taking the time to open this issue and implement the solution!

We made a hard decision to sunset the data-diff package and won't provide further development or support. Diffing functionality will continue to be available and evolving in Datafold Cloud.

Detecting PKs based on more metadata, such as constraints, is a great idea, and we have this on the roadmap for the Cloud (at the moment we infer them based on uniqueness tests).

Feel free to take it for a trial or contact us at support@datafold.com if you have any questions.

-Gleb