Dashlane/dbt-invoke

Feature: Set case flag for names

BeadW opened this issue · 8 comments

BeadW commented

Seems like at the moment dbt-invoke is not doing any treatment to resource names.

Simple use case here is column naming coming from Snowflake.

It always comes back as upper case (unless quoted identifiers were on) and we have a lower case convention in our yaml and our sql.

It would be nice to be able to set a flag so that we conform the case of resources and columns.

BeadW commented

Adding to this one. When names are in a different case dbt-invoke treats them as being different causing it to nuke metadata. This might actually be a bug given that scenario.

BeadW commented

@robastel interested in your thoughts on expected behavior when the name in yaml is a different case to that of the column in the db.
For example a column id in yaml in snowflake is going to most likely be ID.
Currently this is treated as id is no longer in the object so its removed.
A new column called ID is added to the yaml.

Hi @BeadW, sorry my focus was away from dbt-invoke for a bit. Thanks for reporting this interesting issue.

In Snowflake, when using quoted identifiers, can you have multiple columns in the same table that would all be the same if converted to the same case?

  • For example: "mycolumn", "myColumn", and "MYCOLUMN"

Hi @robastel

Firstly - awesome tool.. we developed an in-house python script to pre-create YML files over a year ago (using information_schema from database, rather than thru DBT), but its a "first time only" thing as we haven't spent time sorting out preserving descriptions etc. dbt-invoke is likely going to replace it completely.

Can confirm the issue - dbt-invoke is nuking metadata from files with the enforced case change.
image

Info on DBT and snowflake quoting here: https://docs.getdbt.com/reference/project-configs/quoting#snowflake
(recommendation is to default quoting to false in dbt-project.yml to make using objects in snowflake much easier)

And snowflake behaviour here: https://community.snowflake.com/s/article/Unquoted-object-names-are-stored-in-upper-case-letters

Hi @BeadW, sorry my focus was away from dbt-invoke for a bit. Thanks for reporting this interesting issue.

In Snowflake, when using quoted identifiers, can you have multiple columns in the same table that would all be the same if converted to the same case?

* For example: "mycolumn", "myColumn", and "MYCOLUMN"

@robastel intersting question, I'll go check manually and report back 🤓

Wow thats ugly.. (generated manually, not with DBT)

image

image

Thanks @HiltonDay for doing some research on this issue!

Given your Text_Case/text_case/TEXT_CASE example and the possible collisions that could result from altering the case of identifiers, I'm inclined to leave dbt-invoke's behavior as is.

I'll leave this issue open for a bit in case there are any further proposals to account for collisions.

Given the general propensity to snake case over all-caps, this is likely an issue for anyone using snowflake+dbt?

The default behaviour for snowflake is to turn off the case sensitivty for DBT (https://docs.getdbt.com/reference/project-configs/quoting#default).

Might I suggest adding the behaviour via a separate command line flag like "--ignore-case-sensitivity" so that the 99% of Snowflake users who are sane, might benefit from your tool? That way there's an explicit user opt-in at runtime to the behaviour, and no general risk of people murdering their documentation by accident? :)

My take is, any organisation who are overloading the same-named columns with other variations in case sensitivity, deserve all kinds of special punishment, but they're already doing that to their future selves, so agree that this shouldn't be a default behaviour.