dbt-labs/dbt-core

case insensitive column names for unit test YML

graciegoheen opened this issue · 2 comments

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

For snowflake, this unit test definition results in a SQL parsing error:

unit_tests:
  - name: test_valid_email_address # this is the unique name of the test
    description: my favorite unit test
    model: dim_wizards # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: ref('stg_wizards')
        rows:
          - {wizard_id: "1", email: cool@example.com,     email_top_level_domain: example.com}
          - {wizard_id: "2", email: cool@unknown.com,     email_top_level_domain: unknown.com}
          - {wizard_id: "3", email: badgmail.com,         email_top_level_domain: gmail.com}
          - {wizard_id: "4", email: missingdot@gmailcom,  email_top_level_domain: gmail.com}
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
      - input: ref('stg_worlds')
        rows: []
    expect: # the expected output given the inputs above
      rows:
        - {wizard_id: "1", is_valid_email_address: true}
        - {wizard_id: "2", is_valid_email_address: false}
        - {wizard_id: "3", is_valid_email_address: false}
        - {wizard_id: "4", is_valid_email_address: false}
22:42:12 On unit_test.coalesce_ci_demo_2023.dim_wizards.test_valid_email_address: create or replace temporary table DEVELOPMENT.dbt_ggoheen.test_valid_email_address__dbt_tmp
         as
        (select * from (
        with  __dbt__cte__stg_wizards as (

-- Fixture for stg_wizards
select try_cast(null as NUMBER) AS WIZARD_ID, try_cast(null as VARCHAR) AS WIZARD_NAME, try_cast(null as VARCHAR) AS EMAIL, try_cast(null as VARCHAR) AS EMAIL_TOP_LEVEL_DOMAIN, try_cast(null as VARCHAR) AS PHONE_NUMBER, try_cast(null as NUMBER) AS WORLD_ID, 
    try_cast('1' as )
 AS wizard_id, 
    try_cast('cool@example.com' as )
 AS email, 
    try_cast('example.com' as )
 AS email_top_level_domain
union all
select try_cast(null as NUMBER) AS WIZARD_ID, try_cast(null as VARCHAR) AS WIZARD_NAME, try_cast(null as VARCHAR) AS EMAIL, try_cast(null as VARCHAR) AS EMAIL_TOP_LEVEL_DOMAIN, try_cast(null as VARCHAR) AS PHONE_NUMBER, try_cast(null as NUMBER) AS WORLD_ID, 
    try_cast('2' as )
 AS wizard_id, 
    try_cast('cool@unknown.com' as )
 AS email, 
    try_cast('unknown.com' as )
 AS email_top_level_domain
union all
select try_cast(null as NUMBER) AS WIZARD_ID, try_cast(null as VARCHAR) AS WIZARD_NAME, try_cast(null as VARCHAR) AS EMAIL, try_cast(null as VARCHAR) AS EMAIL_TOP_LEVEL_DOMAIN, try_cast(null as VARCHAR) AS PHONE_NUMBER, try_cast(null as NUMBER) AS WORLD_ID, 
    try_cast('3' as )
 AS wizard_id, 
    try_cast('badgmail.com' as )
 AS email, 
    try_cast('gmail.com' as )
 AS email_top_level_domain
union all
select try_cast(null as NUMBER) AS WIZARD_ID, try_cast(null as VARCHAR) AS WIZARD_NAME, try_cast(null as VARCHAR) AS EMAIL, try_cast(null as VARCHAR) AS EMAIL_TOP_LEVEL_DOMAIN, try_cast(null as VARCHAR) AS PHONE_NUMBER, try_cast(null as NUMBER) AS WORLD_ID, 
    try_cast('4' as )
 AS wizard_id, 
    try_cast('missingdot@gmailcom' as )
 AS email, 
    try_cast('gmail.com' as )
 AS email_top_level_domain
),  __dbt__cte__stg_worlds as (

-- Fixture for stg_worlds
select try_cast(null as NUMBER) AS WORLD_ID, try_cast(null as VARCHAR) AS WORLD_NAME
    limit 0
),  __dbt__cte__top_level_email_domains as (

-- Fixture for top_level_email_domains
select try_cast(null as VARCHAR) AS TLD, 
    try_cast('example.com' as )
 AS tld
union all
select try_cast(null as VARCHAR) AS TLD, 
    try_cast('gmail.com' as )
 AS tld
), wizards as (

    select * from __dbt__cte__stg_wizards

),

worlds as (

    select * from __dbt__cte__stg_worlds

),

accepted_email_domains as (

    select * from __dbt__cte__top_level_email_domains

),

check_valid_emails as (

    select  
        wizards.wizard_id,
        wizards.wizard_name,
        wizards.email,
        wizards.phone_number,
        wizards.world_id,

		coalesce (regexp_like(
            wizards.email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
        )
        = true
        and accepted_email_domains.tld is not null,
        false) as is_valid_email_address

    from wizards
    left join accepted_email_domains
        on wizards.email_top_level_domain = lower(accepted_email_domains.tld)

)

select
    check_valid_emails.wizard_id,
    check_valid_emails.wizard_name,
    check_valid_emails.email,
    check_valid_emails.is_valid_email_address,
    check_valid_emails.phone_number,
    worlds.world_name
from check_valid_emails
left join worlds
    on check_valid_emails.world_id = worlds.world_id
    ) as __dbt_sbq
    where false
    limit 0

        );
22:42:12 Snowflake adapter: Snowflake query id: 01b1efd2-0804-0b33-000d-37832a1436ee
22:42:12 Snowflake adapter: Snowflake error: 001003 (42000): SQL compilation error:
syntax error line 8 at position 20 unexpected ')'.
syntax error line 10 at position 35 unexpected ')'.
syntax error line 12 at position 30 unexpected ')'.
syntax error line 16 at position 20 unexpected ')'.
syntax error line 18 at position 35 unexpected ')'.
syntax error line 20 at position 30 unexpected ')'.
syntax error line 24 at position 20 unexpected ')'.
syntax error line 26 at position 31 unexpected ')'.
syntax error line 28 at position 28 unexpected ')'.
syntax error line 32 at position 20 unexpected ')'.
syntax error line 34 at position 38 unexpected ')'.
syntax error line 36 at position 28 unexpected ')'.
syntax error line 47 at position 30 unexpected ')'.
syntax error line 51 at position 28 unexpected ')'.
22:42:12 Timing info for unit_test.coalesce_ci_demo_2023.dim_wizards.test_valid_email_address (execute): 22:42:11.707437 => 22:42:12.835004
22:42:12 On unit_test.coalesce_ci_demo_2023.dim_wizards.test_valid_email_address: Close
22:42:12 Runtime Error in unit_test test_valid_email_address (models/marts/unit_tests.yml)
  During unit test execution of dim_wizards::test_valid_email_address, dbt could not build the 'actual' result for comparison against 'expected' given the unit test definition:
   Database Error
    001003 (42000): SQL compilation error:
    syntax error line 8 at position 20 unexpected ')'.
    syntax error line 10 at position 35 unexpected ')'.
    syntax error line 12 at position 30 unexpected ')'.
    syntax error line 16 at position 20 unexpected ')'.
    syntax error line 18 at position 35 unexpected ')'.
    syntax error line 20 at position 30 unexpected ')'.
    syntax error line 24 at position 20 unexpected ')'.
    syntax error line 26 at position 31 unexpected ')'.
    syntax error line 28 at position 28 unexpected ')'.
    syntax error line 32 at position 20 unexpected ')'.
    syntax error line 34 at position 38 unexpected ')'.
    syntax error line 36 at position 28 unexpected ')'.
    syntax error line 47 at position 30 unexpected ')'.
    syntax error line 51 at position 28 unexpected ')'.
22:42:12 1 of 1 ERROR dim_wizards::test_valid_email_address ............................. [ERROR in 1.28s]
22:42:12 Finished running node unit_test.coalesce_ci_demo_2023.dim_wizards.test_valid_email_address

If I use all UPPERCASE for my field names, this error goes away.

Expected Behavior

I should be able to use lowercase filed names in my unit test definition

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

@jtcohen6 found this when we were doing some initial spiking for complex type support in snowflake, and I've started folding it into the complex type work here: #9102.

Given the severity of this issue, I'll split it out into a separate fix since it applies to all unit test execution in snowflake, regardless of complex type usage.

The fix for this will be in dbt-adapter in the unit test materialization macros, but should be tested generically across all 1p adapters. For example, a baseline adapter test that just checks for primative type support, similar to what we have in core here would have caught this and can be part of the acceptance criteria here.

@MichelleArk is the underlying root cause relating to dbt-snowflake's default column quoting?

And presumably the fix is to utilize adapter.quote in all the right places?