EqualExperts/dbt-unit-testing

expectations break when expecting a Struct in BigQuery

JakoboEU opened this issue ยท 11 comments

Generating an expected result such as:

    {% call dbt_unit_testing.expect() %}
        SELECT
            DATE('2023-02-11') AS snapshot_date,
            STRUCT(
                '01'                AS event_id,
                DATE('2022-01-01')  AS created_on,
                'Sam'         AS name
            ) AS event
        UNION ALL
        SELECT
            DATE('2023-02-11') AS snapshot_date,
            STRUCT(
                '02'                AS event_id,
                DATE('2022-01-01')  AS created_on,
                'Bob'         AS name
            ) AS event
    {% endcall %}

Results in the following SQL being run against BigQuery:

select * FROM 
      (select count(1) as expectation_count from (
        select count(1) as count, `snapshot_date`,`event` from (
        SELECT
            DATE('2023-02-11') AS snapshot_date,
            STRUCT(
                '01'                AS event_id,
                DATE('2022-01-01')  AS created_on,
                'Sam'         AS name
            ) AS event
        UNION ALL
        SELECT
            DATE('2023-02-11') AS snapshot_date,
            STRUCT(
                '02'                AS event_id,
                DATE('2022-01-01')  AS created_on,
                'Bob'         AS name
            ) AS event
    ) as s group by `snapshot_date`,`event`

Which generates the following error:

Grouping by expressions of type STRUCT is not allowed at [21:37]

Because group by snapshot_date, event is illegal due to event being a struct.

dbt_unit_testing version:

  - package: EqualExperts/dbt_unit_testing
    version: 0.3.2

dbt version:

Running with dbt=1.5.1
dbrtly commented

I get exactly the same issue.

I think if we use the bigquery function to_json_string the comparison would get around this problem.

Please take a look at the latest release (0.4.0) and check if it fixes this issue.

Thank you!

I still get exactly the same behaviour. The generated SQL being run against bigquery has a group by struct column. This produces the bigquery error:

"Grouping by expressions of type STRUCT is not allowed" .

@dbrtly solution would work; however, the test subject would need modifying to use TO_JSON_STRING as well. I guess if the library replaced STRUCT( ... ) with TO_JSON_STRING(STRUCT( ...)) in both the expectation and the test subject then this would resolve the issue. Seems a bit hacky though having to change the test subject.

Hi @JakoboEU, can you post here the model and test that you are running? Did you apply a column transformation to all of the struct columns in the model?

@psousa50 I used the same test as in the issue description.

{% call dbt_unit_testing.expect() %}
SELECT
    DATE('2023-02-11') AS snapshot_date,
    STRUCT(
    '01'                AS event_id,
    DATE('2022-01-01')  AS created_on,
    'Sam'         AS name
    ) AS event
UNION ALL
SELECT
    DATE('2023-02-11') AS snapshot_date,
    STRUCT(
    '02'                AS event_id,
    DATE('2022-01-01')  AS created_on,
    'Bob'         AS name
    ) AS event
    {% endcall %}

which gets compiled to

/* {"app": "dbt", "dbt_version": "1.6.6", "profile_name": "default", "target_name": "local", "node_id": "test.xxxxxxxxx"} */

    select * FROM 
      (select count(1) as expectation_count from (
        select count(1) as count, `snapshot_date`, `event` from (
SELECT
    DATE('2023-02-11') AS snapshot_date,
    STRUCT(
    '01'                AS event_id,
    DATE('2022-01-01')  AS created_on,
    'Sam'         AS name
    ) AS event
UNION ALL
SELECT
    DATE('2023-02-11') AS snapshot_date,
    STRUCT(
    '02'                AS event_id,
    DATE('2022-01-01')  AS created_on,
    'Bob'         AS name
    ) AS event
    ) as s group by `snapshot_date`, `event`
  
      ) as exp) as exp_count,
      (select count(1) as actual_count from (
        select count(1) as count, `snapshot_date`, `event` from ( 
      with
      `src_xxx` as (
      

    
    select * from `xxx`.`xxx`.`xxx` where false)

    select * from (

WITH
raw_xxx AS (
  SELECT
      to_hex(md5(cast(coalesce(cast(guid as 
    string
), '') || '-' || coalesce(cast(last_modified_date as 
    string
), '') as 
    string
))) AS event_id,
      guid AS review_id,
      ....
  FROM `src_xxx`
)
SELECT * EXCEPT(row_num)
FROM (
  SELECT
    *,
    row_number() OVER (PARTITION BY event_id) AS row_num
  FROM raw_xxx
)
WHERE row_num = 1 
 ) as t ) as s group by `snapshot_date`, `event`
  
      ) as act) as act_count

HI @JakoboEU, I would like to see the full test, the dbt-unit-testing.test macro, where you define your column transformations unless you are defining them in dbt_project.yml

Hi @psousa50 , I am not aware of any transformations.

The whole test looks like

{{
    config(
        tags=['unit-test', 'unit_test'],
        run_date="2023-02-11"
    )
}}

{% call dbt_unit_testing.test('xxx', 'blah blah blah') %}
    {% call dbt_unit_testing.expect() %}
        # as above
    {% endcall %}
{% endcall %}

And I don't have anything defined in my dbt_project.yml

Oh, I see the problem now ๐Ÿ™‚ For you to be able to use STRUCT columns you need to apply a column transformation to those columns, as explained in the README here. Please refer to this use case for instructions on how to do it.

If you have any issue implementing this please let me know, I'll try to assist you

That's the ticket.

Thank you. Works well.

Cool ๐Ÿ˜Ž