Tomme/dbt-athena

Query isn't properly formatted when table is partitioned

Opened this issue · 2 comments

Hi,
I'm trying to run

{{ config(partitioned_by='day') }}
select 
    a,
    b,
    c
 from random_table

I can see that in target/run the query is being compiled to

create table
    dev_datawarehouse.test

    with (
        partitioned_by=ARRAY'day',
        format='parquet'
    )
  as
    
select 
    a,
    b,
    c
 from random_table

Which gives the following error: SYNTAX_ERROR: Unknown type: ARRAY.
the compiled query should instead be

create table
    dev_datawarehouse.test

    with (
        partitioned_by=ARRAY['day'],
        format='parquet'
    )
  as
    
select 
    a,
    b,
    c
 from random_table

Which works just fine. Am I missing something obvious?

Ok I've solved the problem: this is wrong {{ config(partitioned_by='day') }} should be {{ config(partitioned_by=['day']) }}
what is less obvious is that if you declare the partition column at the project level like so:

example:
      partitioned_by:
          - day

you'll experience the same error. You should declare it like so:

example:
      partitioned_by: [day]

This StackOverflow answer saying that the 2 forms are equivalent threw me off a bit https://stackoverflow.com/questions/23657086/yaml-multi-line-arrays

To partition by a date column, use syntax like this:

{{ config( partitioned_by = [ 'some_date_column' ] ) }}

select a, b, some_date_column
  from random_table