[ADAP-453] [Feature] Overwrite `api.Column.string_type`
Opened this issue · 6 comments
Is this your first time submitting a feature request?
- I have read the expectations for open source contributors
- I have searched the existing issues, and I could not find an existing issue for this feature
- I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion
Describe the feature
Currently the core dbt `api.Column.string_type` is not overridden; however, the default value is not suitable for BigQuery data types, since BigQuery (I believe) only supports the `string` type.
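For illustration, a minimal sketch of what the requested override could look like (this assumes dbt-core 1.x's `dbt.adapters.base.column.Column` import path and dbt-bigquery's `BigQueryColumn` subclass; the method body is a proposal, not shipped code):

```python
# Sketch only: the override body is a proposal, not the shipped
# implementation. Assumes the dbt-core 1.x base class import path.
from dbt.adapters.base.column import Column


class BigQueryColumn(Column):
    @classmethod
    def string_type(cls, size: int) -> str:
        # BigQuery's STRING type is already max-size and takes no
        # length parameter, so the requested size can be ignored.
        return "string"
```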
Describe alternatives you've considered
Using `dbt.type_string()`, but this isn't suitable in all use cases; see dbt-labs/dbt-core#7103.
Who will this benefit?
Cross-warehouse users.
Are you interested in contributing this feature?
No response
Anything else?
No response
Yeah sure, sorry. As a heads up, this issue exists in Databricks (Spark, I assume) as well; do you want me to raise a separate issue in that adapter?
The easiest way to see it is with a basic model that just casts something to the string type:
```sql
{{
    config(
        materialized = 'table',
    )
}}

select
    cast(10 as {{ api.Column.string_type(4000) }}) as test_col
```
Which, in the case of a BigQuery target, leads to the following compiled code:

```sql
select
    cast(10 as character varying(4000)) as test_col
```
and a `dbt run` gives the error output:

```
dbt run --target bigquery
17:51:06 Running with dbt=1.4.5
17:51:06 Found 1 model, 0 tests, 0 snapshots, 0 analyses, 337 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
17:51:06
17:51:07 Concurrency: 1 threads (target='bigquery')
17:51:07
17:51:07 1 of 1 START sql table model dbt_ryan.test_models .............................. [RUN]
17:51:09 BigQuery adapter: https://console.cloud.google.com/bigquery?project=***&page=queryresults
17:51:09 1 of 1 ERROR creating sql table model dbt_ryan.test_models ..................... [ERROR in 1.85s]
17:51:09
17:51:09 Finished running 1 table model in 0 hours 0 minutes and 2.48 seconds (2.48s).
17:51:09
17:51:09 Completed with 1 error and 0 warnings:
17:51:09
17:51:09 Database Error in model test_models (models/test_models.sql)
17:51:09   Syntax error: Expected ")" but got identifier "varying" at [14:26]
17:51:09   compiled Code at target/run/dbt_demo/models/test_models.sql
17:51:09
17:51:09 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
```
Because there is no overriding function, it uses the core class default, which is `character varying`; that type doesn't exist in BigQuery, so this API call simply errors on BigQuery targets.
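For reference, the core default that produces that `character varying` rendering looks roughly like this (paraphrased from dbt-core's base `Column` class; exact details may vary by version):

```python
# Roughly the dbt-core default in dbt/adapters/base/column.py; since
# dbt-bigquery does not override it, this is what renders on BigQuery.
class Column:
    @classmethod
    def string_type(cls, size: int) -> str:
        return "character varying({})".format(size)


print(Column.string_type(4000))  # -> character varying(4000)
```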
I could swap out `api.Column.string_type(4000)` for `type_string()`, since no one really uses length limits in BigQuery and the default string type is already max size. But, as in the other issue, `type_string()` doesn't give the option to specify the length, which generates a 256-length string for Redshift. So I can't build a model (or macro) that is easily cross-warehouse, and instead have to split the models or dispatch the macro.
Thanks @rlh1994 -- that is a perfect example!
Based on the documentation for the `string_type` class method here, we'd expect the following query to work across adapters:

```sql
select cast(10 as {{ api.Column.string_type(4000) }})
```
Suspected root cause
The primary testing for these two methods appears tautological to me (rather than actually testing platform-specific data types).
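To make the "tautological" point concrete, here is a hypothetical contrast (the test names and stand-in classes below are illustrative, not the actual dbt test suite):

```python
# Hypothetical illustration of the testing gap -- not the real dbt
# suite. Minimal stand-ins for the core default and a BigQuery override:
class Column:
    @classmethod
    def string_type(cls, size: int) -> str:
        return "character varying({})".format(size)


class BigQueryColumn(Column):
    @classmethod
    def string_type(cls, size: int) -> str:
        return "string"


def test_string_type_tautological():
    # Compares the method's output to itself, so it can never fail,
    # even when the rendered type is invalid on the target platform.
    assert Column.string_type(4000) == Column.string_type(4000)


def test_string_type_platform_specific():
    # Pins the expected platform type: a missing override (falling back
    # to "character varying(4000)") would make this fail on BigQuery.
    assert BigQueryColumn.string_type(4000) == "string"
```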
Next steps
On my side, I've created an issue in dbt-core to add functional tests for the `string_type` and `numeric_type` adapter class methods. From there, the adapter-specific implementation(s) should inherit the (new) functional tests from dbt-core to confirm the fix.
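For context, dbt adapters typically pick up shared functional tests by inheriting a base test class from dbt-core. A sketch of that pattern for this fix, where the module path and `BaseColumnTypes` name are assumptions about what the new dbt-core tests might provide:

```python
# Hypothetical: assumes the dbt-core issue lands a shared base test
# class; the import path and class name below are placeholders.
from dbt.tests.adapter.column_types.test_column_types import BaseColumnTypes


class TestBigQueryColumnTypes(BaseColumnTypes):
    # Inheriting runs the shared dbt-core functional tests against
    # this adapter, confirming string_type/numeric_type render
    # platform-valid types on BigQuery.
    pass
```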
Could you raise an issue in dbt-spark as well? Once the fix is merged, dbt-databricks will inherit it.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Nope