[ADAP-453] [Feature] Overwrite `api.Column.string_type`
Opened this issue · 6 comments
Is this your first time submitting a feature request?
- I have read the expectations for open source contributors
- I have searched the existing issues, and I could not find an existing issue for this feature
- I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion
Describe the feature
Currently the core dbt `api.Column.string_type` is not overridden; however, the default value is not suitable for BigQuery data types, since BigQuery (I believe) only supports the `string` type.
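For illustration, a minimal sketch of what the requested override could look like (this assumes dbt-core 1.x's `dbt.adapters.base.column.Column` import path and dbt-bigquery's `BigQueryColumn` subclass; the method body is a proposal, not shipped code):

```python
# Sketch only: the override body is a proposal, not the shipped
# implementation. Assumes the dbt-core 1.x base class import path.
from dbt.adapters.base.column import Column


class BigQueryColumn(Column):
    @classmethod
    def string_type(cls, size: int) -> str:
        # BigQuery's STRING type is already max-size and takes no
        # length parameter, so the requested size can be ignored.
        return "string"
```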
Describe alternatives you've considered
Using `dbt.type_string()`, but this isn't suitable in all use cases; see dbt-labs/dbt-core#7103.
Who will this benefit?
Cross-warehouse users.
Are you interested in contributing this feature?
No response
Anything else?
No response
Yeah sure, sorry. As a heads up, this issue exists in Databricks (Spark, I assume) as well; do you want me to raise a separate issue in that adapter?
The easiest way to see it is with a basic model that just casts something to the string type:
```sql
{{
    config(
        materialized = 'table',
    )
}}

select
    cast(10 as {{ api.Column.string_type(4000) }}) as test_col
```
Which, in the case of a BigQuery target, leads to the following compiled code:

```sql
select
    cast(10 as character varying(4000)) as test_col
```
and a `dbt run` gives the error output:

```
dbt run --target bigquery
17:51:06 Running with dbt=1.4.5
17:51:06 Found 1 model, 0 tests, 0 snapshots, 0 analyses, 337 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
17:51:06
17:51:07 Concurrency: 1 threads (target='bigquery')
17:51:07
17:51:07 1 of 1 START sql table model dbt_ryan.test_models .............................. [RUN]
17:51:09 BigQuery adapter: https://console.cloud.google.com/bigquery?project=***&page=queryresults
17:51:09 1 of 1 ERROR creating sql table model dbt_ryan.test_models ..................... [ERROR in 1.85s]
17:51:09
17:51:09 Finished running 1 table model in 0 hours 0 minutes and 2.48 seconds (2.48s).
17:51:09
17:51:09 Completed with 1 error and 0 warnings:
17:51:09
17:51:09 Database Error in model test_models (models/test_models.sql)
17:51:09   Syntax error: Expected ")" but got identifier "varying" at [14:26]
17:51:09   compiled Code at target/run/dbt_demo/models/test_models.sql
17:51:09
17:51:09 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
```
Because there is no overriding function, it uses the core class default, which is `character varying`; that type doesn't exist in BigQuery, so this API call simply errors on BigQuery targets.
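For reference, the core default that produces that `character varying` rendering looks roughly like this (paraphrased from dbt-core's base `Column` class; exact details may vary by version):

```python
# Roughly the dbt-core default in dbt/adapters/base/column.py; since
# dbt-bigquery does not override it, this is what renders on BigQuery.
class Column:
    @classmethod
    def string_type(cls, size: int) -> str:
        return "character varying({})".format(size)


print(Column.string_type(4000))  # -> character varying(4000)
```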
I could swap out `api.Column.string_type(4000)` for `type_string()`, since no one really uses length limits in BigQuery and the default string type is already max size. But, as in the other issue, `type_string()` doesn't give the option to specify the length, which generates a 256-length string for Redshift. So I can't build a model (or macro) that is easily cross-warehouse, and instead have to split the models or dispatch the macro.
Thanks @rlh1994 -- that is a perfect example!
Based on the documentation for the `string_type` class method here, we'd expect the following query to work across adapters:

```sql
select cast(10 as {{ api.Column.string_type(4000) }})
```
Suspected root cause
The primary testing for these two methods appears tautological to me (rather than actually testing platform-specific data types).
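To make the "tautological" point concrete, here is a hypothetical contrast (the test names and stand-in classes below are illustrative, not the actual dbt test suite):

```python
# Hypothetical illustration of the testing gap -- not the real dbt
# suite. Minimal stand-ins for the core default and a BigQuery override:
class Column:
    @classmethod
    def string_type(cls, size: int) -> str:
        return "character varying({})".format(size)


class BigQueryColumn(Column):
    @classmethod
    def string_type(cls, size: int) -> str:
        return "string"


def test_string_type_tautological():
    # Compares the method's output to itself, so it can never fail,
    # even when the rendered type is invalid on the target platform.
    assert Column.string_type(4000) == Column.string_type(4000)


def test_string_type_platform_specific():
    # Pins the expected platform type: a missing override (falling back
    # to "character varying(4000)") would make this fail on BigQuery.
    assert BigQueryColumn.string_type(4000) == "string"
```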
Next steps
On my side, I've created an issue in dbt-core to add functional tests for the `string_type` and `numeric_type` adapter class methods. From there, the adapter-specific implementation(s) should inherit the (new) functional tests from dbt-core to confirm the fix.
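For context, dbt adapters typically pick up shared functional tests by inheriting a base test class from dbt-core. A sketch of that pattern for this fix, where the module path and `BaseColumnTypes` name are assumptions about what the new dbt-core tests might provide:

```python
# Hypothetical: assumes the dbt-core issue lands a shared base test
# class; the import path and class name below are placeholders.
from dbt.tests.adapter.column_types.test_column_types import BaseColumnTypes


class TestBigQueryColumnTypes(BaseColumnTypes):
    # Inheriting runs the shared dbt-core functional tests against
    # this adapter, confirming string_type/numeric_type render
    # platform-valid types on BigQuery.
    pass
```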
Could you raise an issue in dbt-spark as well? Once the fix is merged, dbt-databricks will inherit it.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Nope