microsoft/dbt-synapse

Error when using incremental models on Synapse REPLICATE table distribution

ThomasCarterSuzy opened this issue · 4 comments

When running incremental models on Synapse tables with REPLICATE distribution, the following error is thrown:
[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Option 'REPLICATE User Temp Table' is not supported in this version of SQL Server. (104458) (SQLExecDirectW)

It appears the fix is to override the sqlserver__make_temp_relation macro and remove the '#' ~ prefix (the temp table marker) from {% set tmp_identifier = '#' ~ base_relation.identifier ~ suffix %}, so that a regular table is created instead of a temporary one.

Hi. First, I want to say thanks to all the contributors here for making it possible to use dbt with Azure Synapse. I've been running through a series of tests, and bumped into this error. Here's a bit more detail on the problem for clarity.

Test case: the model's distribution was changed to REPLICATE.

Model

{{ config(
    materialized='incremental',
    index='clustered index(OLDACNUM)',
    dist='REPLICATE',
    on_schema_change='sync_all_columns'
) }}

SELECT TOP (1000) 
      [RecordCreateRunId]
      ,[DEBTORNUM]
      ,[OLDACNUM]
      ,[CREDCOLACTION]
      ,[LASTSTMTDATE]
      ,[CUST_TYPE]
      ,[LETTERDATE]
      ,[CHARGE_CODE]
      ,[NUM_REMINDERS]
      ,[PAYMENT_CODE]
      ,[CYCLE_GROUP]
      ,ODS_START_DATE
FROM [DBO].[AR_DEBTOR]
{% if is_incremental() %}
  -- this filter will only be applied on an incremental run
  WHERE ODS_START_DATE > (select max(ODS_START_DATE) from {{ this }})
{% endif %}

Incremental Run

  • The model was run as incremental
  • This caused the error "'REPLICATE User Temp Table' is not supported in this version of SQL Server"
  • Offending code:
  CREATE TABLE "DBT"."#STG_DEBTORS__dbt_tmp"
    WITH(
      DISTRIBUTION = REPLICATE,
      clustered index(OLDACNUM)
      )
    AS (SELECT * FROM DBT.STG_DEBTORS__dbt_tmp_temp_view)
  • Temporary tables don't support REPLICATE distribution.
  • The fix proposed above means the generated code would never use temporary tables in any circumstance (I think), which may be problematic. In a dedicated SQL pool, temporary tables offer a performance benefit because their results are written to local rather than remote storage. A better approach would be to avoid temporary tables only when DISTRIBUTION = REPLICATE.
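The narrower rule suggested above (drop the '#' temp-table prefix only for REPLICATE, keep it for every other distribution) can be sketched in Python. This is a hypothetical helper that only mirrors the identifier-naming logic. It is not part of dbt or the adapter, and it assumes dist is the raw model config value, which may be None when unset.

```python
from typing import Optional


def make_temp_identifier(base_identifier: str, suffix: str, dist: Optional[str]) -> str:
    """Hypothetical sketch of the temp-relation naming rule discussed above.

    Synapse temporary tables (names starting with '#') cannot use REPLICATE
    distribution, so REPLICATE models get a regular table name instead.
    Every other distribution keeps the '#' prefix and the temp-table benefit.
    """
    # Compare case-insensitively; an unset dist (None) counts as "not REPLICATE".
    is_replicate = (dist or "").upper() == "REPLICATE"
    prefix = "" if is_replicate else "#"
    return f"{prefix}{base_identifier}{suffix}"


if __name__ == "__main__":
    print(make_temp_identifier("STG_DEBTORS", "__dbt_tmp", "REPLICATE"))
    print(make_temp_identifier("STG_DEBTORS", "__dbt_tmp", "ROUND_ROBIN"))
```

For the model above, the first call yields STG_DEBTORS__dbt_tmp (a regular table), and the second yields #STG_DEBTORS__dbt_tmp, the same temp-table name seen in the offending code.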

Versions: dbt 1.1.2, sqlserver 1.1.1, synapse 1.1.0

Hi

I've been able to fix this issue by overriding sqlserver__make_temp_relation with a Synapse-specific version of the macro that does not use a temp table when the distribution type is REPLICATE.

Create a new macro file in your project's macros folder containing the following code:

{# Temporary tables don't support REPLICATE distribution, so don't use a temp table when dist = REPLICATE #}
{% macro synapse__make_temp_relation(base_relation, suffix) %}
    {% if config.get('dist')|upper == 'REPLICATE' -%}
        {% set tmp_identifier = base_relation.identifier ~ suffix %}
    {%- else -%}
        {% set tmp_identifier = '#' ~ base_relation.identifier ~ suffix %}
    {% endif %}
    {% set tmp_relation = base_relation.incorporate(path={"identifier": tmp_identifier}) -%}
    {% do return(tmp_relation) %}
{% endmacro %}

To fix this permanently in the Synapse adapter, this logic should be added to the relation.sql file.

PR #138

You can now create seed tables with different distribution and index strategies by providing the required configuration in the dbt_project.yml file. The default is REPLICATE distribution and HEAP (no index). To override this configuration, use the following sample as a guide.

seeds:
  jaffle_shop:
    index: HEAP
    dist: ROUND_ROBIN
    raw_customers:
      index: HEAP
      dist: REPLICATE
    raw_payments:
      dist: HASH(payment_method)
      index: CLUSTERED INDEX(id,order_id)  

Add a "seeds:" key at the root of dbt_project.yml, nested first by project name and then by seed name. In this example the project name is jaffle_shop and the seeds are raw_customers and raw_payments. Set the distribution and index with the dist and index keys. Valid dist values are replicate, round_robin, and hash({column name}). For example, dist: replicate makes the raw_customers seed a replicated table. For hash distribution, provide the column to hash on, e.g. dist: hash(payment_method).

To specify the index, use index as the key and CLUSTERED INDEX({Column1, Column2}), HEAP, or CLUSTERED COLUMNSTORE INDEX as the value. For example, index: HEAP makes the raw_customers seed table use a heap. For a clustered index, provide one or more columns to build the index on, e.g. index: CLUSTERED INDEX(id,order_id). Default index and distribution values for all seeds can also be set directly under the project name.
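The sketch below shows how these seed settings cascade. A seed-level value overrides the project-level default, and the project-level default overrides the adapter default (REPLICATE / HEAP). This is a simplified model of dbt's "most specific config wins" behaviour, not dbt's actual resolution code. The dictionary mirrors the YAML sample above, resolve_seed_config is a hypothetical name, and raw_orders is a hypothetical seed with no seed-level settings.

```python
from typing import Any, Dict

# Adapter defaults for seeds, as described above.
ADAPTER_DEFAULTS: Dict[str, str] = {"dist": "REPLICATE", "index": "HEAP"}

# Mirrors the dbt_project.yml sample: keys directly under the project apply to
# every seed; nested keys apply only to the named seed.
SEEDS_CONFIG: Dict[str, Dict[str, Any]] = {
    "jaffle_shop": {
        "index": "HEAP",
        "dist": "ROUND_ROBIN",
        "raw_customers": {"index": "HEAP", "dist": "REPLICATE"},
        "raw_payments": {
            "dist": "HASH(payment_method)",
            "index": "CLUSTERED INDEX(id,order_id)",
        },
    }
}

CONFIG_KEYS = ("dist", "index")


def resolve_seed_config(project: str, seed: str) -> Dict[str, str]:
    """Return the effective dist/index for one seed (most specific level wins)."""
    resolved = dict(ADAPTER_DEFAULTS)
    project_cfg = SEEDS_CONFIG.get(project, {})
    # Project-level values override the adapter defaults.
    resolved.update({k: v for k, v in project_cfg.items() if k in CONFIG_KEYS})
    # Seed-level values override the project-level values.
    seed_cfg = project_cfg.get(seed, {})
    resolved.update({k: v for k, v in seed_cfg.items() if k in CONFIG_KEYS})
    return resolved


if __name__ == "__main__":
    for name in ("raw_customers", "raw_payments", "raw_orders"):
        print(name, resolve_seed_config("jaffle_shop", name))
```

Under this model, raw_payments gets HASH(payment_method) with a clustered index on (id, order_id). A seed with no settings of its own, such as the hypothetical raw_orders, falls back to the project-level ROUND_ROBIN / HEAP.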

Is this make_temp_relation fix going to be incorporated into the adapter, and if so, any thoughts on when?