microsoft/dbt-synapse

Installation of dbt-synapse in same env as dbt-sqlserver conflicts namespace and starts running SQL server models using synapse Adapter

alittlesliceoftom opened this issue · 11 comments

Due to the presence of two identical 'types' in the adapters.

On my side dbt defaults to using the synapse connection, I'm not sure why.

I suspect that this would be fixed with an updated type: SqlServerSynapse but I'd have to read and understand the code better to be sure.

Actual error:
('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near 'DISTRIBUTION'. (102) (SQLExecDirectW)")

This could be fixed with isolated envs, but I think that you should be able to use different adapaters separated using the project.yml

great point. It's definitely a use case we'd like to support wherein we could have a dev target be Azure SQL and a production target be Synapse. I've been meaning to rename the target officially to synapse in our code with #6, but have been uncertain of what that means when we need to pull an update from the upstream dbt-sqlserver will it:

  1. respect our rename, or
  2. will we have to do a merge commit where we rename the macro names every time back to synapse?

I guess we won't know until we try!

Yeah that's an option I think. Check out this dbt slack thread from a month ago where some options are discussed.

I guess it's fairly edge case for many users and easily solvable using venvs, so a nice to have feature change really.

Going to do some experiments this week on this.

One option: you can have dbt-synapse pull in dbt-sqlserver as a python dependency. That's the relationship dbt-redshift has with dbt-postgres: it installs it as a dependency, and builds its own classes on top of the posgres ones (e.g.).

That would enable you to avoid repeating boilerplate code around connections, for instance, between this repo and dbt-sqlserver. You'd only need to reimplement (= write code for) the things that are genuinely different on synapse.

macros

Right now, both dbt-sqlserver and dbt-synapse can use sqlserver__ implementations of common ("dispatched") macros because they both use an adapter type named sqlserver.

Currently, adapter.dispatch doesn't include a notion of adapter dependencies, but it could: In that world, dbt-redshift would first look for a redshift__ implementation, then postgres__, then default__. So, too, would dbt-synapse look first for synapse__, then sqlserver__, then default__.

@jtcohen6 this is definitely something I've been wanting to discuss. Take a look at dbt-msft/dbt-sqlserver#32. Looking at just the the .sql macro files you can see there's very little difference between the packages. We'd love the help making streamlining them.

Another possible alternative is to have a single adapter, dbt-tsql that supports SQL Server, Azure SQL, and Azure Synapse Analytics. My thinking here would be to use the EngineEdition property via T-SQL's SERVERPROPERTY() where:

Id Product
1-4,8 SQL Server
5,8,9 Azure SQL
6 Azure Synapse

The high level differences are:

  • SQL Server
    • has a 20 years of legacy, but
    • some functions like DROP ... IF EXISTS weren't introduced until 2016
  • Azure SQL
    • has the vast majority of still-relevant SQL Server functionality
    • (to the extent that dbt-sqlserver works perfectly with Azure SQL especially with recent PRs that add cloud features like Azure Active Directory and External Tables
  • Azure Synapse:
    • promotes a new pattern of creating tables, CTAS that allows specification of indices and distribution
    • is still re-writing T-SQL to be parallel by default so doesn't have all the T-SQL functions of Azure SQL (e.g. MERGE and Table-Valued Functions are now in preview as of October 2020)

from the SERVERPROPERTY() doc

Id EngineEdition
1 Personal or Desktop Engine (Not available in SQL Server 2005 (9.x) and later versions.)
2 Standard (This is returned for Standard, Web, and Business Intelligence.)
3 Enterprise (This is returned for Evaluation, Developer, and Enterprise editions.)
4 Express (This is returned for Express, Express with Tools, and Express with Advanced Services)
5 SQL Database
6 Microsoft Azure Synapse Analytics (formerly SQL Data Warehouse)
8 Azure SQL Managed Instance
9 Azure SQL Edge (This is returned for all editions of Azure SQL Edge)

cc: @NandanHegde15 @alieus @mikaelene

One option: you can have dbt-synapse pull in dbt-sqlserver as a python dependency. That's the relationship dbt-redshift has with dbt-postgres: it installs it as a dependency, and builds its own classes on top of the posgres ones (e.g.).

That would enable you to avoid repeating boilerplate code around connections, for instance, between this repo and dbt-sqlserver. You'd only need to reimplement (= write code for) the things that are genuinely different on synapse.

macros

Right now, both dbt-sqlserver and dbt-synapse can use sqlserver__ implementations of common ("dispatched") macros because they both use an adapter type named sqlserver.

Currently, adapter.dispatch doesn't include a notion of adapter dependencies, but it could: In that world, dbt-redshift would first look for a redshift__ implementation, then postgres__, then default__. So, too, would dbt-synapse look first for synapse__, then sqlserver__, then default__.

This sounds like a great way of solving it. It would make it more transparent what the differences are as well.

getting very close to this with #32!

resolved!