Installation of dbt-synapse in same env as dbt-sqlserver conflicts namespace and starts running SQL server models using synapse Adapter
alittlesliceoftom opened this issue · 11 comments
Due to the presence of two identical 'types' in the adapters.
On my side dbt defaults to using the synapse connection, I'm not sure why.
I suspect that this would be fixed with an updated type: SqlServerSynapse but I'd have to read and understand the code better to be sure.
Actual error:
('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near 'DISTRIBUTION'. (102) (SQLExecDirectW)")
This could be fixed with isolated envs, but I think that you should be able to use different adapaters separated using the project.yml
great point. It's definitely a use case we'd like to support wherein we could have a dev target be Azure SQL and a production target be Synapse. I've been meaning to rename the target officially to synapse
in our code with #6, but have been uncertain of what that means when we need to pull an update from the upstream dbt-sqlserver
will it:
- respect our rename, or
- will we have to do a merge commit where we rename the macro names every time back to
synapse
?
I guess we won't know until we try!
Yeah that's an option I think. Check out this dbt slack thread from a month ago where some options are discussed.
I guess it's fairly edge case for many users and easily solvable using venvs, so a nice to have feature change really.
Going to do some experiments this week on this.
One option: you can have dbt-synapse
pull in dbt-sqlserver
as a python dependency. That's the relationship dbt-redshift
has with dbt-postgres
: it installs it as a dependency, and builds its own classes on top of the posgres ones (e.g.).
That would enable you to avoid repeating boilerplate code around connections, for instance, between this repo and dbt-sqlserver
. You'd only need to reimplement (= write code for) the things that are genuinely different on synapse.
macros
Right now, both dbt-sqlserver
and dbt-synapse
can use sqlserver__
implementations of common ("dispatched") macros because they both use an adapter type named sqlserver
.
Currently, adapter.dispatch
doesn't include a notion of adapter dependencies, but it could: In that world, dbt-redshift
would first look for a redshift__
implementation, then postgres__
, then default__
. So, too, would dbt-synapse
look first for synapse__
, then sqlserver__
, then default__
.
@jtcohen6 this is definitely something I've been wanting to discuss. Take a look at dbt-msft/dbt-sqlserver#32. Looking at just the the .sql
macro files you can see there's very little difference between the packages. We'd love the help making streamlining them.
Another possible alternative is to have a single adapter, dbt-tsql
that supports SQL Server, Azure SQL, and Azure Synapse Analytics. My thinking here would be to use the EngineEdition
property via T-SQL's SERVERPROPERTY()
where:
Id | Product |
---|---|
1-4,8 |
SQL Server |
5,8,9 |
Azure SQL |
6 |
Azure Synapse |
The high level differences are:
- SQL Server
- has a 20 years of legacy, but
- some functions like
DROP ... IF EXISTS
weren't introduced until 2016
- Azure SQL
- has the vast majority of still-relevant SQL Server functionality
- (to the extent that
dbt-sqlserver
works perfectly with Azure SQL especially with recent PRs that add cloud features like Azure Active Directory and External Tables
- Azure Synapse:
- promotes a new pattern of creating tables,
CTAS
that allows specification of indices and distribution - is still re-writing T-SQL to be parallel by default so doesn't have all the T-SQL functions of Azure SQL (e.g.
MERGE
and Table-Valued Functions are now in preview as of October 2020)
- promotes a new pattern of creating tables,
from the SERVERPROPERTY()
doc
Id | EngineEdition |
---|---|
1 | Personal or Desktop Engine (Not available in SQL Server 2005 (9.x) and later versions.) |
2 | Standard (This is returned for Standard, Web, and Business Intelligence.) |
3 | Enterprise (This is returned for Evaluation, Developer, and Enterprise editions.) |
4 | Express (This is returned for Express, Express with Tools, and Express with Advanced Services) |
5 | SQL Database |
6 | Microsoft Azure Synapse Analytics (formerly SQL Data Warehouse) |
8 | Azure SQL Managed Instance |
9 | Azure SQL Edge (This is returned for all editions of Azure SQL Edge) |
One option: you can have
dbt-synapse
pull indbt-sqlserver
as a python dependency. That's the relationshipdbt-redshift
has withdbt-postgres
: it installs it as a dependency, and builds its own classes on top of the posgres ones (e.g.).That would enable you to avoid repeating boilerplate code around connections, for instance, between this repo and
dbt-sqlserver
. You'd only need to reimplement (= write code for) the things that are genuinely different on synapse.macros
Right now, both
dbt-sqlserver
anddbt-synapse
can usesqlserver__
implementations of common ("dispatched") macros because they both use an adapter type namedsqlserver
.Currently,
adapter.dispatch
doesn't include a notion of adapter dependencies, but it could: In that world,dbt-redshift
would first look for aredshift__
implementation, thenpostgres__
, thendefault__
. So, too, woulddbt-synapse
look first forsynapse__
, thensqlserver__
, thendefault__
.
This sounds like a great way of solving it. It would make it more transparent what the differences are as well.
resolved!