dbt-labs/dbt-core

Rework adapter.dispatch(), remove packages arg

jtcohen6 opened this issue · 2 comments

We've run into some issues (#3362) with statically analyzing macros, which is required to resolve v0.19.1 bugs, continue our performance work, and enable compelling features in the future.

The big blocker is that adapter.dispatch() is not-quite statically analyzable today, since the packages argument can accept a Jinja macro. We're working on a fix that will be backwards compatible for how we know this is used in the wild today, by packages such as dbt-utils, fivetran-utils, and dbt-expectations.

Namely, so long as the user sets a variable named <package_name>_dispatch_list, dbt will set packages to the equivalent of var('<package_name>_dispatch_list', []) + ['<package_name']. All open source packages I know of that lean on dispatch have followed this convention for the _get_namespaces() macro that they currently pass to the packages argument of dispatched macros.

In the meantime, we also want to change dispatch() to:

  • Safeguard how users are actually getting value from it
  • Ensure that all macro dependencies in a dbt project are statically analyzable

Proposal

The Jinja macro adapter.dispatch() should only take one argument, macro_name (a string literal). Package maintainers shouldn't need to think about what kind of vars/macros to pass into the packages arg at all.

For users, instead of setting a magic variable name (e.g. dbt_utils_dispatch_list), we should support a new config specification in dbt_project.yml:

dispatch:
  - macro_namespace: dbt_utils
    search_order: ['spark_utils', 'dbt_utils']
  - macro_namespace: fivetran_utils
    search_order: ['my_project', 'dbt_utils', 'fivetran_utils']

When dbt goes to dispatch a macro, it uses the search_order corresponding to the macro_namespace (package) of the dispatching macro. This accomplishes the same functionality as the current packages + vars setup, but with much less confusion, and it centralizes all macro "routing" logic in the project file. (If a dispatching macro does not have a "route" set in the project dispatch config, it just searches within its own project—the same as current behavior when packages is None.)

This should also make the syntax for dispatch a little less onerous:

{% macro my_macro(field) %}
  {{ return(adapter.dispatch('my_macro')(field)) }}
{% endmacro %}

Whether the python dispatch argument still takes two arguments, with search_order passed in as packages, is an implementation detail. But in v0.20.0, we should start raising a deprecation warning if the user passes a value to the packages argument of the adapter.dispatch() Jinja macro.

And in a future version of dbt, once everyone has had a chance to upgrade to this new way of dispatching, we should remove the Jinja AST-parsing logic that #3363 is adding for backwards compatibility.

I think I 90% understand where this is coming from, and I'm sure I'll get the remaining 10% as I brood on it. Thanks @jtcohen6 your strong written communication skills!

I'm also pretty certain I've seen the missing shim macro described in #3362 while debugging synapse__ macros that themselves dispatch to the corresponding sqlserver__ macro. Does this mean a missing shim macro won't throw this error anymore? 'dict object' has no attribute 'current_timestamp'

I'm also onboard with the new dispatch config. You're right that there may be dragons in assuming that the dispatch var name aligns with the project name. Def a better to lock-Side note: the disconnect b/w repo and package names for dbt-audit-helper (dbt-audit-helper vs. audit-helper) threw me for a small loop last week.

default search space

Perhaps explicit search path would be a happy path for users new to shimming to develop a set of shims for a package.

e.g. a user who wants to make a proposed change to tsql-utils could put this in their current project then put whatever sqlserver__ macro into their current macros/ dir?

config:
- macro_namespace: dbt-utils
  search_order: ['dbt_utils', 'my_project', 'tsql-utils']

before it seems that macros in the user's current project were available to the package being shimmed, right? Maybe this was never the case and I would just override the main macro rather than trying to put a shim into the macros/ dir...

implication for dispatched adapters

this question that might be more about v0.20.0 than this particular proposal. I tried to see how dbt-redshift's new inheritance behavior and couldn't find it...

with this proposal, would all projects, using the dbt-synapse adapter need to add a dispatch config like this?

config:
- macro_namespace: dbt-sqlserver
  search_order: ['spark_utils', 'dbt_utils']

Thanks for the quick read + feedback @swanderz!

Does this mean a missing shim macro won't throw this error anymore? 'dict object' has no attribute 'current_timestamp'

I sure hope not! That's an ugly error. I'm not positive what's causing it, but I very much hope that this change will simplify the experience of building and testing shim (and shim-able) packages.

e.g. a user who wants to make a proposed change to tsql-utils could put this in their current project then put whatever sqlserver__ macro into their current macros/ dir?

That's right. You can just as easily shim dbt_utils with macros in your own root project macros/ folder, as you can with macros coming from an installed compatibility package. This is a great way of achieving the desired functionality, before potentially upstreaming it to an open source shim package.

As far as adapter inheritance in v0.20.0: This is built-in, and not something a user needs to specify in their dbt_project.yml at all. Already, dbt searches for macros based on two criteria: adapter and package. Simply by using dbt-redshift, dbt will now know to look for macros prefixed redshift__, postgres__, and default__ (in that order).

So in general, I think a config like this makes a lot of sense:

dispatch:
  - macro_namespace: dbt_utils
    search_order: ['my_project', 'tsql_utils', 'dbt_utils']

That is, when dispatching dbt_utils.some_macro on dbt-synapse (which inherits from dbt-sqlserver), dbt would look for macros in the following order:

  1. my_project.synapse__some_macro
  2. my_project.sqlserver__some_macro
  3. my_project.default__some_macro
  4. tsql_utils.synapse__some_macro
  5. tsql_utils.sqlserver__some_macro
  6. tsql_utils.default__some_macro
  7. dbt_utils.synapse__some_macro
  8. dbt_utils.sqlserver__some_macro
  9. dbt_utils.default__some_macro

The first one it finds is the one it will use. To put the same logic in English:

  • Look in my own project first, to see if I've overridden any macros
  • Then look in tsql_utils, preferring the synapse__ version if available, sqlserver__ otherwise
  • Finally, if none of the above is found, use the default__ version in dbt_utils

Question: Should we name search_order something different in this proposed config, given that it's only search order of packages (user-controllable), and doesn't account for search order of adapters (built in)?