dbt-labs/dbt-core

[Feature] Faster, if unsafe, dbt-compile please! (perhaps without connectors)

guyr-ziprecruiter opened this issue · 6 comments

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

A simple dbt compile run may take several minutes. I gather this is because the connectors are applied and things are verified for correctness (e.g. whether the tables and columns referenced are really there). That's wonderful! But if I'm running the same dbt compile multiple times, I'm willing to take the risk and assume nothing has changed. Hence I suggest a run mode in which whatever is collected by the connectors gets cached, so I can reuse it instead of waiting 3 minutes every time I try to compile my model for Athena. Thanks!

Describe alternatives you've considered

Writing a jinja2 template renderer myself, and using jinja2 without dbt. Neither worked.

Who will this benefit?

dbt users who compile their queries to test them (e.g. in the Athena console) and would rather not spend hours waiting (it adds up!).

Are you interested in contributing this feature?

Not impossible if no one better is willing to!

Anything else?

No response

Hey @guyr-ziprecruiter!

First: Are you regularly running dbt compile for all the models in your project? If so, what's the reason why you're doing this? I wonder if you're running compile when a simple parse might do, or if you instead want to just do something like dbt compile --select specific_model.

As far as your proposal:

  • We already support an "unsafe" mode that disallows introspective queries against the data warehouse during compilation. It never makes a connection, but it will also fail gracelessly if you have code attempting to do this: dbt compile --no-introspect
  • The idea of caching DWH responses to use for subsequent queries is one that we've been experimenting with recently, for a different use case: our own testing/validation of dbt-core/adapters. We haven't documented this yet, because it's still in early development and far from stable — but it could offer some components that are extensible to what you're describing, or at least entry-points that might inspire your own implementation. We'll write more about this in the coming months.
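To make the caching idea a bit more concrete, here is a minimal sketch of what "cache the warehouse responses and reuse them" could look like. It is not dbt's implementation and does not use any dbt API: `IntrospectionCache`, `fetch`, and the `run_query` callable are hypothetical names invented for illustration, and the sketch assumes query results are JSON-serializable.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


class IntrospectionCache:
    """Hypothetical on-disk cache for warehouse introspection results."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Load results from an earlier run if a cache file already exists.
        self._data = json.loads(path.read_text()) if path.exists() else {}

    @staticmethod
    def _key(sql: str) -> str:
        # Collapse whitespace so trivially reformatted queries share an entry.
        # Simplification: this also collapses whitespace inside string literals.
        normalized = " ".join(sql.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def fetch(self, sql: str, run_query: Callable[[str], list]) -> list:
        """Return the cached result for `sql`, calling `run_query` only on a miss."""
        key = self._key(sql)
        if key not in self._data:
            # Cache miss: query the warehouse once and persist the result.
            self._data[key] = run_query(sql)
            self.path.write_text(json.dumps(self._data))
        return self._data[key]
```

The "unsafe" part is that nothing here invalidates stale entries: if the warehouse schema changes, the cached answer is still returned until the cache file is deleted.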

Hello again!
I need the SQL query created by the model. So dbt parse is not applicable in my use case if I'm not mistaken.
I switched to --no-introspect:

$ dbt compile --no-introspect --profiles-dir=profiles/ --profile=athena --models foo

But it still took a couple of minutes, so it seems that's not what I was looking for.

Ideally I would like to work with no connector, just get the jinja templates to render and get my query back. Is that possible?

Thanks!