[Bug] Fail fast behavior not correct when using multiple threads
gaoshihang opened this issue · 8 comments
Is this a new bug in dbt-core?
- I believe this is a new bug in dbt-core
- I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
I use --fail-fast with "dbt run" and 20 threads to submit some models to Databricks, but when one model fails, the others keep running.
Expected Behavior
When a model fails, the other in-flight models should be cancelled as well.
Steps To Reproduce
dbt run --select staging --fail-fast
Relevant log output
No response
Environment
- OS: macos
- Python: 3.9.6
- dbt: 1.7.11
Which database adapter are you using with dbt?
other (mention it in "Additional Context")
Additional Context
dbt-databricks adapter
Thanks for reaching out @gaoshihang !
I'm guessing that dbt-databricks doesn't support query cancellation. If you need/want this behavior, could you open an issue in the dbt-databricks repo instead?
In the meantime, I'm going to close this issue in favor of an update to our documentation: dbt-labs/docs.getdbt.com#5411
Reprex
models/slow_model.sql
{{ config(materialized="table") }}
with recursive t (i) as (
select 1
union all
select i + 1 from t where i < 100000
)
select sum(i) from t
models/bad_model.sql
{{ config(materialized="table") }}
selec -1 as typo
Using dbt-postgres cancels any concurrent queries:
dbt run --fail-fast --profile postgres
But dbt-duckdb doesn't cancel any concurrent queries, because query cancellation is not supported there:
dbt run --fail-fast --profile duckdb
The latter will raise a warning like this so the user is not surprised by the behavior:
The duckdb adapter does not support query cancellation. Some queries may still be running!
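For reference, the two profiles used in the reprex commands above might be defined like this in profiles.yml. The profile names match the commands, but every credential, path, and thread count below is a placeholder I'm assuming for the sketch, not something taken from this thread:

```yaml
# Hypothetical profiles.yml for the reprex; all values are placeholders.
postgres:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: dbt_user
      password: dbt_password
      dbname: dbt_db
      schema: dev
      threads: 4

duckdb:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: dev.duckdb
      threads: 4
```

With both profiles in place, the same `dbt run --fail-fast` invocation can be pointed at either warehouse via `--profile` to compare the cancellation behavior.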
Yes @dbeatty10 thanks! I'll submit an issue on the dbt-databricks adapter side.
Hi @dbeatty10, may I ask another question in this issue?
Can we run multiple "dbt build" commands at the same time? Each "dbt build" would handle a different part of the source data.
For example:
Source data is partitioned by batch_id.
DBT run-1: dbt build --vars '{"batch_id": "1"}'
DBT run-2: dbt build --vars '{"batch_id": "2"}'
DBT run-3: dbt build --vars '{"batch_id": "3"}'
If we can, do we need to execute each command in a different dbt project directory?
We are using dbt-core, not dbt-cloud
Can we run multiple "dbt build" commands at the same time? Each "dbt build" would handle a different part of the source data.
dbt-core is designed for a single invocation at any given time. If you construct things such that each dbt build
handles independent portions of data, it may work. But if you run into any issues, we would not consider them a bug since multiple concurrent runs are out of scope for dbt-core.
Thanks @dbeatty10 for your reply. Can you give me some guide on this?
If I run these dbt build commands in one dbt directory, will the manifest.json file be affected?
I'm thinking we could use a different dbt directory for each "dbt build", have each handle an independent portion of the data, and write out to a different target table, e.g. by adding a suffix like table_{batch_id}
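As a sketch of that suffix idea: dbt's built-in var() function and the alias config can derive the target table name from the batch_id variable, so each invocation writes to its own table. The model and source names below are hypothetical:

```sql
-- models/batched_table.sql (hypothetical model name)
-- Writes to a table named table_<batch_id>, e.g. table_1
-- when invoked with: dbt build --vars '{"batch_id": "1"}'
{{ config(materialized="table", alias="table_" ~ var("batch_id")) }}

select *
from {{ source("raw", "events") }}  -- hypothetical source
where batch_id = '{{ var("batch_id") }}'
```

Since the alias differs per invocation, the concurrent builds never write to the same relation, which is the independence the advice above depends on.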
It might work for you just fine as-is as long as your workloads are completely independent.
If there are any issues with the manifest (or other dbt artifacts), you might be able to work around them with the --target-path and/or --no-write-json flags.
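A minimal sketch of isolating concurrent builds with --target-path. The run_batches wrapper and the DBT override are my additions for illustration; the dbt flags are the ones discussed above:

```shell
# Hypothetical helper: launch three independent `dbt build` invocations in
# parallel, each writing its artifacts (manifest.json, run_results.json) to
# its own target path so they don't clobber each other.
# Override DBT (e.g. DBT="echo dbt") to dry-run and just print the commands.
DBT="${DBT:-dbt}"

run_batches() {
  for batch_id in 1 2 3; do
    $DBT build \
      --vars "{\"batch_id\": \"$batch_id\"}" \
      --target-path "target/batch_$batch_id" &
  done
  wait  # block until all three background builds finish
}
```

Running it with DBT="echo dbt" prints the three commands instead of executing them, which is a cheap way to check the quoting before pointing it at a real project.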
Got it!
Thanks! @dbeatty10