dbt-labs/dbt-core

[Bug] Fail fast behavior not correct when using multiple threads

gaoshihang opened this issue · 8 comments

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I use --fail-fast with "dbt run" and 20 threads to submit models to Databricks, but when one model fails, the others keep running.

Expected Behavior

When one model fails, the other in-flight models should stop too.

Steps To Reproduce

dbt run --select staging --fail-fast

Relevant log output

No response

Environment

- OS: macos
- Python: 3.9.6
- dbt: 1.7.11

Which database adapter are you using with dbt?

other (mention it in "Additional Context")

Additional Context

dbt-databricks adapter


Thanks for reaching out @gaoshihang !

I'm guessing that dbt-databricks doesn't support query cancellation. If you need/want this behavior, could you open an issue in the dbt-databricks repo instead?

In the meantime, I'm going to close this issue in favor of an update to our documentation: dbt-labs/docs.getdbt.com#5411

Reprex

models/slow_model.sql

{{ config(materialized="table") }}

with recursive t (i) as (

    select 1
    union all
    select i + 1 from t where i < 100000

)

select sum(i) from t

models/bad_model.sql

{{ config(materialized="table") }}

selec -1 as typo

Using dbt-postgres, a failure cancels any concurrent queries:

dbt run --fail-fast --profile postgres

But dbt-duckdb doesn't cancel any concurrent queries, because query cancellation is not supported:

dbt run --fail-fast --profile duckdb

The latter will raise a warning like this so the user is not surprised by the behavior:

The duckdb adapter does not support query cancellation. Some queries may still be running!
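
Under the hood, dbt-core asks the adapter whether it can cancel queries at all. Here is a simplified sketch of how an adapter opts out; the module path and base class are assumptions on my part and vary across dbt versions:

# Simplified sketch: how an adapter opts out of query cancellation.
# The import path is an assumption; it differs between dbt versions.
from dbt.adapters.sql import SQLAdapter

class MyAdapter(SQLAdapter):
    @classmethod
    def is_cancelable(cls) -> bool:
        # When this returns False, dbt-core logs the warning above on
        # --fail-fast instead of trying to cancel in-flight queries.
        return False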

Yes @dbeatty10, thanks! I'll submit an issue on the dbt-databricks adapter side.

Hi @dbeatty10 May I ask another question in this issue?

Can we run multiple "dbt build" commands at the same time, with each one handling a different part of the source data?
For example:
Source data is partitioned by batch_id.
dbt run 1: dbt build --vars '{"batch_id": "1"}'
dbt run 2: dbt build --vars '{"batch_id": "2"}'
dbt run 3: dbt build --vars '{"batch_id": "3"}'

If we can, do we need to execute each command in a different dbt project directory?

We are using dbt-core, not dbt Cloud.

Can we run multiple "dbt build" commands at the same time, with each one handling a different part of the source data?

dbt-core is designed for a single invocation at any given time. If you construct things such that each dbt build handles independent portions of data, it may work. But if you run into any issues, we would not consider them a bug since multiple concurrent runs are out of scope for dbt-core.
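
For example, each model could filter on the var so that every invocation touches only its own partition. A hypothetical sketch (the source and column names are made up):

models/stg_events.sql

{{ config(materialized="table") }}

select *
from {{ source("raw", "events") }}
where batch_id = '{{ var("batch_id") }}'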

Thanks @dbeatty10 for your reply. Can you give me some guidance on this?
If I run these "dbt build" commands in one dbt directory, will the manifest JSON file be affected?

I'm thinking that we use a different dbt directory for each "dbt build", handle independent portions of data, and then output to different target tables, e.g. with a suffix like table_{batch_id}.
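
Something like this hypothetical config could produce the suffixed tables (the alias expression is my own sketch, not tested):

{{ config(materialized="table", alias="table_" ~ var("batch_id")) }}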

It might work for you just fine as-is as long as your workloads are completely independent.

If there are any issues with the manifest (or other dbt artifacts), you might be able to work around them with the --target-path and/or --no-write-json flags.
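
For example, a sketch assuming fully independent workloads (the target paths are illustrative):

dbt build --vars '{"batch_id": "1"}' --target-path target/batch_1 &
dbt build --vars '{"batch_id": "2"}' --target-path target/batch_2 &
dbt build --vars '{"batch_id": "3"}' --target-path target/batch_3 &
wait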

Got it!

Thanks! @dbeatty10