dbt-labs/dbt-core

[Feature] Snapshots should respect generate schema/database name macro

graciegoheen opened this issue · 4 comments

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Historically, snapshots have required you to set an explicit target_schema and target_database, meaning that by default, regardless of which environment you're in (dev, ci, prod, etc.), running dbt snapshot builds to the same table.

This can be problematic when trying to test out changes to an existing snapshot or develop a new snapshot. Now that we have deferral and dbt clone, it no longer makes sense that snapshots wouldn't respect generate_schema_name / generate_database_name like other resources (models, seeds, etc.).

We need to consider backwards compatibility for this change.

Acceptance criteria

  • we make these old configs optional, and if you don't set a target_schema or target_database, we default to using the generate_x_name macros
  • add configs schema and database to be consistent with other resource types (inputs to the generate_x_name macros)
  • continue to accept old names as the "hard-coded" / current experience
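
The fallback order in these criteria can be sketched in Python. This is only an illustration of the proposal, not dbt-core's implementation: `resolve_snapshot_schema` and its dict-based `config` argument are hypothetical names, and `generate_schema_name` here is a plain-Python stand-in that mimics the behavior of dbt's default macro (custom macros in a project could behave differently). The database side would presumably follow the same pattern with generate_database_name.

```python
from typing import Optional


def generate_schema_name(custom_schema: Optional[str], target_schema: str) -> str:
    """Stand-in for dbt's default generate_schema_name macro."""
    if custom_schema is None:
        # No custom schema configured: build into the target's schema.
        return target_schema
    # Custom schema configured: the default macro appends it to the target schema.
    return f"{target_schema}_{custom_schema.strip()}"


def resolve_snapshot_schema(config: dict, target_schema: str) -> str:
    """Hypothetical resolution order for a snapshot's schema under the proposal."""
    # Old config still wins and is used verbatim (the "hard-coded" experience).
    legacy = config.get("target_schema")
    if legacy is not None:
        return legacy
    # Otherwise fall back to the macro, fed by the new `schema` config (may be unset).
    return generate_schema_name(config.get("schema"), target_schema)
```

Under these assumptions, a snapshot that still sets target_schema keeps building to that exact schema in every environment, while a snapshot that sets only schema (or nothing) resolves relative to the active target, as models and seeds do.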

Describe alternatives you've considered

Our pal @dbeatty10 often configured his snapshots like so:

{% snapshot my_snapshot %}

{{
    config(
      target_database=target.database,
      target_schema=target.schema,
      …

Our pal @jeremyyeo often does:

{% snapshot snappy %}

{{
    config(
        target_schema=generate_schema_name('snapshots'),
        ...

Relevant docs / discussions

From the snapshots github discussion:
[Three screenshots, taken 2024-06-10 and 2024-06-12; images not preserved]

@jtcohen6 do you happen to know why snapshots were originally built with this explicit required override for target_schema? Was this implemented before deferral existed?

Here's the history I was able to Indiana Jones real quick:

  • dbt 0.5.1 (2016-10-21)
    • Raiders of the Lost Archive -- version your raw data to make historical queries more accurate
    • target_schema: where to archive the data to
  • dbt 0.18.0 (2020-09-03)
    • Added --defer and --state flags to dbt run, to defer to a previously generated manifest for unselected nodes in a run. (#2527, #2656)

@QMalcolm this feels like something we should have some tests to capture. And for future features, should we consider adding some guidelines for adding new features (like creating some kind of tests)?