dbt-labs/dbt-core

[CT-895] [Spike] Consolidate current_timestamp & associates

jtcohen6 opened this issue Β· 12 comments

Split out from dbt-labs/dbt-utils#577 + dbt-labs/dbt-utils#597, resolves dbt-labs/dbt-utils#339

Describe the feature

Rewrite current_timestamp + current_timestamp_utc to be built out of core/plugin functionality (e.g. date_function) and a convert_timezone macro.

Currently in dbt-core, there are three methods/macros all purporting to do the same thing. I've linked the examples from Postgres.

Then, there are the two versions currently living in dbt-utils:

Things we need to figure out

  • Are the different implementations here functionally identical? (Doug's initial finding was "yes": dbt-labs/dbt-utils#597 (comment))
  • If the data types are different, why?
  • In what cases would their functional result differ? I'm imagining a case where Snowflake parameters TIMEZONE and TIMESTAMP_TYPE_MAPPING are both set

Ideal outcome

When I close my eyes and imagine how this could be really good:

  1. We have one method/macro, defined in dbt-core and implemented by each adapter plugin, that provides the SQL for getting the current timestamp. Whenever we need the current timestamp in a different data type, we use that current_timestamp macro plus a data type conversion function (see dbt-labs/dbt-utils#586).
  2. We have a convert_timezone macro, defined in dbt-core and implemented by each adapter plugin. @clausherther has already written one for the dbt-date packageβ€”Claus, would you have any interest in kicking it upstream? :)
  3. It's trivial to write a current_timestamp_in_utc macro that builds on top of 1+2. Whether that lives in dbt-core, dbt-utils, or somewhere else doesn't matter so much to me. Wherever it lives, it should look a lot like dbt_date.now(tz=None). We'd leave the dbt_utils one in place for backwards compatibility.

Questions

Describe alternatives you've considered

Leaving all of it where it is :)

Additional context

Timezones are no fun

This stuff is pretty tricky to test. We're not really testing it currently. I had an idea in dbt-labs/dbt-utils#577 (comment), but it's not a great one. We are indirectly testing our use of snapshot_get_time in dbt-core functional tests (specifically test_hard_delete_snapshot.

Who will this benefit?

  • Users who want to call well-named macros, and expect them to do the thing they're meant to
  • Claus, as maintainer of the dbt-date package (I hope!)

I have no idea what this means in terms of actual work, but I'm in. πŸ‘

Couple of unordered thoughts on convert_timezone from dbt_date:

  • Timezone conversion at a minimum needs a target timezone. In dbt_date, we currently require a default project timezone via a global var, e.g. "dbt_date:time_zone": "America/Los_Angeles". That way something like dbt_date.now() by default always returns a "local" time.
  • Some platforms also allow you to specify a source timezone. So, for example, you can convert from something other than UTC to some other non-UTC timezone. So, convert_timezone has both target_tz and source_tz parameters, both optional.
  • In dbt-date, we effectively strip off the tz bundle on Snowflake by casting to a timestamp_ntz. I've found that to be the most sensible, since it IMO it better preserves the converted time.

Converting to UTC and then casting to timestampntz (datetime in BQ) generally seems like the Way To Go. As analytical practice, it makes sense to pull out the timezone and store as a separate column (local_timezone). Relevant: dbt-labs/docs.getdbt.com#1504

I'm going to transfer this to dbt-core. I think the real blocker there is around understanding + consolidating the different options there. I also want to keep the follow-on work from the cross-db migration organized in one place.

Just learned a couple of things on Snowflake:

  • The 3 parameter version of convert_timezone that allows you to specify a source timezone does not accept a timestamp_tz column (any more?). So, one has to convert to timestamp_ntz before passing that in.
  • If your SF instance/account/session/... is to set to timezone X and you're trying to convert what you think is a timestamp in timezone Z to timezone X, nothing will actually get converted, b/c SF assumes source and target are the same (and you just stripped off the tz 😞 )

Added a comparison of dbt-core+adapters implementation vs. dbt-utils in this comment.

TLDR: they don't appear to yield the same data types.

@Fleid ☝️ this is the ticket that we were discussing earlier today about inconsistent timestamp data types across adapters.

@dbeatty10 Amazing work pulling together all the threads from your research!

I think there are two potential scopes for this work:

  1. Do just the work that's needed to remove dbt_utils.current_timestamp ahead of cutting dbt-utils v1.0
  2. Actually reconcile the divergent implementations of getting current timestamp in dbt-core + adapters (also including date_function + snapshot_get_time)

The table you created (and included as a screenshot in your comment) really clinches it for me. We probably don't have the time, capacity, or ability to totally rationalize our approach to current_timestamp without risking major breaking changes β€” we might need to wait for dbt v2 for the really good version of this. I'm going to call option 2 "perfect," and option 1 "better."

GOAL: Find a path that can unlock "better" (unblocking dbt-utils v1.0), even though it will not get us to "perfect" (total consistency within dbt-core + adapters) β€” without breaking people's projects, and without adding a ton of additional tech debt (more inconsistency) to dbt-core + adapters in the process.

Here's what I think we COULD do now, to unblock step 1:

  • Move all the "active" and adapter-specific code from dbt-utils into dbt-core. Just leave dispatching macros from dbt-utils β†’ dbt-core for v0.9, and then remove for v1.0.
  • Find the least-bad and least-breaking way of organizing that code in dbt-core. If it requires a current_timestamp_backcompat macro, just for Snowflake/Postgres (where dbt_utils.current_timestamp != dbt.current_timestamp), so be it β€” I think it's better to have all the inconsistent code in one place, and clearly documented.
  • We have multiple implementations hang out in dbt-core for some time, BUT we develop some crystal-clear guidance on what users should actually do when they want a cross-db compatible current_timestamp. That should be with documentation and guidance (dbt-labs/docs.getdbt.com#1504), and also {# DEPRECATED: DO NOT USE IN NEW PROJECTS #} inline in the code.

I'm poking around with the code a bit to see if that's possible. I'm not sure yet β€” I'll open a branch if I get there.

I haven't had a chance to actually test any of this yet β€” just sharing the links here to share how I'm thinking about the narrower path I outlined above:

Motivation: If there's going to be inconsistency β€” feels unavoidable unless we break things β€” better to have it all in one place.

Awesome to see your thinking on this @jtcohen6 !

Agreed that inconsistency is unavoidable unless we break things. Also agreed that it is better to preserve that inconsistency rather than introducing breaking changes before dbt v2.

What's the advantage of adding current_timestamp_backcompat to dbt-core?

Is it so that users of dbt_utils >= 1.0.0 can update their references from {{ dbt_utils.current_timestamp() }} to {{ current_timestamp_backcompat() }} (assuming the end user has dbt Core >= 1.3.0 installed)?

If that is the case, how is that better for users rather than leaving {{ dbt_utils.current_timestamp() }} as-is?

Is it so that users of dbt_utils >= 1.0.0 can update their references from {{ dbt_utils.current_timestamp() }} to {{ current_timestamp_backcompat() }} (assuming the end user has dbt Core >= 1.3.0 installed)?

Exactly my thinking.

If that is the case, how is that better for users rather than leaving {{ dbt_utils.current_timestamp() }} as-is?

If there's going to be inconsistency, I'd rather have it all in one place!

Right now, it's spread across dbt-core, adapter plugins, and dbt-utils. By moving from dbt-utils into dbt-core / adapters, entropy may not actually be reduced, but it is more concentrated.

And we can make clear that the backcompat is just there for that reason β€” backward compatibility β€” and shouldn't be used by anyone new. That doesn't feel clear if we keep dbt_utils.current_timestamp around. This feels directionally correct toward having a clear + concise recommendation for what to do going forward, even if we need to keep some other crufty stuff around.

Here is where I ended up at for dbt-core changes: https://github.com/dbt-labs/dbt-core/pull/5838/files#diff-4ce397ee2492fb1078425ea420483cf3b55645b5bbcee3139ae696ad07c6c1bc

Still need to polish/test everything but it consolidates the dbt-core logic into a single file. Can reconcile our two versions (I don't have the backwards compat).