[CT-895] [Spike] Consolidate current_timestamp & associates
jtcohen6 opened this issue Β· 12 comments
Split out from dbt-labs/dbt-utils#577 + dbt-labs/dbt-utils#597, resolves dbt-labs/dbt-utils#339
Describe the feature
Rewrite current_timestamp
+ current_timestamp_utc
to be built out of core/plugin functionality (e.g. date_function
) and a convert_timezone
macro.
Currently in dbt-core
, there are three methods/macros all purporting to do the same thing. I've linked the examples from Postgres.
date_function
(Python classmethod)snapshot_get_time
(macro used for snapshots)current_timestamp
(macro used nowhere, except as the default implementation ofsnapshot_get_time
β but often the two are different, mostly for data type reasons)
Then, there are the two versions currently living in dbt-utils
:
Things we need to figure out
- Are the different implementations here functionally identical? (Doug's initial finding was "yes": dbt-labs/dbt-utils#597 (comment))
- If the data types are different, why?
- In what cases would their functional result differ? I'm imagining a case where Snowflake parameters
TIMEZONE
andTIMESTAMP_TYPE_MAPPING
are both set
Ideal outcome
When I close my eyes and imagine how this could be really good:
- We have one method/macro, defined in
dbt-core
and implemented by each adapter plugin, that provides the SQL for getting the current timestamp. Whenever we need the current timestamp in a different data type, we use thatcurrent_timestamp
macro plus a data type conversion function (see dbt-labs/dbt-utils#586). - We have a
convert_timezone
macro, defined indbt-core
and implemented by each adapter plugin. @clausherther has already written one for thedbt-date
packageβClaus, would you have any interest in kicking it upstream? :) - It's trivial to write a
current_timestamp_in_utc
macro that builds on top of 1+2. Whether that lives indbt-core
,dbt-utils
, or somewhere else doesn't matter so much to me. Wherever it lives, it should look a lot likedbt_date.now(tz=None)
. We'd leave thedbt_utils
one in place for backwards compatibility.
Questions
Describe alternatives you've considered
Leaving all of it where it is :)
Additional context
Timezones are no fun
This stuff is pretty tricky to test. We're not really testing it currently. I had an idea in dbt-labs/dbt-utils#577 (comment), but it's not a great one. We are indirectly testing our use of snapshot_get_time
in dbt-core
functional tests (specifically test_hard_delete_snapshot
.
Who will this benefit?
- Users who want to call well-named macros, and expect them to do the thing they're meant to
- Claus, as maintainer of the
dbt-date
package (I hope!)
I have no idea what this means in terms of actual work, but I'm in. π
Couple of unordered thoughts on convert_timezone
from dbt_date
:
- Timezone conversion at a minimum needs a target timezone. In
dbt_date
, we currently require a default project timezone via a globalvar
, e.g."dbt_date:time_zone": "America/Los_Angeles"
. That way something likedbt_date.now()
by default always returns a "local" time. - Some platforms also allow you to specify a source timezone. So, for example, you can convert from something other than UTC to some other non-UTC timezone. So,
convert_timezone
has bothtarget_tz
andsource_tz
parameters, both optional. - In
dbt-date
, we effectively strip off thetz
bundle on Snowflake by casting to atimestamp_ntz
. I've found that to be the most sensible, since it IMO it better preserves the converted time.
Converting to UTC and then casting to timestampntz
(datetime
in BQ) generally seems like the Way To Go. As analytical practice, it makes sense to pull out the timezone and store as a separate column (local_timezone
). Relevant: dbt-labs/docs.getdbt.com#1504
I'm going to transfer this to dbt-core
. I think the real blocker there is around understanding + consolidating the different options there. I also want to keep the follow-on work from the cross-db migration organized in one place.
Just learned a couple of things on Snowflake:
- The 3 parameter version of convert_timezone that allows you to specify a source timezone does not accept a
timestamp_tz
column (any more?). So, one has to convert totimestamp_ntz
before passing that in. - If your SF instance/account/session/... is to set to timezone X and you're trying to convert what you think is a timestamp in timezone Z to timezone X, nothing will actually get converted, b/c SF assumes source and target are the same (and you just stripped off the tz π )
Added a comparison of dbt-core+adapters implementation vs. dbt-utils in this comment.
TLDR: they don't appear to yield the same data types.
@Fleid βοΈ this is the ticket that we were discussing earlier today about inconsistent timestamp data types across adapters.
@dbeatty10 Amazing work pulling together all the threads from your research!
I think there are two potential scopes for this work:
- Do just the work that's needed to remove
dbt_utils.current_timestamp
ahead of cutting dbt-utils v1.0 - Actually reconcile the divergent implementations of getting current timestamp in
dbt-core
+ adapters (also includingdate_function
+snapshot_get_time
)
The table you created (and included as a screenshot in your comment) really clinches it for me. We probably don't have the time, capacity, or ability to totally rationalize our approach to current_timestamp
without risking major breaking changes β we might need to wait for dbt v2 for the really good version of this. I'm going to call option 2 "perfect," and option 1 "better."
GOAL: Find a path that can unlock "better" (unblocking dbt-utils v1.0), even though it will not get us to "perfect" (total consistency within dbt-core
+ adapters) β without breaking people's projects, and without adding a ton of additional tech debt (more inconsistency) to dbt-core
+ adapters in the process.
Here's what I think we COULD do now, to unblock step 1:
- Move all the "active" and adapter-specific code from
dbt-utils
intodbt-core
. Just leave dispatching macros fromdbt-utils
βdbt-core
for v0.9, and then remove for v1.0. - Find the least-bad and least-breaking way of organizing that code in
dbt-core
. If it requires acurrent_timestamp_backcompat
macro, just for Snowflake/Postgres (wheredbt_utils.current_timestamp != dbt.current_timestamp
), so be it β I think it's better to have all the inconsistent code in one place, and clearly documented. - We have multiple implementations hang out in
dbt-core
for some time, BUT we develop some crystal-clear guidance on what users should actually do when they want a cross-db compatiblecurrent_timestamp
. That should be with documentation and guidance (dbt-labs/docs.getdbt.com#1504), and also{# DEPRECATED: DO NOT USE IN NEW PROJECTS #}
inline in the code.
I'm poking around with the code a bit to see if that's possible. I'm not sure yet β I'll open a branch if I get there.
I haven't had a chance to actually test any of this yet β just sharing the links here to share how I'm thinking about the narrower path I outlined above:
Motivation: If there's going to be inconsistency β feels unavoidable unless we break things β better to have it all in one place.
Awesome to see your thinking on this @jtcohen6 !
Agreed that inconsistency is unavoidable unless we break things. Also agreed that it is better to preserve that inconsistency rather than introducing breaking changes before dbt v2.
What's the advantage of adding current_timestamp_backcompat
to dbt-core?
Is it so that users of dbt_utils >= 1.0.0 can update their references from {{ dbt_utils.current_timestamp() }}
to {{ current_timestamp_backcompat() }}
(assuming the end user has dbt Core >= 1.3.0 installed)?
If that is the case, how is that better for users rather than leaving {{ dbt_utils.current_timestamp() }}
as-is?
Is it so that users of dbt_utils >= 1.0.0 can update their references from
{{ dbt_utils.current_timestamp() }}
to{{ current_timestamp_backcompat() }}
(assuming the end user has dbt Core >= 1.3.0 installed)?
Exactly my thinking.
If that is the case, how is that better for users rather than leaving
{{ dbt_utils.current_timestamp() }}
as-is?
If there's going to be inconsistency, I'd rather have it all in one place!
Right now, it's spread across dbt-core
, adapter plugins, and dbt-utils
. By moving from dbt-utils
into dbt-core
/ adapters, entropy may not actually be reduced, but it is more concentrated.
And we can make clear that the backcompat is just there for that reason β backward compatibility β and shouldn't be used by anyone new. That doesn't feel clear if we keep dbt_utils.current_timestamp
around. This feels directionally correct toward having a clear + concise recommendation for what to do going forward, even if we need to keep some other crufty stuff around.
Here is where I ended up at for dbt-core changes: https://github.com/dbt-labs/dbt-core/pull/5838/files#diff-4ce397ee2492fb1078425ea420483cf3b55645b5bbcee3139ae696ad07c6c1bc
Still need to polish/test everything but it consolidates the dbt-core logic into a single file. Can reconcile our two versions (I don't have the backwards compat).