dbt-labs/dbt_metrics

Handling Non-Time Metric Aggregations

Closed this issue · 2 comments

Problem Statement

We've now heard from multiple partners that sometimes they want to return a single number representing a metric. They don't need it to be listed by day, week, month, etc - they want the single numerical representation for that business concept.

And we can't do that right now. With time_grain and timestamp being required fields for a metric definition, we currently only allow for returned datasets to be broken down by some sort of time aggregation. The partners could hypothetically pull the most recent value for the metric but that only works in some cases. Cases where a multi-month aggregation, or even multi-year aggregation, of a metric cannot be handled with current functionality.

Feature Description

As I see it, there are two options as we move forward:

  1. Re-architect the metric node to no longer require a timestamp and time-grain field. We would then change the logic in the metrics package to be able to handle when these are missing and still return the correct logic.
    • There still exists the possibility that sometimes a metric with a timestamp would be needed to aggregated over all time.
  2. Add support for the all type in time_grains. This would aggregate over all time and return the metric value. However, this still has implementation questions
    • Do we return a date field with the metric? My gut instinct is no but there could be some shenanigans around what time range really constitutes all

In my opinion, the perfect solutions is a combination of the two of them. We should support all, or some alternative name, to allow for use cases that aren't focused around time-series data. Additionally, we should consider re-architecting the metric node to allow for non-timestamped metrics, which is an area we might run into.

How Does This Relate To The Vision

This is a great issue for us to have a broader discussion on what the future of the semantic layer looks like. Is it a permissive framework that gives users as much flexibility as they need or is it a tightly woven experience around an opinionated stance of what metrics are and the best practices associated with them?

Note that in the original issue, Drew came around on optional timestamps which solves some of the foundational intent stuff.

I lean closer to option 2 - I would strongly prefer that we require a grain, to be explicit about what we're giving back (I'd rather fail audibly than quietly do the wrong thing, if someone forgets to include the grain on one of their time-bound metrics). It wouldn't make sense to have a timestamp column in that case which is OK - missing timestamp is allowed if the time_grain is all or whatever we call it (perhaps now?!)

I think we'd only want to return a date field if we wanted to provide a consistently shaped response, but I don't think that's a priority. Ensuring that the right timeframe is in scope on the source table would be the responsibility of the person adding the metric

Closing because I duplicated in #97 and that one has much better implementation information