dbt-labs/dbt_metrics

[SEMANTIC-144] [Feature] New calculation_method that allows defining custom SQL for metric aggregations in the same .yml file

Closed this issue · 1 comment

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

After using custom aggregations as proposed by dbt, we have come to the following conclusion:

  • If we create a custom aggregation for every specific case where it is needed, the number of possible aggregations would grow rapidly and, at some point, could diverge entirely from the list of native core aggregations.

As discussed in a Slack thread, to deal with this we propose creating a custom_sql calculation_method type, which would give users the flexibility to define the metric's SQL aggregation in the same .yml file. Imagine a metric that calculates daily average logins; its ideal .yml file could look something like this:

name: daily_average_logins
label: Daily Average Logins
calculation_method: custom_sql
sql_aggregation: count(property_to_aggregate)/count(distinct metric_date_day)
expression: user_id
model: fct_user_login
timestamp: login_at
time_grains: [day, week, month, quarter, year]

With this new calculation_method type, we give users more flexibility and prevent the number of available aggregations in the gen_primary_metric_aggregate macro from growing exponentially.

Describe alternatives you've considered

Create a custom aggregation type for every specific case where it is needed.

Who will this benefit?

Everyone

Are you interested in contributing this feature?

Maybe

Anything else?

No response

Hey @josepfranquetf! Appreciate you opening the issue - it's always helpful to hear from members of the community about what they're dealing with.

Unfortunately I don't believe this is something that we would be likely to implement for a few reasons:

  • We believe that the power of the semantic layer lies in the fact that it sits on top of dbt's modeling framework. For complicated metric calculations we actually recommend pushing down as much of the logic as possible to the model and then using the metric object just for the aggregation.
  • The introduction of derived metrics gives users flexibility in building metrics atop one another. It's not perfect yet, but we believe this addresses a large portion of the use cases that would otherwise need custom SQL (see the sketch after this list).
  • One of our key design philosophies is that metrics are a tightly constrained, bounded experience. Like I mentioned in the thread, flexibility is important as the semantic layer evolves but we always want to make sure that the guard rails are in place to ensure that it is a seamless experience.
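To make the derived-metrics point concrete, here is a rough sketch of how the daily-average example might be built from two base metrics instead of custom SQL. The metric names, the date_trunc expression, and the exact property layout are illustrative assumptions (derived-metric syntax varies by dbt version), not code from this repository:

# Base metric: total login events (illustrative)
name: total_logins
label: Total Logins
calculation_method: count
expression: user_id
model: fct_user_login
timestamp: login_at
time_grains: [day, week, month, quarter, year]

# Base metric: number of distinct days with at least one login (illustrative)
name: days_with_logins
label: Days With Logins
calculation_method: count_distinct
expression: date_trunc('day', login_at)
model: fct_user_login
timestamp: login_at
time_grains: [day, week, month, quarter, year]

# Derived metric: average logins per day, built on the two metrics above
name: daily_average_logins
label: Daily Average Logins
calculation_method: derived
expression: "{{ metric('total_logins') }} / nullif({{ metric('days_with_logins') }}, 0)"
timestamp: login_at
time_grains: [week, month, quarter, year]

The idea mirrors the first bullet above: push the per-day bucketing into ordinary metric definitions and let the derived metric handle only the ratio.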

I'm happy to keep the issue open for individuals to give feedback, but I wanted to be transparent about how likely it is that this is work we'd partner with y'all to get merged.

Side Note: I think in your example you'd be able to use the average calculation_method (or count_distinct on user_id) and restrict to the day time grain. Then you wouldn't need the custom SQL!
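For reference, a minimal sketch of that side note, assuming the model and columns from the example above (an illustration, not tested configuration):

name: daily_logins
label: Daily Logins
calculation_method: count_distinct
expression: user_id
model: fct_user_login
timestamp: login_at
time_grains: [day]

Queried at the day grain, this yields distinct logins per day with no custom SQL.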