dbt-labs/dbt_metrics

[Feature] Allow for custom-defined metrics types

Closed this issue · 5 comments

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Note: I am creating this issue following a Slack thread during 2022-08-04 dbt Staging.

Currently there is a defined list of type values that may be used for metrics. I assume that over time more will be added, like median. I support adding various common, standard types.

BUT, it would be nice if users could define their own custom types. That would make it super extensible, and then dbt devs wouldn't be on the hook to anticipate every scenario. A custom type would likely be very specific to an org's business context, so it makes sense for development to be in the hands of the user rather than relying on a finite list.

I hypothesize that while any individual custom definition of a type could be considered fringe or rare, in aggregate, the various fringe types could constitute a significant proportion of metrics that users need to implement.

Describe alternatives you've considered

dbt devs greatly expand the type options, while still inevitably failing to anticipate all user needs.

Who will this benefit?

  • Users who want to develop a metric type that has not yet been conceived of and/or is not one that should be made available to the broader community, most likely because it is a fringe use case.
  • dbt devs who don't have to do as much work building out type options.

Are you interested in contributing this feature?

Possibly! I don't know enough about the mechanics of type or metrics overall to say if I'd be the best person to help with this, but it seems possible that I could be helpful.

Anything else?

No response

Hey @leoebfolsom so I was actually going to respond to the original slack thread but this is also a good place. Turns out @joellabes is way ahead of us with this section from README.


Custom aggregations

To create a custom primary aggregation (as exposed through the type config of a metric), create a macro of the form metric_my_aggregate(expression), then override the gen_primary_metric_aggregate() macro to add it to the dispatch list. The package also protects against nonsensical secondary calculations such as an average of an average; you will need to override the get_metric_allowlist() macro to both add your new aggregate to to the existing aggregations' allowlists, and to make an allowlist for your new aggregation:

    {% do return ({
        "average": ['max', 'min'],
        "count": ['max', 'min', 'average', 'my_new_aggregate'],
        [...]
        "my_new_aggregate": ['max', 'min', 'sum', 'average', 'my_new_aggregate']
    }) %}

To create a custom secondary aggregation (as exposed through the secondary_calculations parameter in the metric macro), create a macro of the form secondary_calculation_my_calculation(metric_name, dimensions, calc_config), then override the perform_secondary_calculations() macro.

@callum-mcdata I see! So without entirely tracking the steps that one would take, it's possible to create a new type such as median this way? And then, the user be responsible for keeping their version of gen_primary_metric_aggregate() up to date with the latest from the dbt_metrics package, so that you could have both your custom metric and any new metrics or other updates that exist in the latest dbt_metrics version of gen_primary_metric_aggregate()?

Or maybe there's a way of of creating a new macro that always pulls in the latest from gen_primary_metric_aggregate() and then adds custom logic to it ... 🤔

Anyway, it sounds like the core functionality of adding custom types is possible, and if so, I'd close this issue.

@leoebfolsom that first description is correct! Just like the overriding of other dbt macros for things like namespacing, the user would be responsible to keep it updated. But it gives you the flexibility to add any metric aggregation that you are interested in (within reasonable limits).

I mention the reasonable limits because more complicated aggregations like window functions would probably fail due to the way we construct the sql but I haven't extensively tested this issue.

I'll close out this issue because the functionality exists but if you have any questions feel free to ping me or we can open up another issue!

Or maybe there's a way of of creating a new macro that always pulls in the latest from gen_primary_metric_aggregate() and then adds custom logic to it ... 🤔

in this case, not actually that impossible!

I think what you'd wind up doing is adding a new macro, gen_custom_primary_metric_aggregate(aggregate, expression) which would be called in the else block here: https://github.com/dbt-labs/dbt_metrics/blob/main/macros/sql_gen/gen_primary_metric_aggregate.sql#L26-L31

Our out of the box implementation would be to throw the same exception we do now, but someone could override it to have their own dispatches ahead of that final block.

In this way, users would stay up to date with any new aggregations we build in natively.

Having just said that, it's probably a bad idea - if you build your own median macro, and then we add our own with the same name, our new one will take precedence over your one because it will appear earlier in the switch statement. Perhaps we invoke custom ones at the top of the file. It'd need to return the SQL string (or a success bool) and we'd check its length to know whether we can early exit or need to keep running through the list of builtins. Not impossible, but perhaps a bit fragile.

Or we stick with the pattern defined everywhere else in dbt, which is that if you override a builtin macro, you're on the hook for bringing in desirable changes in the future.

@joellabes thank you for going through this, it makes sense! I think if we added custom macros, we'd call them median_custom_companyname or something to avoid the exact issue you mentioned. A little hacky, but feels like a reasonable practice to avoid collisions while still always inheriting the latest and greatest from the main package.