apple/swift-metrics

Timer does not maintain units

Yasumoto opened this issue ยท 20 comments

Expected behavior

When I record a duration taking duration as a certain format, I expect the metric to be exposed in that same unit.

Actual behavior

As described in swift-server/swift-prometheus#10, I'm looking at unexpected behavior where I'm inputting a duration as Seconds, but then the unit is exported as nanoseconds.

I understand why we're storing the value as an Int, but the unit translation is unexpected when it's being displayed.

This is something that maybe we could handle in the implementations but that doesn't feel correct.

Steps to reproduce

  1. Record a duration on a Timer using seconds as the unit. summary.recordSeconds(1.0)
  2. collect the metrics.

If possible, minimal yet complete reproducer code (or URL to code)

https://github.com/MrLotU/SwiftPrometheus/pull/11/files#diff-5b4f613ceb7c7173fa4cddb0c60c9e49R133

SwiftMetrics version/commit hash

master

thanks for the note @Yasumoto, as you point out, one could argue this should be done at the "backend" level if it "cares". for example, statsd does not "care" and always uses milliseconds: https://github.com/b/statsd_spec#timers

would love to hear opinions here

We've observed these combinations in generic metrics API:

  1. back-end requires times to be reported in a fixed unit
  2. back-end is unit-less
    2.1. users want all times to be unified to a single unit
    2.2. users know the units of the different KPIs and interpret them when looking at collected data

Case 1 is easy as the concrete implementation for the back-end knows its unit of choice. For case 2, we had to have an optional construction argument on the concrete implementation to allow specifying the conversion unit if any.

@ktoso wdyt?

@ddossot I think your 2.2 hits the nail on the head for me. In my case, Iโ€™m observing a value in seconds, and the unit change is surprising as a user.

we could change the timer handler API to accept units as well, but that would be a breaking change and require a major version release, so we should be careful here

@ktoso @weissi @MrLotU opinions?

another way, and maybe what @ddossot is alluding to in we had to have an optional construction argument on the concrete implementation to allow specifying the conversion unit if any. is that the metrics backend library offers a way to configure this at the concrete timer constructor, eg let timer = FooTimer(unit: .seconds). another, similar way could be for the backend library to offer configuration options for metrics objects by their label, eg "timer.foo.unit": .seconds (or similar) which you specify when initializing the concrete backend. the config based approach is similar to how many configuration based logging systems work, see an example in https://github.com/apple/swift-log/blob/master/Tests/LoggingTests/TestLogger.swift#L18

we could change the timer handler API to accept units as well, but that would be a breaking change and require a major version release, so we should be careful here

@ktoso @weissi @MrLotU opinions?

So what we could do is that we add a new method to the protocol but provide a default implementation which just removes the units and calls into the old one. Apart from that shim, we'd never call into the old (unit-less) implementation anymore and document it appropriately.

That way we have a migration path to retaining units without breaking SemVer

not sure if it makes sense to capture this at recording time, as its totally legit to use the same timer object to capture in different units. from where i stand, this is more of a question of how the backend reports the measurements. iow code can capture timing in whatever unit, but people want to define some default for presentation. many backends make that decision on their own based on the scale of the measurement, but you can also make that an argument in the constructor of the timer. to that end, we can change TimerHandler ctor or make this a backend API concern via the backend concrete timer ctor or configuration based approach. wdyt?

ktoso commented

Sorry it took me a while to get around here;

I agree that there may be two "types" of users and/or backends, and while I'd argue the "let backend report however it wants" makes the "more sense" I guess those views are a bit subjective, and various backends have their own styles. While always reporting in most fine grained unit available => no information loss, while reporting in seconds => information loss; it also means more noise in the numbers and perhaps post aggregation etc.

It does seem indeed to be a recommended pattern in prometheus for the metric names to be aware about the unit they report, i.e.:

http_request_duration_seconds

So one would want to report those in seconds as @Yasumoto mentions.


Having that said, @ddossot's idea seems quite solid, so we can indeed:

So what we could do is that we add a new method to the protocol but provide a default implementation which just removes the units and calls into the old one. Apart from that shim, we'd never call into the old (unit-less) implementation anymore and document it appropriately.

So this perhaps can work, as Tom suggests?

public protocol TimerHandler: AnyObject {
    init (storageUnit: TimeUnit)
    func recordInStorageUnit(_ duration: Int64, _ recordedInTimeUnit: TimeUnit)
public class Timer {
    @inlinable
    public func recordNanoseconds(_ duration: Int64) {
        self.handler.recordInStorageUnit(duration, .nanoseconds)
    }
    public func recordSeconds<DataType: BinaryInteger>(_ duration: DataType) {
        // ... 
        self.handler.recordInStorageUnit(result, .seconds)
...
}

Though it complicates the code paths a little bit... Would you want to give it a shot at PRing this change @Yasumoto ?

@ktoso @Yasumoto to rehash, in my experience the recording API should be decoupled from the display. iow. a single timer could be used to record in many units, hence the various flavors of recordXXX API. the very valid question this issue raised is what unit do we use to present in case the backend supports such "setting".

=> the proposal to add a init (storageUnit: TimeUnit) could be a solution. the tradeoff is that some backend will need to silently ignore this since they can't support display options, leading to unexpected user experience

=> also not sure i follow why we would need to introduce recordInStorageUnit instead of keeping the recording API recordNanoseconds in place and translating to the display unit on "collect" (to use prometheus terminology). what am i missing?

ktoso commented

the tradeoff is that some backend will need to silently ignore this since they can't support display options, leading to unexpected user experience

That's fair and sounds okey. Like the statsd one would ignore and always emit in ms. Documentation would say it does not support that as well then I suppose.

=> also not sure i follow why we would need to introduce recordInStorageUnit instead of keeping the recording API recordNanoseconds in place and translating to the display unit on "collect" (to use prometheus terminology). what am i missing?

I see, I slightly overdid the proposal there then; You're right, silently storing and always delivering to the backend in nanos, but at least the backend was told what "unit it is", so it can render as seconds if it so wanted to (which prometheus libs seem to like to do).

So: just the added constructor, and the "front" always delivers to backend as ns and the backend can do whatever it wants. Sounds good to me ๐Ÿ‘

@Yasumoto would you like to PR?

So: just the added constructor, and the "front" always delivers to backend as ns and the backend can do whatever it wants. Sounds good to me ๐Ÿ‘

I think this is the nicest solution you can get. You don't lose precission since values are stored as ns and converted later, but a user can still decide themselves if the end data should be seconds, ms, ns or minutes. ๐Ÿ˜„

Just so I'm clear, what do y'all mean by backend? I my comment, I used the term as the "server side component to which a client sends KPIs" but I feel you use it to mean a particular implementation still in the client side.

The reason I'm asking this is that if the backend-aka-server-side expects a particular time unit on the wire, always delivering as ns wouldn't fly.

ktoso commented

Hey, yeah when we say โ€œbackendโ€ in this thread it usually refers to โ€œmetrics library that is specific to some backend but is not the metrics-api but an implementation there ofโ€ if that makes sense. So this project does not actually do anything, it is like slf4j, in the sense that it is only the API. And then thereโ€™s metrics library implementations โ€œbackendsโ€ which actually emit the metrics; be it to Prometheus or statsd or similar.

One thing Iโ€™m not sure about is if this additional constructor will also have to exist on the Timer type... sorry I could not investigate this one more yet.

Awesome, thanks for digging into this, folks! If I can take a few days I'd be happy to give this a whirl. I'll follow the thread on Timer(storageUnit: TimeUnit) and report back!

Because I'd like to publish SwiftPrometheus 1.0 sometime soon, I went ahead and created the PR for this so I can solve this issue before my 1.0 release ๐Ÿ˜„

Thank you so much @MrLotU !! ๐Ÿ†