Netflix/servo

What happens when BucketTimer and BucketConfig time units are different?

seh opened this issue · 3 comments

seh commented

Class com.netflix.servo.monitor.BucketTimer offers a constructor that accepts both a BucketConfig and a TimeUnit. When not specified via the alternate public constructor, this time unit defaults to TimeUnit#MILLISECONDS.

Class com.netflix.servo.monitor.BucketConfig uses a builder to construct instances. By default, the bucket widths are measured with TimeUnit#MILLISECONDS, but the BucketConfig$Builder#withTimeUnit() method allows callers to customize the unit.

Given that, what's supposed to happen when my BucketConfig's time unit is different from my BucketTimer's unit? I expected that the timer would convert recorded values into the buckets' unit, but I don't see this happening in the code. BucketTimer's private constructor names the bucket tags with their time unit, but other than that it ignores the unit, presuming that the unit must be equivalent to its own timeUnit field's value.

Is this the intended behavior? If so, I'd like to see one of the two behavioral changes:

  • Throw an IllegalArgumentException exception if BucketTimer.timeUnit != bucketConfig.getTimeUnit().
  • Log a warning and reset BucketTimer#timeUnit to match bucketConfig.getTimeUnit()'s value.

If the intended behavior was to convert the recorded duration into the buckets' units, then we need an additional call to TimeUnit#convert() in BucketTimer#record().

There are two aspects for the unit, presentation and correctness. If I understand correctly you are saying the comparison is using the wrong unit if the buckets and timer unit are different. That should definitely get fixed. For the options presented we generally prefer logging. Many users don't anticipate the monitoring calls failing and we have had some bad experiences with prod issues that were caused or exacerbated because an exception started getting triggered in a rarely used code path.

For the presentation, we don't particularly care if they are the same. I always use seconds for the timer unit when reported. It makes it much easier to reason about the data if everything is using the same units and I generally prefer base units where possible. With the bucket labels they are used for presentation and so typically need to be set to a unit that makes sense for the size of the buckets.

seh commented

If I understand correctly you are saying the comparison is using the wrong unit if the buckets and timer unit are different. That should definitely get fixed.

Yes, that's my interpretation of the code. If you follow it through the following path, you'll see the problem:

  • BucketTimer#start() returns a TimedStopwatch.

  • TimedStopwatch#stop() calls on BucketTimer#record(long, TimeUnit), passing TimeUnit#NANOSECONDS.

  • BucketTimer#record(long, TimeUnit) calls on BucketTimer#record(long) into a duration of the same unit supplied to the BucketTimer constructor, stored as field BucketTimer#timeUnit.

    Let's assume that this is TimeUnit#MILLISECONDS.

  • BucketTimer#record(long) takes this duration (magnitude) and figures out within which bucket boundary it fits.

    If you assume that the bucket boundaries are not expressed in milliseconds (which the separate constructors and builders allow, if not encourage), then this mapping from the millisecond magnitude into the bucket is senseless, comparing magnitudes of different units.

seh commented

Thank you. The fix makes sense.