Histogram buckets are not summed when converted to prometheus v2
tbolon opened this issue · 4 comments
In the otel specification, it is stated that buckets are not cumulative : "The sum of the bucket_counts must equal the value in the count field."
// bucket_counts is an optional field contains the count values of histogram
// for each bucket.
//
// The sum of the bucket_counts must equal the value in the count field.
//
// The number of elements in bucket_counts array must be by one greater than
// the number of elements in explicit_bounds array.
repeated fixed64 bucket_counts = 6;
In your example in the collector-contrib repo you seem to expect that the buckets are cumulative (the values are always higher on each successive bucket), on both v1 and v2 example.
http_request_duration_seconds 0.05=24054,0.1=33444,0.2=100392,0.5=129389,1=133988,sum=53423,count=144320
rpc_duration_seconds 0.01=3102,0.05=3272,0.5=4773,0.9=9001,0.99=76656,sum=1.7560473e+07,count=2693
prometheus,le=0.05 http_request_duration_seconds_bucket=24054
prometheus,le=0.1 http_request_duration_seconds_bucket=33444
prometheus,le=0.2 http_request_duration_seconds_bucket=100392
prometheus,le=0.5 http_request_duration_seconds_bucket=129389
prometheus,le=1 http_request_duration_seconds_bucket=133988
prometheus http_request_duration_seconds_count=144320,http_request_duration_seconds_sum=53423
prometheus,quantile=0.01 rpc_duration_seconds=3102
prometheus,quantile=0.05 rpc_duration_seconds=3272
prometheus,quantile=0.5 rpc_duration_seconds=4773
prometheus,quantile=0.9 rpc_duration_seconds=9001
prometheus,quantile=0.99 rpc_duration_seconds=76656
prometheus rpc_duration_seconds_count=1.7560473e+07,rpc_duration_seconds_sum=2693
The prometheus exporter takes care of summing buckets before sending them to prometheus.
for _, bucket := range buckets {
index := indicesMap[bucket]
var countPerBucket uint64
if len(ip.ExplicitBounds()) > 0 && index < len(ip.ExplicitBounds()) {
countPerBucket = ip.BucketCounts()[index]
}
// **** here ****
cumCount += countPerBucket
points[bucket] = cumCount
}
I don't see a cumulative sum in the prometheus_v2 exporter, so I suppose the data will be wrong in influxdb: each bucket will only have its own values, and not include the count from lower buckets.
for i, explicitBound := range explicitBounds {
t := make(map[string]string, len(tags)+1)
for k, v := range tags {
t[k] = v
}
f := make(map[string]interface{}, len(fields)+1)
for k, v := range fields {
f[k] = v
}
boundTagValue := strconv.FormatFloat(explicitBound, 'f', -1, 64)
t[common.MetricHistogramBoundKeyV2] = boundTagValue
f[measurement+common.MetricHistogramBucketSuffix] = float64(bucketCounts[i])
if err = w.WritePoint(ctx, common.MeasurementPrometheus, t, f, ts, common.InfluxMetricValueTypeHistogram); err != nil {
return fmt.Errorf("failed to write point for histogram: %w", err)
}
}
Please note that I didn't check prometheus_v1, but I suspect the problem to be the same.
Updated link from the proto definition cited above:
// bucket_counts is an optional field contains the count values of histogram
// for each bucket.
//
// The sum of the bucket_counts must equal the value in the count field.
//
// The number of elements in bucket_counts array must be by one greater than
// the number of elements in explicit_bounds array.
repeated fixed64 bucket_counts = 6;
I've confirmed that this is a bug. Not sure how I missed this detail for almost two years!
I'll keep the issue open until the fixed behavior is merged into otel collector contrib
This fix should land in the next release of OpenTelemetry Collector Contrib, which should happen within two weeks.