cloudspannerecosystem/autoscaler

Duplicate TimeSeries errors from OpenTelemetry Collector

nielm opened this issue · 1 comments

The following errors are being reported by the OpenTelemetry Collector in GKE decoupled mode:

textPayload: "2024-03-19T09:47:21.215Z	error	exporterhelper/common.go:95	
Exporting failed. Dropping data.	
{
  "kind": "exporter",
  "data_type": "metrics",
  "name": "googlecloud",
  "error": "rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Field timeSeries[6] had an invalid value: Duplicate TimeSeries encountered. Only one point can be written per TimeSeries per request.; Field timeSeries[8] had an invalid value: Duplicate TimeSeries encountered. Only one point can be written per TimeSeries per request.; Field timeSeries[7] had an invalid value: Duplicate TimeSeries encountered. Only one point can be written per TimeSeries per request.; Field timeSeries[5] had an invalid value: Duplicate TimeSeries encountered. Only one point can be written per TimeSeries per request.; Field timeSeries[9] had an invalid value: Duplicate TimeSeries encountered. Only one point can be written per TimeSeries per request.\nerror details: name = Unknown  desc = total_point_count:10 success_point_count:5 errors:{status:{code:3} point_count:5}",
  "dropped_items": 10}"

Some analysis later...

The scaler instances were occasionally sending metrics to the OpenTelemetry Collector more frequently than the batching interval of the collector.

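The collector does not aggregate these metrics when batching, so it was sending multiple points for the same Scaler metric from the same pod in a single CreateTimeSeries request. Cloud Monitoring accepts only one point per TimeSeries per request, so it rejected the extra points as duplicates.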
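
To illustrate the failure mode, here is a minimal Python sketch (not the Autoscaler or collector code). It assumes a time series is identified by a single string key. The series names, the pod label, and the `duplicate_indices` helper are hypothetical. Two exports of the same five series concatenated into one request flag the second five points, which is consistent with the `total_point_count:10 success_point_count:5` in the log above.

```python
from typing import NamedTuple


class Point(NamedTuple):
    series: str       # hypothetical identity: metric name plus pod label
    timestamp: float  # seconds, illustrative only
    value: float


def duplicate_indices(request: list[Point]) -> list[int]:
    """Return indices of points whose series already appeared earlier in the request."""
    seen: set[str] = set()
    dupes: list[int] = []
    for i, point in enumerate(request):
        if point.series in seen:
            dupes.append(i)
        else:
            seen.add(point.series)
    return dupes


# Two exports from the same pod, 5 series each, arriving within one batch window.
series = [f"scaler/metric_{n}{{pod=scaler-a}}" for n in range(5)]
first_export = [Point(s, 0.0, 1.0) for s in series]
second_export = [Point(s, 4.0, 2.0) for s in series]

# Batching without aggregation simply concatenates the two exports.
batched_request = first_export + second_export

print(duplicate_indices(batched_request))  # [5, 6, 7, 8, 9]
```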

Solution:

  • In decoupled Scaler:
    • Do not manually flush the metrics
    • In OTEL mode, ensure that the periodic export interval is greater than the batching interval of the OpenTelemetry collector
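
A rough way to sanity-check the interval rule is sketched below in Python. It assumes a simplified model in which the collector flushes on fixed, back-to-back batch windows and the scaler exports at a perfectly regular interval; the real batch processor can also flush on batch size, and real exports jitter. The function name and the interval values are hypothetical.

```python
from collections import Counter


def max_exports_per_batch(export_interval_ms: int,
                          batch_timeout_ms: int,
                          duration_ms: int) -> int:
    """Largest number of exports from one pod that land in a single batch window."""
    # Map each export time to the index of the batch window it falls in.
    windows = Counter(
        t // batch_timeout_ms
        for t in range(0, duration_ms, export_interval_ms)
    )
    return max(windows.values())


# Export interval shorter than the batch window: two exports share a batch.
print(max_exports_per_batch(5_000, 10_000, 60_000))   # 2

# Export interval longer than the batch window: at most one export per batch.
print(max_exports_per_batch(20_000, 10_000, 60_000))  # 1
```

Any result above 1 means a single pod can contribute two points for the same series to one request, which is the condition that produced the errors above.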