prometheus/cloudwatch_exporter

[metrics]: Can't get actual value for NumberOfEmptyReceives

80kk opened this issue · 2 comments

80kk commented
  • AWS service: SQS
  • CloudWatch namespace: SQS
  • Link to metrics documentation for this service:
  • AWS region of the exporter: eu-west-1
  • AWS region of the service: eu-west-1
config: |-
  region: eu-west-1
  period_seconds: 240
  metrics:
  - aws_dimensions:
    - QueueName
    aws_metric_name: NumberOfMessagesSent
    aws_namespace: AWS/SQS
    aws_statistics: [Sum, SampleCount, Minimum, Maximum, Average]
  - aws_dimensions:
    - QueueName
    aws_metric_name: NumberOfMessagesReceived
    aws_namespace: AWS/SQS
    aws_statistics: [Sum, SampleCount, Minimum, Maximum, Average]
  - aws_dimensions:
    - QueueName
    aws_metric_name: NumberOfEmptyReceives
    aws_namespace: AWS/SQS
    aws_statistics: [Sum, SampleCount, Minimum, Maximum, Average]
  - aws_dimensions:
    - QueueName
    aws_metric_name: NumberOfMessagesDeleted
    aws_namespace: AWS/SQS
    aws_statistics: [Sum, SampleCount, Minimum, Maximum, Average]
  - aws_dimensions:
    - QueueName
    aws_metric_name: ApproximateNumberOfMessagesDelayed
    aws_namespace: AWS/SQS
    aws_statistics: [Sum, SampleCount, Minimum, Maximum, Average]
  - aws_dimensions:
    - QueueName
    aws_metric_name: ApproximateAgeOfOldestMessage
    aws_namespace: AWS/SQS
    aws_statistics: [Sum, SampleCount, Minimum, Maximum, Average]
  - aws_dimensions:
    - QueueName
    aws_metric_name: ApproximateNumberOfMessagesNotVisible
    aws_namespace: AWS/SQS
    aws_statistics: [Sum, SampleCount, Minimum, Maximum, Average]
  - aws_dimensions:
    - QueueName
    aws_metric_name: ApproximateNumberOfMessagesVisible
    aws_namespace: AWS/SQS
    aws_statistics: [Sum, SampleCount, Minimum, Maximum, Average]

For some reason using

sum(aws_sqs_number_of_empty_receives_sum{queue_name=~".*"}) by (queue_name)

or any other combination of sample_count, min, max, etc., I can't get the value I see in CloudWatch. The closest I can get is double the actual value. To get the real value I need to do something like this:

max by(queue_name) (aws_sqs_number_of_empty_receives_sum{queue_name=~".*"})/2

What am I doing wrong here?

What is your delay_seconds? Looking at the SQS CloudWatch docs, there is a pretty severe delay (15 minutes) on metrics from inactive queues. CloudWatch and Prometheus have a fundamental mismatch here in that Prometheus assumes samples are immutable, while CloudWatch mutates them, sometimes for quite a long time.

My suspicion is that some of these empty receives hit an inactive queue, and then Prometheus never sees that sample after CloudWatch finally updates it.
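
If that is the cause, one thing to try is raising delay_seconds so the exporter only requests data old enough for CloudWatch to have finished updating it. A minimal sketch, assuming a global delay_seconds of 900 (chosen here only to match the 15-minute figure above, not a tested recommendation) and trimmed to the one affected metric:

```yaml
region: eu-west-1
period_seconds: 240
# Assumed value: skip the most recent 15 minutes so that late
# CloudWatch updates for inactive queues have landed before scraping.
delay_seconds: 900
metrics:
- aws_dimensions:
  - QueueName
  aws_metric_name: NumberOfEmptyReceives
  aws_namespace: AWS/SQS
  aws_statistics: [Sum]
```

The trade-off is that every sample Prometheus sees is then at least that old, so dashboards and alerts lag CloudWatch by the same amount.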

80kk commented

That was it.