[metrics]: Can't get actual value for NumberOfEmptyReceives
80kk opened this issue · 2 comments
- AWS service: SQS
- CloudWatch namespace: SQS
- Link to metrics documentation for this service:
- AWS region of the exporter: eu-west-1
- AWS region of the service: eu-west-1
```yaml
config: |-
  region: eu-west-1
  period_seconds: 240
  metrics:
  - aws_dimensions:
    - QueueName
    aws_metric_name: NumberOfMessagesSent
    aws_namespace: AWS/SQS
    aws_statistics: [Sum, SampleCount, Minimum, Maximum, Average]
  - aws_dimensions:
    - QueueName
    aws_metric_name: NumberOfMessagesReceived
    aws_namespace: AWS/SQS
    aws_statistics: [Sum, SampleCount, Minimum, Maximum, Average]
  - aws_dimensions:
    - QueueName
    aws_metric_name: NumberOfEmptyReceives
    aws_namespace: AWS/SQS
    aws_statistics: [Sum, SampleCount, Minimum, Maximum, Average]
  - aws_dimensions:
    - QueueName
    aws_metric_name: NumberOfMessagesDeleted
    aws_namespace: AWS/SQS
    aws_statistics: [Sum, SampleCount, Minimum, Maximum, Average]
  - aws_dimensions:
    - QueueName
    aws_metric_name: ApproximateNumberOfMessagesDelayed
    aws_namespace: AWS/SQS
    aws_statistics: [Sum, SampleCount, Minimum, Maximum, Average]
  - aws_dimensions:
    - QueueName
    aws_metric_name: ApproximateAgeOfOldestMessage
    aws_namespace: AWS/SQS
    aws_statistics: [Sum, SampleCount, Minimum, Maximum, Average]
  - aws_dimensions:
    - QueueName
    aws_metric_name: ApproximateNumberOfMessagesNotVisible
    aws_namespace: AWS/SQS
    aws_statistics: [Sum, SampleCount, Minimum, Maximum, Average]
  - aws_dimensions:
    - QueueName
    aws_metric_name: ApproximateNumberOfMessagesVisible
    aws_namespace: AWS/SQS
    aws_statistics: [Sum, SampleCount, Minimum, Maximum, Average]
```
For some reason, using `sum(aws_sqs_number_of_empty_receives_sum{queue_name=~".*"}) by (queue_name)` or any other combination of `sample_count`, min, max, etc., I can't get the value I am seeing in CloudWatch. The closest I can get is double the actual value. To get the real value I need to do something like this:
```
max by(queue_name) (aws_sqs_number_of_empty_receives_sum{queue_name=~".*"}) / 2
```
What am I doing wrong here?
What is your `delay_seconds`? Looking at the SQS CloudWatch docs, there is a pretty severe delay (15 minutes) on metrics from inactive queues. CloudWatch and Prometheus have a fundamental mismatch here: Prometheus assumes samples are immutable, while CloudWatch mutates them, sometimes for quite a long time.
My suspicion is that some of these empty receives hit an inactive queue, and then Prometheus never sees that sample after CloudWatch finally updates it.
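The change the reply points at is raising `delay_seconds`, so the exporter only requests datapoints old enough that CloudWatch has finished updating them. Below is a minimal sketch of the relevant part of the Helm `config`, trimmed to the one metric in question and assuming the exporter's global `delay_seconds` option. The value 900 is a hypothetical choice that simply matches the 15-minute delay mentioned above, not a tested recommendation.

```yaml
config: |-
  region: eu-west-1
  period_seconds: 240
  # Hypothetical value: 900 s (15 minutes) matches the delay SQS applies to
  # metrics from inactive queues. The exporter then only requests datapoints
  # at least this old, i.e. after CloudWatch should have stopped updating them.
  delay_seconds: 900
  metrics:
  - aws_dimensions:
    - QueueName
    aws_metric_name: NumberOfEmptyReceives
    aws_namespace: AWS/SQS
    aws_statistics: [Sum]
```

The trade-off is that every metric now shows up in Prometheus at least `delay_seconds` late. The exporter README describes `delay_seconds` as settable per metric as well, which would confine the extra lag to the SQS counters that need it.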
That was it.