prometheus/cloudwatch_exporter

Metrics not available in Prometheus

domcar opened this issue · 4 comments

What did you do?

I deployed the CloudWatch exporter with Helm. Everything looks fine, and I do get metrics from the exporter:

# HELP aws_vpn_tunnel_state_average CloudWatch metric AWS/VPN TunnelState Dimensions: [VpnId] Statistic: Average Unit: None
# TYPE aws_vpn_tunnel_state_average gauge
aws_vpn_tunnel_state_average{job="aws_vpn",instance="",vpn_id="vpn-redacted",} 0.0 1675847100000
# HELP aws_rds_freeable_memory_average CloudWatch metric AWS/RDS FreeableMemory Dimensions: [DBInstanceIdentifier] Statistic: Average Unit: Bytes
# TYPE aws_rds_freeable_memory_average gauge
aws_rds_freeable_memory_average{job="aws_rds",instance="",dbinstance_identifier="redacted",} 1.23510784E8 1675847100000
aws_rds_freeable_memory_average{job="aws_rds",instance="",dbinstance_identifier="redacted",} 3.73026816E8 1675847100000
# HELP aws_rds_cpuutilization_average CloudWatch metric AWS/RDS CPUUtilization Dimensions: [DBInstanceIdentifier] Statistic: Average Unit: Percent
# TYPE aws_rds_cpuutilization_average gauge
aws_rds_cpuutilization_average{job="aws_rds",instance="",dbinstance_identifier="redacted",} 17.47470875485408 1675847100000
aws_rds_cpuutilization_average{job="aws_rds",instance="",dbinstance_identifier="redacted",} 17.02471625472909 1675847100000

The problem is that the metrics aren't shown in Prometheus. To be more precise, Prometheus can scrape the target successfully, but querying these metrics gives back "Empty results".

Other metrics, for example "cloudwatch_exporter_build_info", are correctly shown in Prometheus. Could the reason be that these metrics are gauges but there are two values instead of one? Can I change this?

Environment

  • Exporter version: 0.15.0

Exporter configuration file

config: |-
  region: us-east-2
  period_seconds: 240
  metrics:
  - aws_dimensions: [VpnId]
    aws_metric_name: TunnelState
    aws_namespace: AWS/VPN
    aws_statistics: [Average]
  - aws_dimensions: [DBInstanceIdentifier]
    aws_metric_name: FreeableMemory
    aws_namespace: AWS/RDS
    aws_statistics: [Average]
  - aws_dimensions: [DBInstanceIdentifier]
    aws_metric_name: CPUUtilization
    aws_namespace: AWS/RDS
    aws_statistics: [Average]

The metrics will be visible in Prometheus if you look >10 minutes in the past (try the graph view).

This is an unfortunate result of a fundamental mismatch between CloudWatch and Prometheus. CW metrics converge over time, that is, the value at time T can change up to some later time T+dT. Meanwhile, Prometheus assumes that once it has scraped a sample, that is the truth, and the past does not change.

To compensate for this, by default the exporter delays fetching metrics: it only asks CloudWatch for data that is at least 10 minutes old, by which point almost all AWS services have converged. It also reports to Prometheus that each sample is from that point in the past. Because Prometheus only looks back 5 minutes for an instant query, it never sees any data "now".
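For reference, the knobs involved live at the top level of the exporter configuration. A minimal sketch, with what I believe are the current defaults (double-check the option names and defaults against the README for your exporter version):

region: us-east-2
delay_seconds: 600     # only request CloudWatch data that is at least 10 minutes old (default)
range_seconds: 600     # width of the time window requested from CloudWatch (default)
set_timestamp: true    # attach the (delayed) CloudWatch timestamp to each sample (default)
metrics:
- aws_namespace: AWS/RDS
  aws_metric_name: CPUUtilization
  aws_dimensions: [DBInstanceIdentifier]
  aws_statistics: [Average]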

@matthiasr I actually solved the problem by setting this option to false in the configuration:
set_timestamp: false
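In my Helm values it goes at the top level of the config, alongside region and period_seconds, roughly like this (shortened to one metric):

config: |-
  region: us-east-2
  period_seconds: 240
  set_timestamp: false
  metrics:
  - aws_dimensions: [DBInstanceIdentifier]
    aws_metric_name: CPUUtilization
    aws_namespace: AWS/RDS
    aws_statistics: [Average]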

That works, but keep in mind that metrics will show up shifted by 10 minutes (or whatever you configured delay_seconds to be). That is, if your database CPU utilization spikes at 11:40, the spike will show up around 11:50 in Prometheus, which can make debugging difficult. Depending on the particular metrics you collect, you may be able to get away with a lower delay; this depends on the concrete AWS service and even the individual metric.
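If I remember correctly, delay_seconds can also be overridden per metric rather than only globally, so you can lower it just for metrics you have verified converge quickly. A sketch (the 300 seconds here is only an assumption you would need to validate, not a recommendation):

metrics:
- aws_namespace: AWS/RDS
  aws_metric_name: CPUUtilization
  aws_dimensions: [DBInstanceIdentifier]
  aws_statistics: [Average]
  delay_seconds: 300   # assumes RDS CPUUtilization has converged after 5 minutes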

Thanks a lot for the tip!