aws/amazon-cloudwatch-agent

CloudWatch metrics collected from Prometheus contain undesired dimensions

Closed this issue · 3 comments

Describe the bug

My configuration says

logs": {
    "metrics_collected": {
      "prometheus": {
        "cluster_name": "tableau-dp2",
        "log_group_name": "tableau-dp2",
        "prometheus_config_path": "/opt/aws/amazon-cloudwatch-agent/etc/prometheus.yaml",
        "emf_processor": {
          "metric_declaration_dedup": true,
          "metric_namespace": "CWAgent/Prometheus",
          "metric_unit": {
            "java_lang_memory_heapmemoryusage_used": "Bytes"
          },
          "metric_declaration": [
            {
              "source_labels": ["node"],
              "label_matcher": "*",
              "dimensions": [
                [
                  "ClusterName",
                  "node",
                  "application",
                  "service",
                  "service_instance"
                ]
              ],
              "metric_selectors": [
                "^java_lang_memory_heapmemoryusage_used"
              ]
            }
          ]
        }
      }
    },

which specifies that only the following labels should become dimensions (see the expected Dimensions entry after the list):

  • ClusterName
  • node
  • application
  • service
  • service_instance
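
In other words, the .CloudWatchMetrics.Dimensions entry of the emitted EMF event should contain only this set:

"Dimensions": [
    [
        "ClusterName",
        "node",
        "application",
        "service",
        "service_instance"
    ]
]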

but the final CloudWatch log event is:

{
    "CloudWatchMetrics": [
        {
            "Namespace": "CWAgent/Prometheus",
            "Dimensions": [
                [
                    "service",
                    "service_instance",
                    "ClusterName",
                    "host",
                    "job",
                    "prom_metric_type",
                    "instance",
                    "node",
                    "application"
                ]
            ],
            "Metrics": [
                {
                    "Name": "java_lang_memory_heapmemoryusage_used",
                    "Unit": "Bytes"
                },
                {
                    "Name": "jmx_scrape_cached_beans"
                },
                {
                    "Name": "jmx_scrape_duration_seconds"
                },
                {
                    "Name": "jmx_scrape_error"
                }
            ]
        }
    ],
    "ClusterName": "tableau-dp2",
    "Timestamp": "1717502587825",
    "Version": "0",
    "application": "Tableau",
    "host": "xxxx",
    "instance": "127.0.0.1:12302",
    "job": "jmx",
    "node": "node1",
    "prom_metric_type": "gauge",
    "service": "vizqlservice",
    "service_instance": "2",
    "java_lang_memory_heapmemoryusage_used": 506484968,
    "jmx_scrape_cached_beans": 0,
    "jmx_scrape_duration_seconds": 0.057368237,
    "jmx_scrape_error": 0
}

As you can see, the .CloudWatchMetrics.Dimensions array contains additional dimensions beyond the ones I specified (a possible relabelling workaround is sketched after this list):

  • host
  • job
  • prom_metric_type
  • instance
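
A possible mitigation, sketched here as an untested assumption: standard Prometheus metric_relabel_configs with action: labeldrop strips labels at scrape time, which might keep instance and job from ever reaching the EMF processor. Note that prom_metric_type and host appear to be added by the agent itself after the scrape, so relabelling would likely not remove those two.

scrape_configs:
  - job_name: jmx
    sample_limit: 10000
    file_sd_configs:
      - files: ["/opt/aws/amazon-cloudwatch-agent/etc/prometheus_sd_jmx.yaml"]
    # Untested assumption: drop the scrape-time labels so they never become
    # EMF dimensions. prom_metric_type and host seem to be injected by the
    # agent later in the pipeline, so this block probably cannot affect them.
    metric_relabel_configs:
      - action: labeldrop
        regex: instance|job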

Steps to reproduce
Run the agent with the configuration files shown below (config.json, prometheus.yaml, and prometheus_sd_jmx.yaml).

What did you expect to see?

I expect to see only the dimensions that I specified, or at least to have it documented somewhere which dimensions will be "forced" or automatically added.

What did you see instead?

I saw the dimensions that I specified **plus 4 other dimensions that I didn't ask for**.

What version did you use?
Version: CWAgent/1.300039.0b612 (go1.22.2; linux; amd64)

What config did you use?
config.json


{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root",
    "debug": true
  },
  "metrics": {
    "aggregation_dimensions": [
      [
        "InstanceId"
      ]
    ],
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
      "ImageId": "${aws:ImageId}",
      "InstanceId": "${aws:InstanceId}",
      "InstanceType": "${aws:InstanceType}"
    },
    "metrics_collected": {
      "collectd": {
        "metrics_aggregation_interval": 60
      },
      "cpu": {
        "measurement": [
          "cpu_usage_idle",
          "cpu_usage_iowait",
          "cpu_usage_user",
          "cpu_usage_system"
        ],
        "metrics_collection_interval": 60,
        "totalcpu": true
      },
      "disk": {
        "measurement": [
          "used_percent",
          "inodes_free"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "/"
        ]
      },
      "diskio": {
        "measurement": [
          "io_time",
          "write_bytes",
          "read_bytes",
          "writes",
          "reads"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "*"
        ]
      },
      "mem": {
        "measurement": [
          "mem_used_percent"
        ],
        "metrics_collection_interval": 60
      },
      "netstat": {
        "measurement": [
          "tcp_established",
          "tcp_time_wait"
        ],
        "metrics_collection_interval": 60
      },
      "statsd": {
        "metrics_aggregation_interval": 60,
        "metrics_collection_interval": 10,
        "service_address": ":8125"
      },
      "swap": {
        "measurement": [
          "swap_used_percent"
        ],
        "metrics_collection_interval": 60
      }
    }
  },
  "logs": {
    "metrics_collected": {
      "prometheus": {
        "cluster_name": "tableau-dp2",
        "log_group_name": "tableau-dp2",
        "prometheus_config_path": "/opt/aws/amazon-cloudwatch-agent/etc/prometheus.yaml",
        "emf_processor": {
          "metric_declaration_dedup": true,
          "metric_namespace": "CWAgent/Prometheus",
          "metric_unit": {
            "java_lang_memory_heapmemoryusage_used": "Bytes"
          },
          "metric_declaration": [
            {
              "source_labels": ["node"],
              "label_matcher": "*",
              "dimensions": [
                [
                  "ClusterName",
                  "node",
                  "application",
                  "service",
                  "service_instance"
                ]
              ],
              "metric_selectors": [
                "^java_lang_memory_heapmemoryusage_used"
              ]
            }
          ]
        }
      }
    },
    "force_flush_interval": 5
  }
} 

prometheus.yaml

global:
  scrape_interval: 1m
  scrape_timeout: 10s
scrape_configs:
  - job_name: jmx
    sample_limit: 10000
    file_sd_configs:
      - files: ["/opt/aws/amazon-cloudwatch-agent/etc/prometheus_sd_jmx.yaml"]

prometheus_sd_jmx.yaml

- targets:
  - 127.0.0.1:12300
  labels:
    application: Tableau
    service: vizqlservice
    service_instance: "0"
    node: node1
- targets:
  - 127.0.0.1:12301
  labels:
    application: Tableau
    service: vizqlservice
    service_instance: "1"
    node: node1
- targets:
  - 127.0.0.1:12302
  labels:
    application: Tableau
    service: vizqlservice
    service_instance: "2"
    node: node1
- targets:
  - 127.0.0.1:12303
  labels:
    application: Tableau
    service: vizqlservice
    service_instance: "3"
    node: node1

Environment
OS: Ubuntu 18.04.6 LTS


Hi @ecerulm, thank you for providing all the details.
One more thing that would help: could you curl the Prometheus endpoint and provide us a static snapshot of the raw Prometheus metrics from the target?
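
For example (assuming the exporters serve the default /metrics path), hitting one of the targets listed in prometheus_sd_jmx.yaml would capture such a snapshot:

curl -s http://127.0.0.1:12302/metrics > jmx_12302_metrics.txt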

This issue was marked stale due to lack of activity.

Closing this because it has stalled. Feel free to reopen if this issue is still relevant, or to ping the collaborator who labeled it stalled if you have any questions.