CloudWatch metrics collected from Prometheus contain undesired dimensions
Describe the bug
My configuration contains the following excerpt:
logs": {
"metrics_collected": {
"prometheus": {
"cluster_name": "tableau-dp2",
"log_group_name": "tableau-dp2",
"prometheus_config_path": "/opt/aws/amazon-cloudwatch-agent/etc/prometheus.yaml",
"emf_processor": {
"metric_declaration_dedup": true,
"metric_namespace": "CWAgent/Prometheus",
"metric_unit": {
"java_lang_memory_heapmemoryusage_used": "Bytes"
},
"metric_declaration": [
{
"source_labels": ["node"],
"label_matcher": "*",
"dimensions": [
[
"ClusterName",
"node",
"application",
"service",
"service_instance"
]
],
"metric_selectors": [
"^java_lang_memory_heapmemoryusage_used"
]
}
]
}
}
},
which specifies that only the following labels should become dimensions:
- ClusterName
- node
- application
- service
- service_instance
but the final CloudWatch log event is:
{
  "CloudWatchMetrics": [
    {
      "Namespace": "CWAgent/Prometheus",
      "Dimensions": [
        [
          "service",
          "service_instance",
          "ClusterName",
          "host",
          "job",
          "prom_metric_type",
          "instance",
          "node",
          "application"
        ]
      ],
      "Metrics": [
        {
          "Name": "java_lang_memory_heapmemoryusage_used",
          "Unit": "Bytes"
        },
        {
          "Name": "jmx_scrape_cached_beans"
        },
        {
          "Name": "jmx_scrape_duration_seconds"
        },
        {
          "Name": "jmx_scrape_error"
        }
      ]
    }
  ],
  "ClusterName": "tableau-dp2",
  "Timestamp": "1717502587825",
  "Version": "0",
  "application": "Tableau",
  "host": "xxxx",
  "instance": "127.0.0.1:12302",
  "job": "jmx",
  "node": "node1",
  "prom_metric_type": "gauge",
  "service": "vizqlservice",
  "service_instance": "2",
  "java_lang_memory_heapmemoryusage_used": 506484968,
  "jmx_scrape_cached_beans": 0,
  "jmx_scrape_duration_seconds": 0.057368237,
  "jmx_scrape_error": 0
}
As you can see, .CloudWatchMetrics.Dimensions contains additional dimensions beyond the ones I specified (this can be confirmed straight from the log group, as shown after the list below):
- host
- job
- prom_metric_type
- instance
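One way to dump the emitted dimensions directly from the log group (assuming the AWS CLI is configured for the right account and region, and that jq is installed):
aws logs filter-log-events \
  --log-group-name tableau-dp2 \
  --limit 1 \
  | jq -r '.events[].message | fromjson | .CloudWatchMetrics[].Dimensions'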
Steps to reproduce
Run the CloudWatch agent with the config.json, prometheus.yaml, and prometheus_sd_jmx.yaml below, then inspect the EMF events written to the tableau-dp2 log group.
What did you expect to see?
I expect to see only the dimensions that I specified, or at least to have it documented somewhere which dimensions will be "forced" or automatically added.
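Based on my metric_declaration, the Dimensions entry of the emitted EMF event should contain only:
"Dimensions": [
  [
    "ClusterName",
    "node",
    "application",
    "service",
    "service_instance"
  ]
]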
What did you see instead?
I saw the dimensions that I specified **plus 4 other dimensions that I didn't ask for**.
What version did you use?
Version: CWAgent/1.300039.0b612 (go1.22.2; linux; amd64)
What config did you use?
config.json
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root",
    "debug": true
  },
  "metrics": {
    "aggregation_dimensions": [
      [
        "InstanceId"
      ]
    ],
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
      "ImageId": "${aws:ImageId}",
      "InstanceId": "${aws:InstanceId}",
      "InstanceType": "${aws:InstanceType}"
    },
    "metrics_collected": {
      "collectd": {
        "metrics_aggregation_interval": 60
      },
      "cpu": {
        "measurement": [
          "cpu_usage_idle",
          "cpu_usage_iowait",
          "cpu_usage_user",
          "cpu_usage_system"
        ],
        "metrics_collection_interval": 60,
        "totalcpu": true
      },
      "disk": {
        "measurement": [
          "used_percent",
          "inodes_free"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "/"
        ]
      },
      "diskio": {
        "measurement": [
          "io_time",
          "write_bytes",
          "read_bytes",
          "writes",
          "reads"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "*"
        ]
      },
      "mem": {
        "measurement": [
          "mem_used_percent"
        ],
        "metrics_collection_interval": 60
      },
      "netstat": {
        "measurement": [
          "tcp_established",
          "tcp_time_wait"
        ],
        "metrics_collection_interval": 60
      },
      "statsd": {
        "metrics_aggregation_interval": 60,
        "metrics_collection_interval": 10,
        "service_address": ":8125"
      },
      "swap": {
        "measurement": [
          "swap_used_percent"
        ],
        "metrics_collection_interval": 60
      }
    }
  },
  "logs": {
    "metrics_collected": {
      "prometheus": {
        "cluster_name": "tableau-dp2",
        "log_group_name": "tableau-dp2",
        "prometheus_config_path": "/opt/aws/amazon-cloudwatch-agent/etc/prometheus.yaml",
        "emf_processor": {
          "metric_declaration_dedup": true,
          "metric_namespace": "CWAgent/Prometheus",
          "metric_unit": {
            "java_lang_memory_heapmemoryusage_used": "Bytes"
          },
          "metric_declaration": [
            {
              "source_labels": ["node"],
              "label_matcher": "*",
              "dimensions": [
                [
                  "ClusterName",
                  "node",
                  "application",
                  "service",
                  "service_instance"
                ]
              ],
              "metric_selectors": [
                "^java_lang_memory_heapmemoryusage_used"
              ]
            }
          ]
        }
      }
    },
    "force_flush_interval": 5
  }
}
prometheus.yaml
global:
  scrape_interval: 1m
  scrape_timeout: 10s
scrape_configs:
  - job_name: jmx
    sample_limit: 10000
    file_sd_configs:
      - files: ["/opt/aws/amazon-cloudwatch-agent/etc/prometheus_sd_jmx.yaml"]
prometheus_sd_jmx.yaml
- targets:
    - 127.0.0.1:12300
  labels:
    application: Tableau
    service: vizqlservice
    service_instance: "0"
    node: node1
- targets:
    - 127.0.0.1:12301
  labels:
    application: Tableau
    service: vizqlservice
    service_instance: "1"
    node: node1
- targets:
    - 127.0.0.1:12302
  labels:
    application: Tableau
    service: vizqlservice
    service_instance: "2"
    node: node1
- targets:
    - 127.0.0.1:12303
  labels:
    application: Tableau
    service: vizqlservice
    service_instance: "3"
    node: node1
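For what it's worth, I could probably drop the scrape-time labels job and instance with a metric_relabel_configs block in prometheus.yaml (untested sketch below), but as far as I can tell that would not remove the labels the agent itself injects after the scrape, such as prom_metric_type or host:
scrape_configs:
  - job_name: jmx
    sample_limit: 10000
    file_sd_configs:
      - files: ["/opt/aws/amazon-cloudwatch-agent/etc/prometheus_sd_jmx.yaml"]
    # Untested: drop the scrape-time labels I did not ask for
    metric_relabel_configs:
      - action: labeldrop
        regex: "job|instance"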
Environment
OS: Ubuntu 18.04.6 LTS
Hi @ecerulm, thank you for providing all the details.
One more thing that would help is if you could curl the Prometheus endpoint and provide us with a static snapshot of the raw Prometheus metrics from the target.
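For example, something along these lines (this assumes the JMX exporter serves on the default /metrics path; adjust the path and port for your target):
# Capture a raw snapshot of one scrape target and attach the file to this issue
curl -s http://127.0.0.1:12302/metrics > jmx_metrics_snapshot.txt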
This issue was marked stale due to lack of activity.
Closing this because it has stalled. Feel free to reopen if this issue is still relevant, or to ping the collaborator who labeled it stalled if you have any questions.