CloudWatch metrics collected from Prometheus contain undesired dimensions
Describe the bug
My configuration contains the following excerpt:
logs": {
"metrics_collected": {
"prometheus": {
"cluster_name": "tableau-dp2",
"log_group_name": "tableau-dp2",
"prometheus_config_path": "/opt/aws/amazon-cloudwatch-agent/etc/prometheus.yaml",
"emf_processor": {
"metric_declaration_dedup": true,
"metric_namespace": "CWAgent/Prometheus",
"metric_unit": {
"java_lang_memory_heapmemoryusage_used": "Bytes"
},
"metric_declaration": [
{
"source_labels": ["node"],
"label_matcher": "*",
"dimensions": [
[
"ClusterName",
"node",
"application",
"service",
"service_instance"
]
],
"metric_selectors": [
"^java_lang_memory_heapmemoryusage_used"
]
}
]
}
}
},
which specifies that only the following labels should become dimensions:
- ClusterName
- node
- application
- service
- service_instance
but the final CloudWatch log event is:
{
  "CloudWatchMetrics": [
    {
      "Namespace": "CWAgent/Prometheus",
      "Dimensions": [
        [
          "service",
          "service_instance",
          "ClusterName",
          "host",
          "job",
          "prom_metric_type",
          "instance",
          "node",
          "application"
        ]
      ],
      "Metrics": [
        {
          "Name": "java_lang_memory_heapmemoryusage_used",
          "Unit": "Bytes"
        },
        {
          "Name": "jmx_scrape_cached_beans"
        },
        {
          "Name": "jmx_scrape_duration_seconds"
        },
        {
          "Name": "jmx_scrape_error"
        }
      ]
    }
  ],
  "ClusterName": "tableau-dp2",
  "Timestamp": "1717502587825",
  "Version": "0",
  "application": "Tableau",
  "host": "xxxx",
  "instance": "127.0.0.1:12302",
  "job": "jmx",
  "node": "node1",
  "prom_metric_type": "gauge",
  "service": "vizqlservice",
  "service_instance": "2",
  "java_lang_memory_heapmemoryusage_used": 506484968,
  "jmx_scrape_cached_beans": 0,
  "jmx_scrape_duration_seconds": 0.057368237,
  "jmx_scrape_error": 0
}
As you can see, .CloudWatchMetrics.Dimensions contains additional dimensions beyond the ones I specified (this can be confirmed straight from the log group, as shown after the list below):
- host
- job
- prom_metric_type
- instance
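One way to dump the emitted dimensions directly from the log group (assuming the AWS CLI is configured for the right account and region, and that jq is installed):
aws logs filter-log-events \
  --log-group-name tableau-dp2 \
  --limit 1 \
  | jq -r '.events[].message | fromjson | .CloudWatchMetrics[].Dimensions'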
Steps to reproduce
Run the CloudWatch agent with the config.json, prometheus.yaml, and prometheus_sd_jmx.yaml below, then inspect the EMF events written to the tableau-dp2 log group.
What did you expect to see?
I expect to see only the dimensions that I specified, or at least to have it documented somewhere which dimensions will be "forced" or automatically added.
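Based on my metric_declaration, the Dimensions entry of the emitted EMF event should contain only:
"Dimensions": [
  [
    "ClusterName",
    "node",
    "application",
    "service",
    "service_instance"
  ]
]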
What did you see instead?
I saw the dimensions that I specified **plus 4 other dimensions that I didn't ask for**.
What version did you use?
Version: CWAgent/1.300039.0b612 (go1.22.2; linux; amd64)
What config did you use?
config.json
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root",
    "debug": true
  },
  "metrics": {
    "aggregation_dimensions": [
      [
        "InstanceId"
      ]
    ],
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
      "ImageId": "${aws:ImageId}",
      "InstanceId": "${aws:InstanceId}",
      "InstanceType": "${aws:InstanceType}"
    },
    "metrics_collected": {
      "collectd": {
        "metrics_aggregation_interval": 60
      },
      "cpu": {
        "measurement": [
          "cpu_usage_idle",
          "cpu_usage_iowait",
          "cpu_usage_user",
          "cpu_usage_system"
        ],
        "metrics_collection_interval": 60,
        "totalcpu": true
      },
      "disk": {
        "measurement": [
          "used_percent",
          "inodes_free"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "/"
        ]
      },
      "diskio": {
        "measurement": [
          "io_time",
          "write_bytes",
          "read_bytes",
          "writes",
          "reads"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "*"
        ]
      },
      "mem": {
        "measurement": [
          "mem_used_percent"
        ],
        "metrics_collection_interval": 60
      },
      "netstat": {
        "measurement": [
          "tcp_established",
          "tcp_time_wait"
        ],
        "metrics_collection_interval": 60
      },
      "statsd": {
        "metrics_aggregation_interval": 60,
        "metrics_collection_interval": 10,
        "service_address": ":8125"
      },
      "swap": {
        "measurement": [
          "swap_used_percent"
        ],
        "metrics_collection_interval": 60
      }
    }
  },
  "logs": {
    "metrics_collected": {
      "prometheus": {
        "cluster_name": "tableau-dp2",
        "log_group_name": "tableau-dp2",
        "prometheus_config_path": "/opt/aws/amazon-cloudwatch-agent/etc/prometheus.yaml",
        "emf_processor": {
          "metric_declaration_dedup": true,
          "metric_namespace": "CWAgent/Prometheus",
          "metric_unit": {
            "java_lang_memory_heapmemoryusage_used": "Bytes"
          },
          "metric_declaration": [
            {
              "source_labels": ["node"],
              "label_matcher": "*",
              "dimensions": [
                [
                  "ClusterName",
                  "node",
                  "application",
                  "service",
                  "service_instance"
                ]
              ],
              "metric_selectors": [
                "^java_lang_memory_heapmemoryusage_used"
              ]
            }
          ]
        }
      }
    },
    "force_flush_interval": 5
  }
}
prometheus.yaml
global:
  scrape_interval: 1m
  scrape_timeout: 10s
scrape_configs:
  - job_name: jmx
    sample_limit: 10000
    file_sd_configs:
      - files: ["/opt/aws/amazon-cloudwatch-agent/etc/prometheus_sd_jmx.yaml"]
prometheus_sd_jmx.yaml
- targets:
    - 127.0.0.1:12300
  labels:
    application: Tableau
    service: vizqlservice
    service_instance: "0"
    node: node1
- targets:
    - 127.0.0.1:12301
  labels:
    application: Tableau
    service: vizqlservice
    service_instance: "1"
    node: node1
- targets:
    - 127.0.0.1:12302
  labels:
    application: Tableau
    service: vizqlservice
    service_instance: "2"
    node: node1
- targets:
    - 127.0.0.1:12303
  labels:
    application: Tableau
    service: vizqlservice
    service_instance: "3"
    node: node1
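For what it's worth, I could probably drop the scrape-time labels job and instance with a metric_relabel_configs block in prometheus.yaml (untested sketch below), but as far as I can tell that would not remove the labels the agent itself injects after the scrape, such as prom_metric_type or host:
scrape_configs:
  - job_name: jmx
    sample_limit: 10000
    file_sd_configs:
      - files: ["/opt/aws/amazon-cloudwatch-agent/etc/prometheus_sd_jmx.yaml"]
    # Untested: drop the scrape-time labels I did not ask for
    metric_relabel_configs:
      - action: labeldrop
        regex: "job|instance"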
Environment
OS: Ubuntu 18.04.6 LTS
Hi @ecerulm, thank you for providing all the details.
One more thing that would help is if you could curl the Prometheus endpoint and provide us with a static snapshot of the raw Prometheus metrics from the target.
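For example, something along these lines (this assumes the JMX exporter serves on the default /metrics path; adjust the path and port for your target):
# Capture a raw snapshot of one scrape target and attach the file to this issue
curl -s http://127.0.0.1:12302/metrics > jmx_metrics_snapshot.txt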
This issue was marked stale due to lack of activity.
Closing this because it has stalled. Feel free to reopen if this issue is still relevant, or to ping the collaborator who labeled it stalled if you have any questions.