Metric name conflict
Hi,
I have an issue with the Azure exporter's metric names. I want to get, for example, the "cpu_percent" metric value for a SQL database and also for a PostgreSQL database. The metric name is the same for both services, and this seems to be the root cause of the issue.
My configuration:
resource_groups:
  - resource_group: "rg-test"
    resource_types:
      - Microsoft.Sql/servers/databases
    metrics:
      - name: "cpu_percent"
  - resource_group: "rg-test"
    resource_types:
      - Microsoft.DBforPostgreSQL/servers
    metrics:
      - name: "cpu_percent"
With this configuration, I receive the error:
- collected metric cpu_percent_percent_max label:<name:"resource_group" value:"rg-test" > label:<name:"resource_name" value:"test-postgres" > gauge:<value:0 > was collected before with the same name and label values
With my PR #44 (which adds an additional label to avoid this error), I get a different error instead:
- collected metric storage_percent_percent_max label:<name:"resource_group" value:"rg-test" > label:<name:"resource_name" value:"test-postgres" > gauge:<value:22.86 > has label dimensions inconsistent with previously collected metrics in the same metric family
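The two messages above correspond to two different consistency rules: within one scrape, a metric name plus its label values must be unique, and all samples of one metric family must use the same set of label names. The following is a minimal sketch of those two rules in plain Python, not the Prometheus client's actual code. The function name `check_family` and the extra `resource_type` label are hypothetical and used only for illustration; which label PR #44 actually adds is not stated here.

```python
def check_family(samples):
    """Return error strings for a list of (metric_name, labels_dict) samples.

    Simplified model of two registry checks; the real client library
    performs more validation than this.
    """
    seen = set()        # (name, sorted label pairs) already collected
    label_names = {}    # metric name -> label names of its first sample
    errors = []
    for name, labels in samples:
        key = (name, tuple(sorted(labels.items())))
        names = tuple(sorted(labels))
        if key in seen:
            errors.append(f"{name}: was collected before with the same name and label values")
        elif name in label_names and label_names[name] != names:
            errors.append(f"{name}: has label dimensions inconsistent with previously collected metrics")
        seen.add(key)
        label_names.setdefault(name, names)
    return errors


base = {"resource_group": "rg-test", "resource_name": "test-postgres"}

# Same name and same label values emitted twice: duplicate error.
duplicate = check_family([
    ("cpu_percent_percent_max", dict(base)),
    ("cpu_percent_percent_max", dict(base)),
])

# Same name, but only one sample carries the extra (hypothetical) label:
# inconsistent label dimensions.
inconsistent = check_family([
    ("storage_percent_percent_max", {**base, "resource_type": "example"}),
    ("storage_percent_percent_max", dict(base)),
])

print(duplicate)
print(inconsistent)
```

Under this model, adding a label fixes the first error only if every sample in the family carries that label.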
I've also tried this configuration (because I was not sure what the correct syntax was):
resource_groups:
  - resource_group: "rg-test"
    resource_types:
      - Microsoft.Sql/servers/databases
      - Microsoft.DBforPostgreSQL/servers
    metrics:
      - name: "cpu_percent"
But I get the same issue.
I'm able to get the metric if my configuration contains only "Microsoft.Sql/servers/databases" or only "Microsoft.DBforPostgreSQL/servers". In other words, each works individually, but not when both are set together.
The CPU metric is:
# HELP cpu_percent_percent_total cpu_percent_percent_total
# TYPE cpu_percent_percent_total gauge
cpu_percent_percent_total{resource_group="rg-test",resource_name="test-postgres"} 0
Regarding the way the metric name is built, why not do the same as the AWS CloudWatch exporter? There, each metric name includes the service name.
Example: for the RDS database metrics: aws_rds_database_connections_sum
In the case of the Azure metrics exporter, the name of the DB CPU percent metric is: cpu_percent_percent_total
Resource types can have one of these formats:
- Microsoft.Web/sites
- Microsoft.StreamAnalytics/streamingjobs
- Microsoft.Sql/servers/databases
- Microsoft.TimeSeriesInsights/environments/eventsources
Why not define the metric name like this (in my example, assuming the CPU percent metric exists for each resource type):
- azure_web_sites_cpu_percent_percent_total
- azure_streamanalytics_streamingjobs_cpu_percent_percent_total
- azure_sql_servers_databases_cpu_percent_percent_total
- azure_timeseriesinsights_environments_eventsources_cpu_percent_percent_total
"Azure" at the beginning is not required but at least we keep the same naming logic with the AWS exporter and it's easier to identify the metric origin when your Prometheus is gathering metrics from multiple providers.
So a regex removing the "Microsoft." prefix, replacing "/" with "_", and lowercasing the string should be enough IMO.
@brian-brazil What do you think?
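A minimal sketch of that renaming rule, written in Python for illustration only (the exporter itself is not written in Python). The function name `prefixed_metric_name` and the fixed "azure" prefix are assumptions. The last substitution, which replaces any other character that would be invalid in a Prometheus metric name, is an extra safety step that is not part of the rule described above.

```python
import re


def prefixed_metric_name(resource_type: str, metric_name: str) -> str:
    """Build a service-prefixed metric name from an Azure resource type."""
    # Drop the leading "Microsoft." provider prefix.
    service = re.sub(r"^Microsoft\.", "", resource_type, flags=re.IGNORECASE)
    # Replace path separators and lowercase the result.
    service = service.replace("/", "_").lower()
    # Extra safety step (assumption): replace any remaining characters
    # outside [a-z0-9_] with an underscore.
    service = re.sub(r"[^a-z0-9_]", "_", service)
    return f"azure_{service}_{metric_name}"


for rt in (
    "Microsoft.Web/sites",
    "Microsoft.StreamAnalytics/streamingjobs",
    "Microsoft.Sql/servers/databases",
    "Microsoft.TimeSeriesInsights/environments/eventsources",
):
    print(prefixed_metric_name(rt, "cpu_percent_percent_total"))
```

Because the rule only lowercases the name, a camel-case segment such as "StreamAnalytics" becomes "streamanalytics" with no underscore between the words.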
Thanks
It looks like cpu_percent_percent_total has the same meaning across resources, so it would make sense not to put the service name in there, similar to how it's always just process_cpu_seconds_total.
Although it does not directly solve the problem, I found a suitable workaround for this kind of conflict by relabeling the Prometheus metrics.
IMHO it would be hard for the maintainers to find a general approach to prefixing all metrics that suits everyone.