mrlhansen/idrac_exporter

Failed to collect metrics with storage collector

Closed this issue · 4 comments

Hello!
I tried to use idrac_exporter with Lenovo XClarity servers. I don't get any metrics if I enable the storage collector:

2023-12-18T08:54:10.121 ERROR Error collecting metrics for host <myserver>: 3 error(s) occurred:
* collected metric "idrac_drive_info" { label:<name:"id" value:"Disk.0" > label:<name:"manufacturer" value:"Intel" > label:<name:"mediatype" value:"SSD" > label:<name:"model" value:"SSDSC2KB076TZL" > label:
* collected metric "idrac_drive_health" { label:<name:"id" value:"Disk.0" > label:<name:"status" value:"OK" > gauge:<value:0 > } was collected before with the same name and label values
* collected metric "idrac_drive_capacity_bytes" { label:<name:"id" value:"Disk.0" > gauge:<value:7.681501126144e+12 > } was collected before with the same name and label values
2023-12-18T08:55:05.598 ERROR Error collecting metrics for host <myserver>: 3 error(s) occurred:
* collected metric "idrac_drive_info" { label:<name:"id" value:"Disk.1" > label:<name:"manufacturer" value:"Intel" > label:<name:"mediatype" value:"SSD" > label:<name:"model" value:"SSDSC2KB076TZL" > label:
* collected metric "idrac_drive_health" { label:<name:"id" value:"Disk.1" > label:<name:"status" value:"OK" > gauge:<value:0 > } was collected before with the same name and label values
* collected metric "idrac_drive_capacity_bytes" { label:<name:"id" value:"Disk.1" > gauge:<value:7.681501126144e+12 > } was collected before with the same name and label values
2023-12-18T08:56:08.979 ERROR Error collecting metrics for host <myserver>: 3 error(s) occurred:
* collected metric "idrac_drive_info" { label:<name:"id" value:"Disk.1" > label:<name:"manufacturer" value:"Intel" > label:<name:"mediatype" value:"SSD" > label:<name:"model" value:"SSDSC2KB076TZL" > label:
* collected metric "idrac_drive_health" { label:<name:"id" value:"Disk.1" > label:<name:"status" value:"OK" > gauge:<value:0 > } was collected before with the same name and label values
* collected metric "idrac_drive_capacity_bytes" { label:<name:"id" value:"Disk.1" > gauge:<value:7.681501126144e+12 > } was collected before with the same name and label values
2023-12-18T08:57:04.272 ERROR Error collecting metrics for host <myserver>: 3 error(s) occurred:
* collected metric "idrac_drive_info" { label:<name:"id" value:"Disk.0" > label:<name:"manufacturer" value:"Intel" > label:<name:"mediatype" value:"SSD" > label:<name:"model" value:"SSDSC2KB076TZL" > label:
* collected metric "idrac_drive_health" { label:<name:"id" value:"Disk.0" > label:<name:"status" value:"OK" > gauge:<value:0 > } was collected before with the same name and label values
* collected metric "idrac_drive_capacity_bytes" { label:<name:"id" value:"Disk.0" > gauge:<value:7.681501126144e+12 > } was collected before with the same name and label values
2023-12-18T08:58:04.731 ERROR Error collecting metrics for host <myserver>: 3 error(s) occurred:
* collected metric "idrac_drive_info" { label:<name:"id" value:"Disk.0" > label:<name:"manufacturer" value:"Intel" > label:<name:"mediatype" value:"SSD" > label:<name:"model" value:"SSDSC2KB076TZL" > label:
* collected metric "idrac_drive_health" { label:<name:"id" value:"Disk.0" > label:<name:"status" value:"OK" > gauge:<value:0 > } was collected before with the same name and label values
* collected metric "idrac_drive_capacity_bytes" { label:<name:"id" value:"Disk.0" > gauge:<value:7.681501126144e+12 > } was collected before with the same name and label values
2023-12-18T09:02:03.179 ERROR Error collecting metrics for host <myserver>: 3 error(s) occurred:
* collected metric "idrac_drive_info" { label:<name:"id" value:"Disk.1" > label:<name:"manufacturer" value:"Intel" > label:<name:"mediatype" value:"SSD" > label:<name:"model" value:"SSDSC2KB076TZL" > label:
* collected metric "idrac_drive_health" { label:<name:"id" value:"Disk.1" > label:<name:"status" value:"OK" > gauge:<value:0 > } was collected before with the same name and label values
* collected metric "idrac_drive_capacity_bytes" { label:<name:"id" value:"Disk.1" > gauge:<value:7.681501126144e+12 > } was collected before with the same name and label values

If I turn off this storage collector, then I can get the rest of the metrics.

I don't see why the metrics are considered duplicates. The disks are the same model, but they have different names (Disk.0 and Disk.1).
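
For context, the "was collected before with the same name and label values" text comes from the Prometheus Go client's consistency check, which rejects a scrape when two samples share a metric name and identical label values. In the log lines above, `idrac_drive_capacity_bytes` carries only the `id` label, so two distinct drives that report the same `id` would collide. A minimal stdlib-only sketch of that kind of check follows; the `sample` type and the `key` and `findDuplicates` functions are hypothetical names for illustration, not the client library's code, and the third sample is an invented case rather than data from this machine.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is a hypothetical stand-in for one collected metric.
type sample struct {
	Name   string
	Labels map[string]string
}

// key builds a stable identity from the metric name and its sorted label pairs.
func key(s sample) string {
	names := make([]string, 0, len(s.Labels))
	for n := range s.Labels {
		names = append(names, n)
	}
	sort.Strings(names)
	var b strings.Builder
	b.WriteString(s.Name)
	for _, n := range names {
		fmt.Fprintf(&b, "|%s=%q", n, s.Labels[n])
	}
	return b.String()
}

// findDuplicates returns the key of every sample identity seen more than once.
func findDuplicates(samples []sample) []string {
	seen := make(map[string]int)
	var dups []string
	for _, s := range samples {
		k := key(s)
		seen[k]++
		if seen[k] == 2 { // report each colliding identity once
			dups = append(dups, k)
		}
	}
	return dups
}

func main() {
	samples := []sample{
		{"idrac_drive_capacity_bytes", map[string]string{"id": "Disk.0"}},
		{"idrac_drive_capacity_bytes", map[string]string{"id": "Disk.1"}},
		// Invented for illustration: another drive that also reports id "Disk.0".
		{"idrac_drive_capacity_bytes", map[string]string{"id": "Disk.0"}},
	}
	for _, d := range findDuplicates(samples) {
		fmt.Println("duplicate:", d)
	}
}
```

The model or manufacturer does not matter here: only the metric name and the label values form the identity, so the collision depends on whatever value ends up in `id`.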

Unfortunately, I am not able to reproduce this on any of my Lenovo machines. If you start the exporter with the -verbose flag, which URLs are queried when you scrape the machine? For example, on one of my Lenovo machines I get the following for storage:

2023-12-18T15:32:47.285 DEBUG Querying url "https://<ip>/redfish/v1/Systems/1/Storage"
2023-12-18T15:32:47.683 DEBUG Querying url "https://<ip>/redfish/v1/Systems/1/Storage/Direct_Attached_SATA"
2023-12-18T15:32:48.051 DEBUG Querying url "https://<ip>/redfish/v1/Systems/1/Storage/Direct_Attached_SATA/Drives/Drive.Bay_0"
2023-12-18T15:32:48.324 DEBUG Querying url "https://<ip>/redfish/v1/Systems/1/Storage/Direct_Attached_SATA/Drives/Drive.Bay_1"

Here is what I get with -verbose:

2023-12-19T10:12:54+03:00	2023-12-19T07:12:54.077 DEBUG Metrics for host <myserver> collected
2023-12-19T10:12:53+03:00	2023-12-19T07:12:53.895 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/32"
2023-12-19T10:12:53+03:00	2023-12-19T07:12:53.715 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/31"
2023-12-19T10:12:53+03:00	2023-12-19T07:12:53.419 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/30"
2023-12-19T10:12:53+03:00	2023-12-19T07:12:53.139 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/29"
2023-12-19T10:12:52+03:00	2023-12-19T07:12:52.856 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/28"
2023-12-19T10:12:52+03:00	2023-12-19T07:12:52.569 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/27"
2023-12-19T10:12:52+03:00	2023-12-19T07:12:52.362 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/26"
2023-12-19T10:12:52+03:00	2023-12-19T07:12:52.121 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/25"
2023-12-19T10:12:51+03:00	2023-12-19T07:12:51.785 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/24"
2023-12-19T10:12:51+03:00	2023-12-19T07:12:51.494 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/23"
2023-12-19T10:12:51+03:00	2023-12-19T07:12:51.216 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/22"
2023-12-19T10:12:50+03:00	2023-12-19T07:12:50.970 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/21"
2023-12-19T10:12:50+03:00	2023-12-19T07:12:50.741 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/20"
2023-12-19T10:12:50+03:00	2023-12-19T07:12:50.555 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/19"
2023-12-19T10:12:50+03:00	2023-12-19T07:12:50.377 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/18"
2023-12-19T10:12:50+03:00	2023-12-19T07:12:50.198 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/17"
2023-12-19T10:12:50+03:00	2023-12-19T07:12:50.014 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/16"
2023-12-19T10:12:49+03:00	2023-12-19T07:12:49.802 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/15"
2023-12-19T10:12:49+03:00	2023-12-19T07:12:49.566 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/14"
2023-12-19T10:12:49+03:00	2023-12-19T07:12:49.315 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/13"
2023-12-19T10:12:49+03:00	2023-12-19T07:12:49.135 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/12"
2023-12-19T10:12:48+03:00	2023-12-19T07:12:48.928 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/11"
2023-12-19T10:12:48+03:00	2023-12-19T07:12:48.746 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/10"
2023-12-19T10:12:48+03:00	2023-12-19T07:12:48.537 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/9"
2023-12-19T10:12:48+03:00	2023-12-19T07:12:48.338 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/8"
2023-12-19T10:12:48+03:00	2023-12-19T07:12:48.091 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/7"
2023-12-19T10:12:47+03:00	2023-12-19T07:12:47.853 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/6"
2023-12-19T10:12:47+03:00	2023-12-19T07:12:47.626 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/5"
2023-12-19T10:12:47+03:00	2023-12-19T07:12:47.416 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/4"
2023-12-19T10:12:47+03:00	2023-12-19T07:12:47.234 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/3"
2023-12-19T10:12:47+03:00	2023-12-19T07:12:47.028 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/2"
2023-12-19T10:12:46+03:00	2023-12-19T07:12:46.819 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory/1"
2023-12-19T10:12:46+03:00	2023-12-19T07:12:46.627 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Memory"
2023-12-19T10:12:46+03:00	2023-12-19T07:12:46.374 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Storage/M.2/Drives/Drive.M.2_Bay_1"
2023-12-19T10:12:46+03:00	2023-12-19T07:12:46.117 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Storage/M.2/Drives/Drive.M.2_Bay_0"
2023-12-19T10:12:45+03:00	2023-12-19T07:12:45.169 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Storage/M.2"
2023-12-19T10:12:44+03:00	2023-12-19T07:12:44.247 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Storage/HBA_Slot5"
2023-12-19T10:12:43+03:00	2023-12-19T07:12:43.169 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Storage/HBA_Slot3"
2023-12-19T10:12:42+03:00	2023-12-19T07:12:42.855 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Storage/HBA_Slot2/Drives/Disk.3"
2023-12-19T10:12:42+03:00	2023-12-19T07:12:42.488 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Storage/HBA_Slot2/Drives/Disk.2"
2023-12-19T10:12:42+03:00	2023-12-19T07:12:42.152 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Storage/HBA_Slot2/Drives/Disk.1"
2023-12-19T10:12:41+03:00	2023-12-19T07:12:41.751 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Storage/HBA_Slot2/Drives/Disk.0"
2023-12-19T10:12:40+03:00	2023-12-19T07:12:40.616 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Storage/HBA_Slot2"
2023-12-19T10:12:39+03:00	2023-12-19T07:12:39.958 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1/Storage"
2023-12-19T10:12:39+03:00	2023-12-19T07:12:39.958 DEBUG Query to url "https://<myserver>/redfish/v1/Managers/iDRAC.Embedded.1/Logs/Sel" returned unexpected status code: 404 (404 Not Found)
2023-12-19T10:12:39+03:00	2023-12-19T07:12:39.856 DEBUG Querying url "https://<myserver>/redfish/v1/Managers/iDRAC.Embedded.1/Logs/Sel"
2023-12-19T10:12:38+03:00	2023-12-19T07:12:38.846 DEBUG Querying url "https://<myserver>/redfish/v1/Chassis/1/Power"
2023-12-19T10:12:37+03:00	2023-12-19T07:12:37.114 DEBUG Querying url "https://<myserver>/redfish/v1/Chassis/1/Thermal"
2023-12-19T10:12:36+03:00	2023-12-19T07:12:36.252 DEBUG Querying url "https://<myserver>/redfish/v1/Systems/1"
2023-12-19T10:12:36+03:00	2023-12-19T07:12:36.252 DEBUG Collecting metrics for host <myserver>
2023-12-19T10:12:36+03:00	2023-12-19T07:12:36.252 DEBUG Handling request from 172.16.114.251:9348 for host <myserver>

I am not exactly sure why this is happening. I see nothing wrong here, but maybe the disks are named differently somewhere in the JSON that is fetched from XClarity. Can I ask what model the machine is (e.g. SR645)?
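
One way to test that guess is to compare the `Id` property inside each drive's JSON with the URL it was fetched from. Every Redfish resource carries an `Id` property; whether the exporter takes its `id` label from that property rather than from the URL is an assumption here, not something confirmed in this thread. The sketch below groups already-fetched response bodies by `Id` and reports any `Id` shared by more than one URL. The `driveID` and `sharedIDs` helpers are hypothetical, and the payloads in `main` are invented for illustration, not taken from the reporter's machine.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// driveID extracts the Redfish "Id" property from a drive resource body.
func driveID(body []byte) (string, error) {
	var res struct {
		ID string `json:"Id"`
	}
	if err := json.Unmarshal(body, &res); err != nil {
		return "", err
	}
	return res.ID, nil
}

// sharedIDs maps every Id reported by more than one URL to those URLs (sorted).
func sharedIDs(bodies map[string][]byte) (map[string][]string, error) {
	byID := make(map[string][]string)
	for url, body := range bodies {
		id, err := driveID(body)
		if err != nil {
			return nil, fmt.Errorf("%s: %w", url, err)
		}
		byID[id] = append(byID[id], url)
	}
	for id, urls := range byID {
		if len(urls) < 2 {
			delete(byID, id) // unique Ids are not interesting here
			continue
		}
		sort.Strings(urls)
	}
	return byID, nil
}

func main() {
	// Invented bodies keyed by URL path; real ones would come from the BMC.
	bodies := map[string][]byte{
		"/redfish/v1/Systems/1/Storage/HBA_Slot2/Drives/Disk.0": []byte(`{"Id":"Disk.0"}`),
		"/redfish/v1/Systems/1/Storage/HBA_Slot2/Drives/Disk.1": []byte(`{"Id":"Disk.1"}`),
		"/redfish/v1/Systems/1/Storage/HBA_Slot2/Drives/Disk.2": []byte(`{"Id":"Disk.0"}`),
	}
	shared, err := sharedIDs(bodies)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for id, urls := range shared {
		fmt.Println(id, "is reported by", urls)
	}
}
```

If this reports nothing for the real responses, the `Id` values are unique and the duplicate would have to come from somewhere else, for example the same drive being listed under more than one storage controller.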

One more question. You write:

If I turn off this storage collector, then I can get the rest of the metrics.

You should get the rest of the metrics even if one of them fails. Are you using the latest version?