collected metric "redfish_system_pcie_function_state" ... was collected before with the same name and label values on PERC H730 Mini
We have a few older Dell systems that have a PERC H730 Mini integrated RAID controller. On these systems, redfish_exporter (latest git: e28371d) throws a fatal error, while it used to work ok prior to the collection of more detailed PCIe metrics:
An error has occurred while serving metrics:
2 error(s) occurred:
* [from Gatherer #2] collected metric "redfish_system_pcie_function_state" { label:<name:"hostname" value:"" > label:<name:"pci_function_deviceclass" value:"UnclassifiedDevice" > label:<name:"pci_function_type" value:"Physical" > label:<name:"pcie_function_id" value:"0-0-0" > label:<name:"pcie_function_name" value:"PERC H730 Mini" > label:<name:"resource" value:"pcie_function" > gauge:<value:1 > } was collected before with the same name and label values
* [from Gatherer #2] collected metric "redfish_system_pcie_function_health_state" { label:<name:"hostname" value:"" > label:<name:"pci_function_deviceclass" value:"UnclassifiedDevice" > label:<name:"pci_function_type" value:"Physical" > label:<name:"pcie_function_id" value:"0-0-0" > label:<name:"pcie_function_name" value:"PERC H730 Mini" > label:<name:"resource" value:"pcie_function" > gauge:<value:1 > } was collected before with the same name and label values
I think perhaps these adapters don't report a "state" the way the exporter expects. This is the data from /redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1:
{
"@odata.context": "/redfish/v1/$metadata#Storage.Storage",
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1",
"@odata.type": "#Storage.v1_4_0.Storage",
"Description": "PERC H730 Mini",
"Drives": [
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/Drives/Disk.Bay.0:Enclosure.Internal.0-1:RAID.Integrated.1-1"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/Drives/Disk.Bay.1:Enclosure.Internal.0-1:RAID.Integrated.1-1"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/Drives/Disk.Bay.2:Enclosure.Internal.0-1:RAID.Integrated.1-1"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/Drives/Disk.Bay.3:Enclosure.Internal.0-1:RAID.Integrated.1-1"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/Drives/Disk.Bay.4:Enclosure.Internal.0-1:RAID.Integrated.1-1"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/Drives/Disk.Bay.5:Enclosure.Internal.0-1:RAID.Integrated.1-1"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/Drives/Disk.Bay.6:Enclosure.Internal.0-1:RAID.Integrated.1-1"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/Drives/Disk.Bay.7:Enclosure.Internal.0-1:RAID.Integrated.1-1"
}
],
"Drives@odata.count": 8,
"Id": "RAID.Integrated.1-1",
"Links": {
"Enclosures": [
{
"@odata.id": "/redfish/v1/Chassis/Enclosure.Internal.0-1:RAID.Integrated.1-1"
},
{
"@odata.id": "/redfish/v1/Chassis/System.Embedded.1"
}
],
"Enclosures@odata.count": 2
},
"Name": "PERC H730 Mini",
"Status": {
"Health": "OK",
"HealthRollup": "OK",
"State": "Enabled"
},
"StorageControllers": [
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/StorageControllers/RAID.Integrated.1-1",
"Assembly": {
"@odata.id": "/redfish/v1/Chassis/System.Embedded.1/Assembly"
},
"FirmwareVersion": "25.5.6.0009",
"Identifiers": [
{
"DurableName": "544A842006943000",
"DurableNameFormat": "NAA"
}
],
"Links": {},
"Manufacturer": "DELL",
"MemberId": "RAID.Integrated.1-1",
"Model": "PERC H730 Mini",
"Name": "PERC H730 Mini",
"SpeedGbps": 12,
"Status": {
"Health": "OK",
"HealthRollup": "OK",
"State": "Enabled"
},
"SupportedControllerProtocols": [
"PCIe"
],
"SupportedDeviceProtocols": [
"SAS",
"SATA"
]
}
],
"StorageControllers@odata.count": 1,
"Volumes": {
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1/Volumes"
}
}
Hi,
Your error shows that it occurred when scraping pcie_function, but you didn't post that output; you only posted the storage/RAID output. Can you please confirm?
OK, so /redfish/v1/Systems/System.Embedded.1 has a few PCIeFunctions:
"PCIeFunctions": [
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/130-0-0"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/130-0-1"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/9-0-0"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-0-0"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-23-4"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-29-0"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-31-0"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/2-0-0"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/1-0-0"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/1-0-1"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/1-0-2"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/1-0-3"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-26-0"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-49-2"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-3-0"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-28-0"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-28-7"
}
],
"PCIeFunctions@odata.count": 17,
and I read from the error message that 0-0-0 is the one we're interested in, so this is /redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-0-0:
{
"@odata.context": "/redfish/v1/$metadata#PCIeFunction.PCIeFunction",
"@odata.etag": "1693376981",
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-0-0",
"@odata.type": "#PCIeFunction.v1_1_1.PCIeFunction",
"ClassCode": "0x000006",
"Description": "Xeon E7 v3/Xeon E5 v3/Core i7 DMI2",
"DeviceClass": "Bridge",
"DeviceId": "0x2f00",
"FunctionId": 0,
"FunctionType": "Physical",
"Id": "0-0-0",
"Links": {
"Drives": [],
"Drives@odata.count": 0,
"EthernetInterfaces": [],
"EthernetInterfaces@odata.count": 0,
"PCIeDevice": {
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevice/0-0"
},
"StorageControllers": [],
"StorageControllers@odata.count": 0
},
"Name": "Xeon E7 v3/Xeon E5 v3/Core i7 DMI2",
"RevisionId": "0x02",
"Status": {
"Health": "OK",
"HealthRollup": "OK",
"State": "Enabled"
},
"SubsystemId": "0x0000",
"SubsystemVendorId": "0x8086",
"VendorId": "0x8086"
}
Is that helpful? I'm happy to post more; please explain in detail what you might need.
Not exactly. Your error message was:
* [from Gatherer #2] collected metric "redfish_system_pcie_function_state" { label:<name:"hostname" value:"" > label:<name:"pci_function_deviceclass" value:"UnclassifiedDevice" > label:<name:"pci_function_type" value:"Physical" > label:<name:"pcie_function_id" value:"0-0-0" > label:<name:"pcie_function_name" value:"PERC H730 Mini" > label:<name:"resource" value:"pcie_function" > gauge:<value:1 > } was collected before with the same name and label values
* [from Gatherer #2] collected metric "redfish_system_pcie_function_health_state" { label:<name:"hostname" value:"" > label:<name:"pci_function_deviceclass" value:"UnclassifiedDevice" > label:<name:"pci_function_type" value:"Physical" > label:<name:"pcie_function_id" value:"0-0-0" > label:<name:"pcie_function_name" value:"PERC H730 Mini" > label:<name:"resource" value:"pcie_function" > gauge:<value:1 > } was collected before with the same name and label values
This means that there must be some extra attribute that distinguishes these metrics, so please upload all API responses that match the errors exactly.
From your single PCIeFunction response, I can't tell which label I could add to differentiate them.
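To illustrate why the error appears: a sample is identified by its metric name plus its full set of label values, so two samples that agree on all of them collide. The following is a simplified Python model of that uniqueness rule, not the exporter's or the Prometheus client's actual code; `metric_identity` and `find_collisions` are hypothetical helper names, and the label values are copied from the error message above.

```python
from collections import Counter


def metric_identity(name, labels):
    """Return a hashable identity: metric name plus sorted label pairs."""
    return (name, tuple(sorted(labels.items())))


def find_collisions(samples):
    """Return the identities that occur more than once in (name, labels) pairs."""
    counts = Counter(metric_identity(name, labels) for name, labels in samples)
    return [identity for identity, n in counts.items() if n > 1]


# Label values taken from the error message in this issue.
labels = {
    "hostname": "",
    "pci_function_deviceclass": "UnclassifiedDevice",
    "pci_function_type": "Physical",
    "pcie_function_id": "0-0-0",
    "pcie_function_name": "PERC H730 Mini",
    "resource": "pcie_function",
}

samples = [
    ("redfish_system_pcie_function_state", labels),
    # Same name and same label values a second time: this is the collision.
    ("redfish_system_pcie_function_state", dict(labels)),
    # Different metric name, so no collision with the entries above.
    ("redfish_system_pcie_function_health_state", labels),
]

collisions = find_collisions(samples)
for name, label_pairs in collisions:
    print(f"duplicate sample for {name}: {dict(label_pairs)}")
```

In this model, either one label value has to differ between the two samples, or the second sample must not be emitted at all.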
ok, so three weeks ago I was confused, because what I was seeing didn't match my memories and I had a hard time reproducing the original issue. Today I took some more time and a systematic approach, and I am now certain that some servers which displayed this issue no longer do. On those servers, we have done firmware updates, among other things updating the "PowerEdge Server BIOS" from version 2.15 to 2.17.
On several boxes that still have a 2.15 or 2.13 BIOS and display the error, the output of /redfish/v1/Systems/System.Embedded.1 actually looks different from what I wrote three weeks ago: as you can see below, PCIeFunction/0-0-0 is listed twice, and I guess that's why the exporter scrapes it twice and, unsurprisingly, finds the same data twice.
Given that this is fixed in current firmware versions, I'm not sure if you want to change the exporter to guard against duplicate IDs, or just write it off as Dell's problem and close this issue?
$ curl 'https://..../redfish/v1/Systems/System.Embedded.1' | jq
...
"PCIeFunctions": [
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/10-0-0"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-0-0" <==
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-23-4"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-29-0"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-31-0"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-0-0" <==
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/1-0-0"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/1-0-1"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/1-0-2"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/1-0-3"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-26-0"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-49-2"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-2-0"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-3-0"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-28-0"
},
{
"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-28-7"
}
],
"PCIeFunctions@odata.count": 16,
Accessing http://172.100.70.202:9610/redfish?target=172.100.70.52 in a browser gives this result:
`An error has occurred while serving metrics:
8 error(s) occurred:
- [from Gatherer #2] collected metric "redfish_system_pcie_device_state" { label:<name:"hostname" value:"ipmi" > label:<name:"pcie_device" value:"BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller" > label:<name:"pcie_device_id" value:"177-0" > label:<name:"resource" value:"pcie_device" > gauge:<value:1 > } was collected before with the same name and label values
- [from Gatherer #2] collected metric "redfish_system_pcie_device_health_state" { label:<name:"hostname" value:"ipmi" > label:<name:"pcie_device" value:"BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller" > label:<name:"pcie_device_id" value:"177-0" > label:<name:"resource" value:"pcie_device" > gauge:<value:1 > } was collected before with the same name and label values
- [from Gatherer #2] collected metric "redfish_system_pcie_device_state" { label:<name:"hostname" value:"ipmi" > label:<name:"pcie_device" value:"C620 Series Chipset Family SMBus" > label:<name:"pcie_device_id" value:"0-31" > label:<name:"resource" value:"pcie_device" > gauge:<value:1 > } was collected before with the same name and label values
- [from Gatherer #2] collected metric "redfish_system_pcie_device_health_state" { label:<name:"hostname" value:"ipmi" > label:<name:"pcie_device" value:"C620 Series Chipset Family SMBus" > label:<name:"pcie_device_id" value:"0-31" > label:<name:"resource" value:"pcie_device" > gauge:<value:1 > } was collected before with the same name and label values
- [from Gatherer #2] collected metric "redfish_system_pcie_device_state" { label:<name:"hostname" value:"ipmi" > label:<name:"pcie_device" value:"C620 Series Chipset Family PCI Express Root Port #5" > label:<name:"pcie_device_id" value:"0-28" > label:<name:"resource" value:"pcie_device" > gauge:<value:1 > } was collected before with the same name and label values
- [from Gatherer #2] collected metric "redfish_system_pcie_device_health_state" { label:<name:"hostname" value:"ipmi" > label:<name:"pcie_device" value:"C620 Series Chipset Family PCI Express Root Port #5" > label:<name:"pcie_device_id" value:"0-28" > label:<name:"resource" value:"pcie_device" > gauge:<value:1 > } was collected before with the same name and label values
- [from Gatherer #2] collected metric "redfish_system_pcie_device_state" { label:<name:"hostname" value:"ipmi" > label:<name:"pcie_device" value:"PowerEdge Rx5xx LOM Board" > label:<name:"pcie_device_id" value:"4-0" > label:<name:"resource" value:"pcie_device" > gauge:<value:1 > } was collected before with the same name and label values
- [from Gatherer #2] collected metric "redfish_system_pcie_device_health_state" { label:<name:"hostname" value:"ipmi" > label:<name:"pcie_device" value:"PowerEdge Rx5xx LOM Board" > label:<name:"pcie_device_id" value:"4-0" > label:<name:"resource" value:"pcie_device" > gauge:<value:1 > } was collected before with the same name and label values`
==========================================================================
Using Postman to request /redfish/v1/Systems/System.Embedded.1, the result is:
"PCIeDevices": [ { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/177-0" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/177-0" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/0-31" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/0-23" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/0-28" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/4-0" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/202-0" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/49-0" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/3-0" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/0-17" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/0-31" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/0-28" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/4-0" } ], "PCIeDevices@odata.count": 13,
The returned results contain the same @odata.id more than once, for example: "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/177-0"
===================================================================
My server is a Dell PowerEdge R750 with iDRAC9.
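The repeated @odata.id entries described above can be found mechanically rather than by eye. Below is a small Python sketch that counts the links in a System resource; `duplicate_ids` is a hypothetical helper name, and the IDs are the ones from the PCIeDevices listing in this comment.

```python
from collections import Counter


def duplicate_ids(system, key="PCIeDevices"):
    """Return {uri: count} for every @odata.id listed more than once under key."""
    counts = Counter(link["@odata.id"] for link in system.get(key, []))
    return {uri: n for uri, n in counts.items() if n > 1}


prefix = "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/"
# The 13 device IDs, in the order they appear in the response above.
ids = ["177-0", "177-0", "0-31", "0-23", "0-28", "4-0", "202-0",
       "49-0", "3-0", "0-17", "0-31", "0-28", "4-0"]
system = {"PCIeDevices": [{"@odata.id": prefix + i} for i in ids]}

duplicates = duplicate_ids(system)
for uri, n in sorted(duplicates.items()):
    print(f"{uri} listed {n} times")
```

The four duplicated IDs (177-0, 0-31, 0-28, 4-0) are the same four devices named in the eight errors, two metrics per device.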
@hanchao131415 what is your BiosVersion value from /redfish/v1/Systems/System.Embedded.1? If it is less than 2.17.0, does the issue persist when you upgrade to the current server firmware?
"AssetTag": "", "Bios": { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/Bios" }, "BiosVersion": "1.8.2",
==================================
My BIOS version is 1.8.2, and I have not upgraded it.
"AssetTag":"","Bios":{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/Bios"},"BiosVersion":"2.9.0",
I see the issue here despite a BIOS version of 2.9.0.
2 error(s) occurred:
* [from Gatherer #2] collected metric "redfish_system_pcie_function_state" { label:<name:"hostname" value:"--removed--" > label:<name:"pci_function_deviceclass" value:"UnclassifiedDevice" > label:<name:"pci_function_type" value:"Physical" > label:<name:"pcie_function_id" value:"0-0-0" > label:<name:"pcie_function_name" value:"PERC H710P Mini (for monolithics)" > label:<name:"resource" value:"pcie_function" > gauge:<value:1 > } was collected before with the same name and label values
* [from Gatherer #2] collected metric "redfish_system_pcie_function_health_state" { label:<name:"hostname" value:"--removed--" > label:<name:"pci_function_deviceclass" value:"UnclassifiedDevice" > label:<name:"pci_function_type" value:"Physical" > label:<name:"pcie_function_id" value:"0-0-0" > label:<name:"pcie_function_name" value:"PERC H710P Mini (for monolithics)" > label:<name:"resource" value:"pcie_function" > gauge:<value:1 > } was collected before with the same name and label values
In my case it's possible that some examples (such as this one) have iDRAC7 (which still supports the Redfish API).
edit: confirmed on a 2.18.1 BIOS for iDRAC8
However, pcie_function 0-0-0 still appears twice regardless of the BIOS version:
https://removed/redfish/v1/Systems/System.Embedded.1
"PCIeFunctions":[{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/6-0-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/2-0-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-0-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-29-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-31-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-31-2"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-0-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-1-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-28-4"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/8-0-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/8-0-1"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/8-0-2"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/8-0-3"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/2-0-1"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-26-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-3-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-28-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-28-7"}], "PCIeFunctions@odata.count":18,
https://removed/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-0-0
{"@odata.context":"/redfish/v1/$metadata#PCIeFunction.PCIeFunction","@odata.etag":"1705552257","@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-0-0","@odata.type":"#PCIeFunction.v1_1_1.PCIeFunction","ClassCode":"0x000000","Description":"PERC H830 Adapter","DeviceClass":"UnclassifiedDevice","DeviceId":"0x005d","FunctionId":0,"FunctionType":"Physical","Id":"0-0-0","Links":{"Drives":[],"Drives@odata.count":0,"EthernetInterfaces":[],"EthernetInterfaces@odata.count":0,"PCIeDevice":{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeDevice/0-0"},"StorageControllers":[],"StorageControllers@odata.count":0},"Name":"PERC H830 Adapter","RevisionId":"0x00","Status":{"Health":"OK","HealthRollup":"OK","State":"Enabled"},"SubsystemId":"0x1f41","SubsystemVendorId":"0x1028","VendorId":"0x1000"}
Hi,
I submitted a PR that works around this problem. Any feedback is welcome.
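For readers following along, one possible shape for a guard against duplicate IDs, as raised earlier in the thread, is sketched below. It is a Python illustration only: it is not the code from the PR and not the exporter's own code, and `unique_links` is a hypothetical helper name. It keeps the first occurrence of each @odata.id and preserves the order of the list.

```python
def unique_links(links):
    """Drop repeated @odata.id entries, keeping the first occurrence in order.

    Entries without an @odata.id are skipped; that is an assumption made
    for this sketch, since such an entry cannot be fetched anyway.
    """
    seen = set()
    unique = []
    for link in links:
        uri = link.get("@odata.id")
        if uri is None or uri in seen:
            continue
        seen.add(uri)
        unique.append(link)
    return unique


base = "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/"
# A shortened version of the listing from the affected servers,
# in which 0-0-0 is reported twice.
links = [
    {"@odata.id": base + "10-0-0"},
    {"@odata.id": base + "0-0-0"},
    {"@odata.id": base + "0-23-4"},
    {"@odata.id": base + "0-0-0"},
]

deduped = unique_links(links)
for link in deduped:
    print(link["@odata.id"])
```

Filtering the link list before fetching each resource means every PCIeFunction is scraped once, so no two samples end up with identical label values.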