Unexpected Panic Error (HPE Server)
Opened this issue · 1 comment
sykim1009 commented
While executing redfish_exporter against over 10,000 HPE nodes in my infra, I am getting a panic error; a sketch of the race I suspect is included after the configs below.
- exporter log
2023/01/02 12:50:59 info scraping target host app=redfish_exporter target=192.168.22.175
2023/01/02 12:50:59 info no PCI-E device data found System=1 app=redfish_exporter collector=SystemCollector operation=system.PCIeDevices() target=192.230.169.59
2023/01/02 12:50:59 info collector scrape completed Chassis=1 app=redfish_exporter collector=ChassisCollector target=192.230.164.41
2023/01/02 12:50:59 info scraping target host app=redfish_exporter target=192.230.123.45
2023/01/02 12:50:59 info collector scrape completed Manager=1 app=redfish_exporter collector=ManagerCollector target=192.230.178.144
panic: send on closed channel
goroutine 351152 [running]:
github.com/jenningsloy318/redfish_exporter/collector.parseEthernetInterface(0xc016200840, 0xc01bc7e878, 0x8, 0xc01ca5f600, 0xc0463ca040)
/go/src/github.com/jenningsloy318/redfish_exporter/collector/system_collector.go:684 +0x465
created by github.com/jenningsloy318/redfish_exporter/collector.(*SystemCollector).Collect
/go/src/github.com/jenningsloy318/redfish_exporter/collector/system_collector.go:532 +0xbdf
- redfish_exporter.yml
hosts:
  0.0.0.0:
    username: admin
    password: admin
  ...
groups:
  redfish_hpe:
    username: admin
    password: admin
- prometheus.yml
# my global config
global:
  scrape_interval: 60s # Set the scrape interval to every 60 seconds. The default is every 1 minute.
  evaluation_interval: 60s # Evaluate rules every 60 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - 'rules.yml'
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: grafana
    static_configs:
      - targets: ["localhost:3000"]
  - job_name: node
    static_configs:
      - targets: ["localhost:9100"]
  - job_name: redfish_exporter
    static_configs:
      - targets: ["localhost:9610"]
  - job_name: "redfish_hpe"
    scrape_interval: 5m
    scrape_timeout: 2m
    file_sd_configs:
      - files:
          - ./target/hpe.json
    metrics_path: /redfish
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: {my_ipv4}:9610
      - target_label: __param_group
        replacement: redfish_hpe
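Looking at the stack trace above, "panic: send on closed channel" suggests that the metrics channel used by SystemCollector.Collect is closed while one of the goroutines spawned for parseEthernetInterface is still trying to send into it. The sketch below is only my rough illustration of that race and of the usual fix (closing the channel only after a sync.WaitGroup confirms all senders have returned); the names are placeholders, not the exporter's actual code.

package main

import (
	"fmt"
	"sync"
)

// parseItem stands in for parseEthernetInterface: it runs in its own
// goroutine and sends its result into a shared channel.
func parseItem(ch chan<- string, wg *sync.WaitGroup, name string) {
	defer wg.Done()
	// If ch has already been closed (e.g. because the collector bailed out
	// early), this send panics with "send on closed channel".
	ch <- "metric for " + name
}

func main() {
	ch := make(chan string)
	var wg sync.WaitGroup

	for _, name := range []string{"eth0", "eth1", "eth2"} {
		wg.Add(1)
		go parseItem(ch, &wg, name)
	}

	// Safe pattern: close the channel only after every sender has finished.
	// Closing it earlier is what produces the panic seen in the log above.
	go func() {
		wg.Wait()
		close(ch)
	}()

	for m := range ch {
		fmt.Println(m)
	}
}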
Are there any ideas to solve this problem?
I have no idea how to debug it, because the panic happens suddenly and without any specific log.
Additionally, do you have any data about the scale limits of this exporter?
Thank you!
jakubmikusek commented
I'm hitting a very similar issue:
2023/02/09 14:51:33 info app started. listening on :9610 app=redfish_exporter
level=info msg="TLS is disabled." http2=false
2023/02/09 15:02:30 info no network interface data found System=437XR1138R2 app=redfish_exporter collector=SystemCollector operation=system.NetworkInterfaces() target=10.128.0.7:8000
2023/02/09 15:02:31 info no PCI-E device function data found System=437XR1138R2 app=redfish_exporter collector=SystemCollector operation=system.PCIeFunctions() target=10.128.0.7:8000
2023/02/09 15:02:31 info collector scrape completed System=437XR1138R2 app=redfish_exporter collector=SystemCollector target=10.128.0.7:8000
panic: send on closed channel
goroutine 371 [running]:
github.com/jenningsloy318/redfish_exporter/collector.parseDevice(0xc000352678?, {0xc0000b61ba, 0x6}, {{0xc000282010, 0xa}, 0x0, {0x0, 0x0}, {0x0, 0x0}, ...}, ...)
/go/src/collector/system_collector.go:675 +0x217
created by github.com/jenningsloy318/redfish_exporter/collector.(*SystemCollector).Collect
/go/src/collector/system_collector.go:583 +0x1fed
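One thing I could try (assuming I rebuild the exporter from source) is wrapping the send in a deferred recover so the panic gets logged with a full stack trace instead of killing the whole process; this is just a rough debugging sketch with placeholder names, not the exporter's actual code.

package main

import (
	"log"
	"runtime/debug"
	"sync"
)

func main() {
	ch := make(chan string)
	close(ch) // simulate the channel being closed before the sender runs

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// While debugging, a deferred recover turns the fatal panic into a
		// log line with the full stack of the offending send, so the
		// exporter keeps serving other targets.
		defer func() {
			if r := recover(); r != nil {
				log.Printf("recovered: %v\n%s", r, debug.Stack())
			}
		}()
		ch <- "metric" // "send on closed channel", recovered above
	}()
	wg.Wait()
}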
Any hints on how to debug this further?
Thanks!