jenningsloy318/redfish_exporter

Unexpected panic error occurs (HPE server)

Opened this issue · 1 comment

While running redfish_exporter against over 10,000 HPE nodes in my infrastructure, I'm hitting a panic error.

  • exporter log
2023/01/02 12:50:59  info scraping target host      app=redfish_exporter target=192.168.22.175
2023/01/02 12:50:59  info no PCI-E device data found System=1 app=redfish_exporter collector=SystemCollector operation=system.PCIeDevices() target=192.230.169.59
2023/01/02 12:50:59  info collector scrape completed Chassis=1 app=redfish_exporter collector=ChassisCollector target=192.230.164.41
2023/01/02 12:50:59  info scraping target host      app=redfish_exporter target=192.230.123.45
2023/01/02 12:50:59  info collector scrape completed Manager=1 app=redfish_exporter collector=ManagerCollector target=192.230.178.144
panic: send on closed channel

goroutine 351152 [running]:
github.com/jenningsloy318/redfish_exporter/collector.parseEthernetInterface(0xc016200840, 0xc01bc7e878, 0x8, 0xc01ca5f600, 0xc0463ca040)
	/go/src/github.com/jenningsloy318/redfish_exporter/collector/system_collector.go:684 +0x465
created by github.com/jenningsloy318/redfish_exporter/collector.(*SystemCollector).Collect
	/go/src/github.com/jenningsloy318/redfish_exporter/collector/system_collector.go:532 +0xbdf
  • redfish_exporter.yml
hosts:
  0.0.0.0:
    username: admin
    password: admin
...
groups:
  redfish_hpe:
    username: admin
    password: admin
  • prometheus.yml
 # my global config
global:
  scrape_interval: 60s # Set the scrape interval to every 60 seconds. The default is every 1 minute.
  evaluation_interval: 60s # Evaluate rules every 60 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).


# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
           - localhost:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - 'rules.yml'
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: grafana
    static_configs:
      - targets: ["localhost:3000"]

  - job_name: node
    static_configs:
      - targets: ["localhost:9100"]

  - job_name: redfish_exporter
    static_configs:
      - targets: ["localhost:9610"]


  - job_name: "redfish_hpe"
    scrape_interval: 5m
    scrape_timeout: 2m
    file_sd_configs:
      - files:
        - ./target/hpe.json
    metrics_path: /redfish
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: {my_ipv4}:9610
      - target_label: __param_group
        replacement: redfish_hpe
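
For reference, the ./target/hpe.json file loaded by file_sd_configs above uses the standard Prometheus file_sd format; the targets below are just illustrative addresses taken from the log output, not my full target list:

[
  {
    "targets": ["192.168.22.175", "192.230.123.45"],
    "labels": {}
  }
]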

Does anyone have ideas on how to solve this problem?
I don't know how to debug it, since the panic happens suddenly and there is no specific log leading up to it.
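
From the traceback, "panic: send on closed channel" means some goroutine is still sending on a channel after its creator closed it. Below is a minimal Go sketch of that race — not the exporter's actual code, just the pattern the stack trace suggests, with hypothetical names:

package main

import (
	"fmt"
	"time"
)

func main() {
	metrics := make(chan string)

	// Drain the channel in the background, as a scrape handler would.
	go func() {
		for m := range metrics {
			fmt.Println("got:", m)
		}
	}()

	// Spawn per-resource workers, similar to SystemCollector.Collect
	// spawning parseEthernetInterface goroutines.
	for i := 0; i < 3; i++ {
		go func(id int) {
			time.Sleep(10 * time.Millisecond)      // simulate a slow Redfish call
			metrics <- fmt.Sprintf("metric-%d", id) // panics if metrics is already closed
		}(i)
	}

	// Bug: the channel is closed without waiting for the workers,
	// so any late send panics with "send on closed channel".
	close(metrics)
	time.Sleep(100 * time.Millisecond)
}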

Additionally, do you have any data on the scaling limits of this exporter?

Thank you!

I'm hitting a very similar issue:

2023/02/09 14:51:33  info app started. listening on :9610 app=redfish_exporter
level=info msg="TLS is disabled." http2=false
2023/02/09 15:02:30  info no network interface data found System=437XR1138R2 app=redfish_exporter collector=SystemCollector operation=system.NetworkInterfaces() target=10.128.0.7:8000
2023/02/09 15:02:31  info no PCI-E device function data found System=437XR1138R2 app=redfish_exporter collector=SystemCollector operation=system.PCIeFunctions() target=10.128.0.7:8000
2023/02/09 15:02:31  info collector scrape completed System=437XR1138R2 app=redfish_exporter collector=SystemCollector target=10.128.0.7:8000
panic: send on closed channel

goroutine 371 [running]:
github.com/jenningsloy318/redfish_exporter/collector.parseDevice(0xc000352678?, {0xc0000b61ba, 0x6}, {{0xc000282010, 0xa}, 0x0, {0x0, 0x0}, {0x0, 0x0}, ...}, ...)
        /go/src/collector/system_collector.go:675 +0x217
created by github.com/jenningsloy318/redfish_exporter/collector.(*SystemCollector).Collect
        /go/src/collector/system_collector.go:583 +0x1fed

Any hints on how to debug this further?
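
If this has the same root cause as above, the conventional fix is to make the spawner wait for every sender before closing the channel, e.g. with sync.WaitGroup. A sketch of that pattern (hypothetical names, not a patch against the actual collector):

package main

import (
	"fmt"
	"sync"
)

// collectOne is a hypothetical stand-in for a real Redfish query.
func collectOne(name string) string {
	return "metric for " + name
}

func main() {
	ifaces := []string{"eth0", "eth1", "eth2"}
	metrics := make(chan string)
	var wg sync.WaitGroup

	for _, iface := range ifaces {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			metrics <- collectOne(name) // channel is guaranteed open here
		}(iface)
	}

	// Close only after every sender has finished; do it in a goroutine
	// so main can drain the channel concurrently.
	go func() {
		wg.Wait()
		close(metrics)
	}()

	for m := range metrics {
		fmt.Println(m)
	}
}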

Thanks!