prometheus-community/ipmi_exporter

Problems with synchronisation between ipmi_remote.yml and config.yml

Hydrapozza opened this issue · 6 comments

Hello, I want to run the IPMI exporter as a container. Here are my configs:

ipmi_remote.yml

modules:
        default:
                # These settings are used if no module is specified, the
                # specified module doesn't exist, or of course if
                # module=default is specified.
                user: "root"
                pass: "myPass"
                # The below settings correspond to driver-type, privilege-level, and
                # session-timeout respectively, see `man 5 freeipmi.conf` (and e.g.
                # `man 8 ipmi-sensors` for a list of driver types).
                driver: "LAN_2_0"
                privilege: "admin"
                # The session timeout is in milliseconds. Note that a scrape can take up
                # to (session-timeout * #-of-collectors) milliseconds, so set the scrape
                # timeout in Prometheus accordingly.
                # Must be larger than the retransmission timeout, which defaults to 1000.
                timeout: 10000
                # Available collectors are bmc, ipmi, chassis, dcmi, sel, and sm-lan-mode
                # If _not_ specified, bmc, ipmi, chassis, and dcmi are used
                collectors:
                #- bmc
                #- ipmi
                #- chassis
                # Got any sensors you don't care about? Add them here.
                exclude_sensor_ids:
                #- 2
                collector_cmd:
                        ipmi: sudo
                        bmc: sudo
                        chassis: sudo
                custom_args:
                        ipmi:
                        - "ipmimonitoring"
                        bmc:
                        - "ipmi-bmc"
                        chassis:
                        - "ipmi-chassis"

prometheus.yml

  - job_name: 'ipmi'
    params:
      module: ['default']
    scrape_interval: 1m
    scrape_timeout: 30s
    metrics_path: /metrics
    scheme: http
    file_sd_configs:
      - files:
        - '/prometheus/ARC0_targets.yml'

    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: ipmi-exporter:9290

docker-compose.yml

ipmi-exporter:
    build:
      context: .
    image: ipmi_exporter2.0
    networks:
      - monitoring
    volumes:
      - /home/osadmin/monitaur/ipmi_exporter/ipmi_remote.yml:/config.yml:ro
        #- /home/osadmin/monitaur/refish_exporter/redfish_exporter.yml:/etc/prometheus/redfish_exporter.yml
    ports:
      - "9290:9290"

ARC0_targets.yml

- targets: [ '10.104.86.45' ]
  labels:
    hostname: ARC0CPU005
    pop: ARC0
    role: Compute
- targets: [ '10.104.86.46' ]
  labels:
    hostname: ARC0CPU006
    pop: ARC0
    role: Compute

The problem is with the prometheus.yml configuration. When I use /metrics, Prometheus doesn't report any error, but the exporter logs:

monitaur-ipmi-exporter-1     | ts=2023-03-13T14:17:02.281Z caller=collector_ipmi.go:151 level=error msg="Failed to collect sensor data" target=[local] error="error running ipmimonitoring: exit status 1: could not find inband device\n"
monitaur-ipmi-exporter-1     | ts=2023-03-13T14:17:02.284Z caller=collector_dcmi.go:53 level=error msg="Failed to collect DCMI data" target=[local] error="error running ipmi-dcmi: exit status 1: could not find inband device\n"
monitaur-ipmi-exporter-1     | ts=2023-03-13T14:17:02.287Z caller=collector_bmc.go:53 level=error msg="Failed to collect BMC data" target=[local] error="error running bmc-info: exit status 1: could not find inband device\n"
monitaur-ipmi-exporter-1     | ts=2023-03-13T14:17:02.289Z caller=collector_chassis.go:53 level=error msg="Failed to collect chassis data" target=[local] error="error running ipmi-chassis: exit status 1: could not find inband device\n"

The first thing I noticed is "target=[local]", which (I guess) means the exporter isn't using the target list I gave it. The error "could not find inband device" is also odd, because from inside the ipmi-exporter container I can run a typical command such as "ipmi-chassis -D lanplus -h 10.104.86.45 -u root -p 'myPass' --get-chassis-status" without problems.
(If I'm correct, LAN_2_0 is equivalent to lanplus.)

Then I tried changing /metrics to /ipmi, but Prometheus gets an HTTP 400 error when scraping, and the exporter logs nothing except the default startup messages:

monitaur-ipmi-exporter-1     | ts=2023-03-13T14:34:04.483Z caller=main.go:103 level=info msg="Starting ipmi_exporter" version="(version=1.6.1, branch=master, revision=8fdc078f6c7ccd4ce443e8e5711d34149c81f3fe)"
monitaur-ipmi-exporter-1     | ts=2023-03-13T14:34:04.483Z caller=tls_config.go:232 level=info msg="Listening on" address=[::]:9290
monitaur-ipmi-exporter-1     | ts=2023-03-13T14:34:04.483Z caller=tls_config.go:235 level=info msg="TLS is disabled." http2=false address=[::]:9290

From inside the ipmi container, I'm also able to use FreeIPMI commands...

If I run "./ipmi_exporter --config.file=ipmi_remote.yml --log.level=info" on the host and request a specific target, e.g. "http://localhost:9290/ipmi?target=10.104.86.41", it prompts on the CLI for the host's password and then returns the expected values:

# HELP ipmi_chassis_power_state Current power state (1=on, 0=off).
# TYPE ipmi_chassis_power_state gauge
ipmi_chassis_power_state 1
# HELP ipmi_current_amperes Current reading in Amperes.
# TYPE ipmi_current_amperes gauge
ipmi_current_amperes{id="57",name="Current 1"} 0.8
ipmi_current_amperes{id="58",name="Current 2"} 1
# HELP ipmi_current_state Reported state of a current sensor (0=nominal, 1=warning, 2=critical).
# TYPE ipmi_current_state gauge
ipmi_current_state{id="57",name="Current 1"} 0
ipmi_current_state{id="58",name="Current 2"} 0
# HELP ipmi_fan_speed_rpm Fan speed in rotations per minute.
# TYPE ipmi_fan_speed_rpm gauge
ipmi_fan_speed_rpm{id="158",name="Fan4A"} 8040
ipmi_fan_speed_rpm{id="159",name="Fan4B"} 6120
ipmi_fan_speed_rpm{id="160",name="Fan5A"} 7200
ipmi_fan_speed_rpm{id="161",name="Fan5B"} 5160
ipmi_fan_speed_rpm{id="162",name="Fan6A"} 9720
ipmi_fan_speed_rpm{id="163",name="Fan6B"} 7920
ipmi_fan_speed_rpm{id="164",name="Fan7A"} 9840
ipmi_fan_speed_rpm{id="165",name="Fan7B"} 7560
ipmi_fan_speed_rpm{id="166",name="Fan8A"} 9840
ipmi_fan_speed_rpm{id="167",name="Fan8B"} 7680
ipmi_fan_speed_rpm{id="35",name="Fan1A"} 8040
ipmi_fan_speed_rpm{id="36",name="Fan1B"} 6120
...

If someone could explain what's wrong with my configuration and how I should correct it, that would be very nice :)

You want the /ipmi endpoint. I think you've essentially gotten it right, except that it's asking for what I suspect is the sudo password. Since you're using a container, I'd say you could just run the exporter as root and get rid of the sudo stuff. If not, you'll need to set up passwordless sudo in the container.
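
For reference, here is a sketch of the scrape job from the original post with only `metrics_path` changed to `/ipmi`. As far as I understand, `/metrics` makes the exporter collect from the local machine (hence `target=[local]` in the logs), while `/ipmi` uses the `target` parameter set by the relabeling. The `scrape_configs:` parent key is assumed, since the posted snippet starts at the job entry; everything else is copied from the post.

```yaml
scrape_configs:
  - job_name: 'ipmi'
    params:
      module: ['default']
    scrape_interval: 1m
    scrape_timeout: 30s
    metrics_path: /ipmi   # was /metrics
    scheme: http
    file_sd_configs:
      - files:
          - '/prometheus/ARC0_targets.yml'
    relabel_configs:
      # Pass the BMC address to the exporter as ?target=<address>.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the BMC address as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape request to the exporter container.
      - target_label: __address__
        replacement: ipmi-exporter:9290
```

With this relabeling, a scrape of the first target should end up as a request like http://ipmi-exporter:9290/ipmi?module=default&target=10.104.86.45.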

Thank you for your answer @bitfehler !

I'm already running the container as root. Something odd I noticed: after starting the exporter, I can query a target URL such as http://localhost:9290/ipmi?target=10.104.86.45, but it only returns:

Unknown module "default"

It's as if the exporter isn't reading the ipmi_remote.yml mounted through the docker-compose.yml volume:

 volumes:
      - /home/osadmin/monitaur/ipmi_exporter/ipmi_remote.yml:/config.yml:ro

I suspect this because I get the same error when I execute ./ipmi_exporter without specifying --config.file=ipmi_remote.yml.

Also, the exporter doesn't print this log line at startup:
time="2021-08-18T09:31:06Z" level=info msg="Loaded config file /config.yml" source="config.go:234"

Indeed. So it seems that the Dockerfile and the docker-compose.yml got out of sync. The container itself no longer specifies a config file, so this has to be done in the compose file. Can you add something like this:

    command: /bin/ipmi_exporter --config.file /config.yml

See also the commit I just pushed to fix this.

I agree with you, it must be a sync issue.

After modifying the docker-compose.yml with the commit you just pushed, it returns:
monitaur-ipmi-exporter-1 | ipmi_exporter: error: unexpected /bin/ipmi_exporter, try --help

My bad, sorry. The command arguments actually get appended to the container's entrypoint, so there's no need to put the binary in there (fix):

    command: --config.file /config.yml
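
Put together with the service definition from the original post, the compose entry would look roughly like the sketch below. The top-level `services:` key is an assumption (the posted snippet starts at the service name), and the definition of the `monitoring` network is omitted because it was not posted.

```yaml
services:   # assumed parent key, not shown in the original post
  ipmi-exporter:
    build:
      context: .
    image: ipmi_exporter2.0
    networks:
      - monitoring
    volumes:
      # Mounts the module config at the path passed to --config.file below.
      - /home/osadmin/monitaur/ipmi_exporter/ipmi_remote.yml:/config.yml:ro
    # Appended to the image's entrypoint, so only the flags go here.
    command: --config.file /config.yml
    ports:
      - "9290:9290"
```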

I think it works now, but I'm getting another error, unrelated to the previous one:
monitaur-ipmi-exporter-1 | ts=2023-03-14T11:06:56.438Z caller=collector_chassis.go:53 level=error msg="Failed to collect chassis data" target=10.104.86.33 error="error running sudo: exec: \"sudo\": executable file not found in $PATH: "

I'm working on it, thank you @bitfehler !
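
One possible way to address the "sudo" error, following the earlier suggestion to run as root and drop the sudo settings, is sketched below. It is the posted `default` module with `collector_cmd` and `custom_args` removed, on the assumption that the exporter then calls the FreeIPMI tools directly. This was not confirmed in the thread, so treat it as untested.

```yaml
modules:
  default:
    user: "root"
    pass: "myPass"
    driver: "LAN_2_0"
    privilege: "admin"
    # Session timeout in milliseconds, as in the original config.
    timeout: 10000
    # collector_cmd / custom_args (the sudo wrappers) intentionally left out:
    # the container already runs as root and has no sudo binary in $PATH.
```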