sberk42/fritzbox_exporter

RPi in Docker "Can not get metric values"

Opened this issue · 6 comments

Hey there!

I have been running into an issue with Docker on a Raspberry Pi, using a self-built image from this repo.

My (v3.7) compose file includes this service definition:

fritzbox-prometheus-exporter:
    hostname: fritzbox-prometheus-exporter
    build:
      context: ../fritzbox_exporter
      dockerfile: Dockerfile
    container_name: fritzbox-prometheus-exporter
    # for dns issues like "dial tcp: lookup fritz.box on 127.0.0.11:53: no such host"
    # uncomment and fill the following line:
    dns: 192.168.178.1
    ports:
      - "9042:9042"
    network_mode: "host"
    #expose:
    #  - "9042"
    restart: unless-stopped
    environment:
      USERNAME: removed
      PASSWORD: removed
      GATEWAY_URL: http://removed:49000
      LISTEN_ADDRESS: 0.0.0.0:9042

The build context is a pull of this repo (master).

Fritzbox: 7590 7.29
The software is running on a Raspberry Pi 3B
Docker version 20.10.17, build 100c701 (CE)

os-release:

NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

kernel:

5.4.0-1065-raspi

After a while of running just fine, the log starts reporting these errors and some data (like CPU utilization and temps) is not reported anymore:

root@raspberrypi:~# docker logs --details fritzbox-prometheus-exporter
 time="2022-07-02T19:18:25Z" level=info msg="metrics available at http://0.0.0.0:9042/metrics"
 time="2022-07-02T19:18:25Z" level=info msg="readyness check available at http://0.0.0.0:9042/ready"
 time="2022-07-02T19:18:25Z" level=info msg="liveness check available at http://0.0.0.0:9042/live"
 time="2022-07-02T19:18:26Z" level=info msg="services loaded"
 time="2022-07-03T02:32:54Z" level=error msg="Can not get metric values for data.drain.*.actPerc: hash '' has no element 'data'"
 time="2022-07-03T02:32:55Z" level=error msg="Can not get metric values for data.drain.*.lan.*.class: hash '' has no element 'data'"
 time="2022-07-03T02:32:55Z" level=error msg="Can not get metric values for data.cputemp.series.0.-1: hash '' has no element 'data'"
 time="2022-07-03T02:32:56Z" level=error msg="Can not get metric values for data.cpuutil.series.0.-1: hash '' has no element 'data'"
 time="2022-07-03T02:32:57Z" level=error msg="Can not get metric values for data.ramusage.series.0.-1: hash '' has no element 'data'"
 time="2022-07-03T02:32:57Z" level=error msg="Can not get metric values for data.ramusage.series.1.-1: hash '' has no element 'data'"
 time="2022-07-03T02:32:58Z" level=error msg="Can not get metric values for data.ramusage.series.2.-1: hash '' has no element 'data'"
 time="2022-07-03T02:32:58Z" level=error msg="Can not get metric values for data.usbOverview.devices.*.partitions.0.totalStorageInBytes: hash '' has no element 'data'"
 time="2022-07-03T02:32:59Z" level=error msg="Can not get metric values for data.usbOverview.devices.*.partitions.0.usedStorageInBytes: hash '' has no element 'data'"
 time="2022-07-03T02:33:50Z" level=error msg="Can not get metric values for data.drain.*.actPerc: hash '' has no element 'data'"

Restarting the container fixes the issue. All the "lua" stats are also back in the exporter output. Any idea what could cause this?

Thank you!

I've experienced similar issues: when a Fritzbox OS update is applied, I need to restart the exporter afterwards to get Lua working again. So I guess it is somehow related to the exporter being unable to reach the Fritzbox.

However, so far I have not managed to reproduce the issue (without an upgrade) to find out where it is coming from. Do you have a way to reliably reproduce it? I could then add some more debug output to the lua_client to see where the issue is coming from.

I have stumbled upon this issue when playing around with the Wireguard-feature currently available via "Fritz! Labor". I have made the following observations:

  • Losing IP connectivity to the Fritzbox: no problem after connectivity has been restored
  • Firmware-Update: "hash '' has no element 'data'"
  • Reboot: "hash '' has no element 'data'"

So my assumption would be that renewing the SID after the Fritzbox has been rebooted doesn't work as expected. The Fritzbox log keeps reporting exactly this: "ungültige Sitzungskennung" ("invalid session ID").

Do you know the HTTP status code you received?
The SID was only renewed if 403 (Forbidden) was returned; maybe the code is different after a reboot.

I just pushed a change to always renew the SID in case of an error; maybe this helps in this case.

Sorry for taking so long to get back to this topic

No joy, still "hash '' has no element 'data'"

Does the HTTP status code get logged somewhere?

Strange. I checked again, and it looks like this can only happen if the call was successful but the returned JSON was not as expected.

I have now added support for re-authenticating whenever something goes wrong with the metric collection.

Let's see what happens now.

I just found some time to test: Problem still persists.