librenms/librenms-agent

32bit restriction with application wireguard?

efelon opened this issue · 15 comments

The problem

I have a long-running WireGuard instance on a (still 32-bit) Raspberry Pi 4 with Debian buster:
Linux 5.10.103-v7l+ #1529 SMP Tue Mar 8 12:24:00 GMT 2022 armv7l GNU/Linux
LibreNMS is running on 32-bit Debian buster as well.

Some of the peers have high values for the send/receive transfer counters, e.g.:

peer: ***  mobile_mw
  preshared key: (hidden)
  endpoint: *:*
  allowed ips: *
  latest handshake: 1 minute, 56 seconds ago
  transfer: 761.16 MiB received, 3.83 GiB sent

Although the absolute byte values output by wg show all dump are not at the limit:

4113015344 (value by wg)
4294967296 (2^32)

the graphs in the web UI show NaN for these values. As a test, I modified wireguard.py as follows, which "fixes" the display problem in the web UI:

[...]
        bytes_rcvd = long(line_parsed[6]) / 1000
        bytes_sent = long(line_parsed[7]) / 1000
[...]

I'm pretty sure the 32-bit OS/PHP/Python is somehow the problem, but I wanted to report this behavior in case there is a solution other than "32-bit is not supported anymore".
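The comparison above is against 2^32, but on a 32-bit platform the limit that often matters is the signed maximum 2^31 - 1 (2147483647, the same value sys.maxsize prints on the Pi later in this thread). The snippet below only does the arithmetic for counters quoted in this thread; describe() is a hypothetical helper, and it does not show where in the LibreNMS stack the NaN actually comes from.

```python
# Compare byte counters quoted in this thread against common 32-bit limits.
INT32_SIGNED_MAX = 2**31 - 1  # 2147483647, what sys.maxsize printed on the Pi
UINT32_MAX = 2**32 - 1        # 4294967295


def describe(value: int) -> str:
    """Classify a byte counter relative to the 32-bit integer limits."""
    if value <= INT32_SIGNED_MAX:
        return "fits in a signed 32-bit int"
    if value <= UINT32_MAX:
        return "fits only in an unsigned 32-bit int"
    return "needs more than 32 bits"


for counter in (4113015344, 999921456, 5157117812):
    print(counter, describe(counter))
```

So 4113015344 is already past the signed 32-bit limit even though it is below 2^32.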

Opened here as requested in librenms/librenms#14688

Output of ./validate.php

===========================================
Component | Version
--------- | -------
LibreNMS  | 22.11.0 (2022-11-24T07:01:26+01:00)
DB Schema | 2022_08_15_084507_add_rrd_type_to_wireless_sensors_table (248)
PHP       | 8.1.12
Python    | 3.7.3
Database  | MariaDB 10.3.36-MariaDB-0+deb10u2
RRDTool   | 1.7.1
SNMP      | 5.7.3
===========================================

[OK]    Composer Version: 2.4.4
[OK]    Dependencies up-to-date.
[OK]    Database connection successful
[OK]    Database Schema is current
[OK]    SQL Server meets minimum requirements
[OK]    lower_case_table_names is enabled
[OK]    MySQL engine is optimal
[OK]
[OK]    Database schema correct
[OK]    MySQl and PHP time match
[OK]    Active pollers found
[OK]    Dispatcher Service not detected
[OK]    Locks are functional
[OK]    Python poller wrapper is polling
[WARN]  Using database for locking, you should set CACHE_DRIVER=redis
[OK]    rrd_dir is writable
[OK]    rrdtool version ok

What was the last working version of LibreNMS?

No response

Anything in the logs that might be useful for us?

No response

@efelon As a test, would you remove the "long" casting and, assuming you're running Python 3 on your Pi, print the output of python3 -c "import sys; print(sys.maxsize)" here? If you're running Python 2, I believe the command is python -c "import sys; print(sys.maxint)"

Hello @bnerickson, my default is Python 2 for this installation, but I also have Python 3 installed:

# python3 -c "import sys; print(sys.maxsize)"
2147483647
# python2 -c "import sys; print(sys.maxint)"
2147483647

One of the devices has these values now:

{"mobile_mw": {"minutes_since_last_handshake": 1, "bytes_rcvd": 999921456, "bytes_sent": 5157117812}

wg dump:

wg0	**=	**=	*.*.*.*:39881	10.5.0.2/32,fd23:42::2/128	1670059337	1001515004	5203107064	off
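For reference, a peer line of wg show all dump is tab-separated: interface, public key, preshared key, endpoint, allowed IPs, latest handshake (Unix time), transfer-rx, transfer-tx, persistent keepalive. Interface lines have only five fields. A minimal parsing sketch, assuming that layout; parse_peer_line is a hypothetical helper, not the actual wireguard.py code, but it reads the same indices 6 and 7 used in the snippets in this thread:

```python
# The peer line above, rebuilt with explicit tabs (keys/endpoint masked as in the thread).
DUMP_LINE = "\t".join([
    "wg0", "**=", "**=", "*.*.*.*:39881", "10.5.0.2/32,fd23:42::2/128",
    "1670059337", "1001515004", "5203107064", "off",
])

PEER_FIELD_COUNT = 9  # interface name + 8 peer fields; interface lines have 5


def parse_peer_line(line: str):
    """Return the counters of a peer line, or None for an interface line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != PEER_FIELD_COUNT:
        return None
    return {
        "interface": fields[0],
        "latest_handshake": int(fields[5]),  # Unix timestamp
        "bytes_rcvd": int(fields[6]),        # transfer-rx
        "bytes_sent": int(fields[7]),        # transfer-tx
    }


print(parse_peer_line(DUMP_LINE))
```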

It looks like this (with the original wireguard.py):
[screenshot of the graph]

@efelon thanks. Would you paste the relevant output of the /etc/snmp/wireguard.py command under the following two conditions?

  1. With bytes_sent/bytes_rcvd cast to the long type, without division:
[...]
        bytes_rcvd = long(line_parsed[6])
        bytes_sent = long(line_parsed[7])
[...]
  2. With the original bytes_sent/bytes_rcvd:
[...]
        bytes_rcvd = int(line_parsed[6])
        bytes_sent = int(line_parsed[7])
[...]

@bnerickson, of course. But the output is the same:

# int()
"mobile_mw": {"minutes_since_last_handshake": 0, "bytes_sent": 5790460872, "bytes_rcvd": 1030401704},
# long()
"mobile_mw": {"minutes_since_last_handshake": 0, "bytes_sent": 5790460904, "bytes_rcvd": 1030401704},

It might be unrelated, since the max value I supplied is larger than your values, but I screwed up by setting a maximum for the bytes received/sent here: https://github.com/librenms/librenms/blob/49abee372268d2d49448f9557e00b6cb8a54521e/includes/polling/applications/wireguard.inc.php#L24

Can you change those two lines on your LibreNMS install to the following and report whether you start seeing data on the graph without the division by 1000 and the long casting?

    ->addDataset('bytes_rcvd', 'DERIVE', 0)
    ->addDataset('bytes_sent', 'DERIVE', 0)

You might have to delete the RRD and re-poll for the changes to take effect. In either case, I need to submit a PR with those changes.
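Why a maximum can matter: rrdtool treats any update whose computed rate falls outside a data source's [min, max] range as UNKNOWN, which graphs as a gap (NaN). The sketch below is a deliberately simplified model of a DERIVE data source (no heartbeat, no consolidation); derive_rate is a hypothetical helper, not rrdtool or LibreNMS code, and the poll values are made up.

```python
import math
from typing import Optional


def derive_rate(prev: int, cur: int, seconds: float,
                ds_min: Optional[float] = None,
                ds_max: Optional[float] = None) -> float:
    """Per-second rate as a DERIVE data source computes it (simplified).

    Rates outside [ds_min, ds_max] become NaN, mirroring how rrdtool
    stores out-of-range updates as UNKNOWN.
    """
    rate = (cur - prev) / seconds
    if ds_min is not None and rate < ds_min:
        return math.nan
    if ds_max is not None and rate > ds_max:
        return math.nan
    return rate


# Hypothetical polls 300 s apart.
print(derive_rate(0, 3_000_000, 300))                # 10000.0
print(derive_rate(0, 3_000_000, 300, ds_max=5_000))  # nan (rate above max)
print(derive_rate(3_000_000, 0, 300, ds_min=0))      # nan (counter reset)
```

With DERIVE and a minimum of 0 but no maximum, a counter reset produces one unknown sample instead of a large negative spike, while high transfer rates are never discarded.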

The high values still don't show up. I deleted the RRD files as requested and waited for several polls.

Thanks. On your LibreNMS server (I assume it is separate from your RaspPi), what is the wireguard-specific output when you run the following?

snmpwalk -v2c -c <snmp_community> <rasppi_ip_address> NET-SNMP-EXTEND-MIB::nsExtendOutput2Table | grep wireguard

Replace <snmp_community> and <rasppi_ip_address> with your SNMP community and your RaspPi's IP address or hostname, respectively.

I only use v3, so I changed the snmpwalk command accordingly, but the output is most likely not what you expect:

[...]
Did not find 'nsExtensions' in module NET-SNMP-AGENT-MIB (/usr/share/snmp/mibs/NET-SNMP-EXTEND-MIB.txt)
Did not find 'DisplayString' in module #-1 (/usr/share/snmp/mibs/NET-SNMP-EXTEND-MIB.txt)
Did not find 'RowStatus' in module #-1 (/usr/share/snmp/mibs/NET-SNMP-EXTEND-MIB.txt)
Did not find 'StorageType' in module #-1 (/usr/share/snmp/mibs/NET-SNMP-EXTEND-MIB.txt)
Unlinked OID in NET-SNMP-EXTEND-MIB: nsExtendGroups ::= { nsExtensions 3 }
Undefined identifier: nsExtensions near line 39 of /usr/share/snmp/mibs/NET-SNMP-EXTEND-MIB.txt
Unlinked OID in NET-SNMP-EXTEND-MIB: nsExtendObjects ::= { nsExtensions 2 }
Undefined identifier: nsExtensions near line 38 of /usr/share/snmp/mibs/NET-SNMP-EXTEND-MIB.txt
Unlinked OID in NET-SNMP-EXTEND-MIB: netSnmpExtendMIB ::= { nsExtensions 1 }
Undefined identifier: nsExtensions near line 19 of /usr/share/snmp/mibs/NET-SNMP-EXTEND-MIB.txt
NET-SNMP-EXTEND-MIB::nsExtendOutput2Table: Unknown Object Identifier

Ah, yes. Let's try running the command with the OID:

snmpwalk -v2c -c <snmp_community> <rasppi_ip_address> .1.3.6.1.4.1.8072.1.3.2.4 | grep wireguard

I grepped for wg0, as there is no "wireguard" in the output, and omitted the other clients:

./walk.sh .1.3.6.1.4.1.8072.1.3.2.4 | grep wg0
iso.3.6.1.4.1.8072.1.3.2.4.1.2.9.119.105.114.101.103.117.97.114.100.1 = STRING: "{"errorString": "", "error": 0, "version": 1, "data": {"wg0": {"mobile_mw": {"minutes_since_last_handshake": 0, "bytes_rcvd": 1055315724, "bytes_sent": 6103999272}, [...]
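The extend name is still in that line, just encoded in the OID index: nsExtendOutput2Table rows are indexed by the extend name as a length-prefixed string (9, then one ASCII code per character) followed by the output line number. A small decoding sketch; decode_extend_index is a hypothetical helper written for this thread:

```python
# Base OID of nsExtendOutput2Entry (NET-SNMP-EXTEND-MIB).
ENTRY_PREFIX = "1.3.6.1.4.1.8072.1.3.2.4.1."


def decode_extend_index(oid: str):
    """Split an nsExtendOutput2Table OID into (column, extend name, line number)."""
    normalized = oid.lstrip(".")
    if normalized.startswith("iso."):
        # snmpwalk prints "iso" instead of "1" when the MIBs are not loaded.
        normalized = "1." + normalized[len("iso."):]
    if not normalized.startswith(ENTRY_PREFIX):
        raise ValueError("not an nsExtendOutput2Table OID: " + oid)
    parts = [int(p) for p in normalized[len(ENTRY_PREFIX):].split(".")]
    column, name_length = parts[0], parts[1]
    # Length-prefixed string index: one sub-identifier per character.
    name = "".join(chr(c) for c in parts[2:2 + name_length])
    line_number = parts[2 + name_length]
    return column, name, line_number


print(decode_extend_index(
    "iso.3.6.1.4.1.8072.1.3.2.4.1.2.9.119.105.114.101.103.117.97.114.100.1"))
```

Column 2 is nsExtendOutLine, so the walked line is output line 1 of the extend named wireguard.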

Thanks. Is your LibreNMS installation on a 32-bit or 64-bit kernel?

Hm. I tried to reproduce your scenario by creating a dummy sample_guest with 1055315724 bytes_rcvd and 6103999272 bytes_sent, but LibreNMS graphed that successfully:

[screenshot of the wg0 graph]


Both (the WireGuard Pi and the LibreNMS Pi) are actually on 32-bit kernels at the moment. Coincidentally, I'm about to move the LibreNMS instance to a 64-bit installation in the next few days. I will report back afterwards.

Sounds good. Hope that fixes the issue. My LibreNMS is on a 64-bit kernel FWIW.

I have moved my LibreNMS installation over to a 64-bit system, and the values are back. WireGuard still runs on the 32-bit system:
[screenshot of the graph]

One thing to note: I couldn't transfer the RRD files directly. When I tried to open the "32-bit" RRD files (written by rrdtool 1.7.1) with rrdtool 1.7.2, I got the following error: ERROR: reached EOF while loading header rrd->ds_def. This error also showed up in the LibreNMS web UI instead of the graph.
I had to rrdtool dump every file to XML, transfer the XML files to the new machine, and rrdtool restore them back to RRD.
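The binary RRD layout depends on the host architecture, so moving files between 32-bit and 64-bit systems generally needs the XML round trip described above. A minimal batch sketch, assuming rrdtool is on the PATH of both hosts; the function names and directory layout are hypothetical, and the script is not part of LibreNMS:

```python
import subprocess
from pathlib import Path


def dump_command(rrd_path: Path) -> list:
    """rrdtool dump <file>.rrd <file>.xml (run on the old 32-bit host)."""
    return ["rrdtool", "dump", str(rrd_path), str(rrd_path.with_suffix(".xml"))]


def restore_command(xml_path: Path) -> list:
    """rrdtool restore <file>.xml <file>.rrd (run on the new 64-bit host)."""
    return ["rrdtool", "restore", str(xml_path), str(xml_path.with_suffix(".rrd"))]


def convert_tree(root: Path, mode: str) -> None:
    """Dump every .rrd or restore every .xml below root, stopping on the first error."""
    if mode == "dump":
        paths, build = root.rglob("*.rrd"), dump_command
    elif mode == "restore":
        paths, build = root.rglob("*.xml"), restore_command
    else:
        raise ValueError("mode must be 'dump' or 'restore'")
    for path in sorted(paths):
        subprocess.run(build(path), check=True)
```

On the old host you would call convert_tree(rrd_dir, "dump"), copy the XML files over, then call convert_tree(rrd_dir, "restore") on the new host. rrdtool restore refuses to overwrite an existing .rrd unless given --force-overwrite, so restore into a directory that does not yet contain the old binary files.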