APU2D4: NIC: high RX error count
Hi,
- APU2D4, using bridge mode with two ports: enp1s0, enp2s0
- Debian Linux 5.4.0-4-amd64 #1 SMP Debian 5.4.19-1 (2020-02-13) x86_64 GNU/Linux
- TX speed dropping to tens of KiB/s
- previous kernel versions also exhibit a high RX error count, though IIRC one kernel version last summer (?) delivered good throughput. I cannot recall the version or the exact date, and I do not want to pin the system to an old kernel version because of the security issues that brings
- different hosts involved, no single point of error
enp1s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
ether txqueuelen 1000 (Ethernet)
RX packets 193596201 bytes 57195194543 (53.2 GiB)
RX errors 260371 dropped 0 overruns 0 frame 260371
TX packets 297157067 bytes 327247392153 (304.7 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
ether txqueuelen 1000 (Ethernet)
RX packets 287683044 bytes 312139372971 (290.7 GiB)
RX errors 3294066 dropped 0 overruns 0 frame 3293890
TX packets 190733929 bytes 33493083338 (31.1 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet netmask 255.255.255.0 broadcast
inet6 prefixlen 64 scopeid 0x0<global>
inet6 prefixlen 64 scopeid 0x20<link>
ether txqueuelen 1000 (Ethernet)
RX packets 117361281 bytes 73685355706 (68.6 GiB)
RX errors 0 dropped 567 overruns 0 frame 0
TX packets 105770826 bytes 65548634666 (61.0 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
bridge name bridge id STP enabled interfaces
br0 8000.f641f32e0889 no enp1s0
enp2s0
[Sun Jun 14 15:48:31 2020] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Down
[Sun Jun 14 15:48:31 2020] br0: port 1(enp2s0) entered disabled state
[Sun Jun 14 15:49:08 2020] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[Sun Jun 14 15:49:08 2020] br0: port 1(enp2s0) entered blocking state
[Sun Jun 14 15:49:08 2020] br0: port 1(enp2s0) entered forwarding state
On Sun Jun 14:
for i in {1..3}; do
ethtool -s "enp${i}s0" autoneg off port tp speed 100 duplex full;
done;
- Since then no more links went down.
- RX losses continue to happen.
- Changed: patch cabling, switches. To no avail, RX errors continue to rise.
- br0 experiences moderate drops.
- RX errors equally happen w/o bridging.
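To narrow down which kind of RX error is rising, a minimal sketch that reads the per-interface counters the kernel exposes under /sys/class/net/IFACE/statistics. The function name `rx_counters` and the optional base-directory argument are made up for illustration; which counters are actually populated depends on the driver, so missing files are skipped.

```shell
#!/bin/sh
# Print selected RX counters for one interface as "name=value" lines.
# $1: interface name (e.g. enp2s0)
# $2: optional base directory, defaults to /sys/class/net
#     (hypothetical parameter, only there so the function can be tried offline)
rx_counters() {
    iface=$1
    base=${2:-/sys/class/net}
    for name in rx_errors rx_frame_errors rx_crc_errors \
                rx_over_errors rx_missed_errors rx_dropped; do
        file="$base/$iface/statistics/$name"
        # Skip counters this kernel or driver does not expose.
        [ -r "$file" ] && printf '%s=%s\n' "$name" "$(cat "$file")"
    done
    return 0
}
```

Running `rx_counters enp2s0` before and after a transfer and comparing the two outputs shows which counter is actually moving.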
Ideas anyone?
Thank you.
Happy to have found this.
Running APU2D4 with OpenWRT on kernel 5.4.36 with the second-newest BIOS (May 2020 release, from what I recall).
I am only able to get about 300-350 Mbit/s inbound on any interface (eth0 - eth2).
I have just spent about 4 hours debugging this and have ruled out the usual suspects, including:
Verified it's not my switch (direct link).
Verified it's not firewall rules (iptables -F; this machine also hosts a VM, and I get about 1-1.5 GBytes/sec between the VM and the host).
Tried disabling all the offloading ethtool params as well as flow control.
Tried multiple clients, same issue.
From my OpenWRT to a client:
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 111 MBytes 935 Mbits/sec
[ 4] 1.00-2.00 sec 112 MBytes 942 Mbits/sec
[ 4] 2.00-3.00 sec 112 MBytes 941 Mbits/sec
[ 4] 3.00-4.00 sec 112 MBytes 941 Mbits/sec
[ 4] 4.00-5.00 sec 112 MBytes 942 Mbits/sec
[ 4] 5.00-6.00 sec 112 MBytes 942 Mbits/sec
[ 4] 6.00-7.00 sec 112 MBytes 941 Mbits/sec
[ 4] 7.00-8.00 sec 112 MBytes 941 Mbits/sec
[ 4] 8.00-9.00 sec 112 MBytes 942 Mbits/sec
[ 4] 9.00-10.00 sec 112 MBytes 942 Mbits/sec
From client to OpenWRT:
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 39.5 MBytes 331 Mbits/sec 0 1.83 MBytes
[ 4] 1.00-2.00 sec 36.2 MBytes 304 Mbits/sec 0 3.00 MBytes
[ 4] 2.00-3.00 sec 35.0 MBytes 294 Mbits/sec 0 3.00 MBytes
[ 4] 3.00-4.00 sec 36.2 MBytes 304 Mbits/sec 0 3.00 MBytes
[ 4] 4.00-5.00 sec 36.2 MBytes 304 Mbits/sec 0 3.00 MBytes
[ 4] 5.00-6.00 sec 35.0 MBytes 294 Mbits/sec 0 3.00 MBytes
[ 4] 6.00-7.00 sec 36.2 MBytes 304 Mbits/sec 0 3.00 MBytes
[ 4] 7.00-8.00 sec 36.2 MBytes 304 Mbits/sec 0 3.00 MBytes
[ 4] 8.00-9.00 sec 36.2 MBytes 304 Mbits/sec 0 3.00 MBytes
[ 4] 9.00-10.00 sec 36.2 MBytes 304 Mbits/sec 0 3.00 MBytes
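For comparing the two directions without eyeballing the tables, a small sketch that averages the per-interval Mbits/sec values from iperf-style output on stdin. The function name `iperf_avg_mbits` is made up, and it assumes the bandwidth column is always printed in Mbits/sec, as in the output above.

```shell
#!/bin/sh
# Average the per-interval "Mbits/sec" values from iperf-style output on stdin.
iperf_avg_mbits() {
    awk '
        # Skip the summary lines so they are not counted twice.
        /sender|receiver/ { next }
        # Interval lines look like:
        # [ 4] 0.00-1.00 sec 39.5 MBytes 331 Mbits/sec 0 1.83 MBytes
        {
            for (i = 2; i <= NF; i++) {
                if ($i == "Mbits/sec") { sum += $(i - 1); n++ }
            }
        }
        END { if (n > 0) printf "%.1f\n", sum / n }
    '
}
```

Usage would be something like piping a saved client log through it: `iperf_avg_mbits < client.log`, where `client.log` is a hypothetical file name.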
Anyone?
Welp ain't that annoying...
Turns out I had a bios about 5 versions back... Just upgraded to the latest from June 30 2020 and presto, ~950mbit both directions.
@anonymous-one Where did you get a 2020-06-30 release from?
Latest as of https://pcengines.github.io/ is 2020-06-28 apu2 v4.12.0.2.
Holy crap! Just flashed 2020-06-28 apu2 v4.12.0.2. It's up again. Not to 100% throughput, but ~75%. That's a substantial improvement. Thanks @anonymous-one for motivating me to just update the fw once more.
Yet there is still room for improvement: I'd rather see 95% throughput or above, as it once was.
I forgot to note it down, but I believe the BIOS version where I was having the RX issues was something along the lines of v4.11.0.5?
Regardless, I am now getting roughly 950 Mbit/s TX and RX regardless of client location (direct / via switch) on a 1 Gbit link.
BTW, I had to raise my RX ring buffers a little (ethtool -G ethX rx XXXX); otherwise I got overrun packets fairly frequently.
After I set the rx ring buffers to 2048, zero overrun packets.
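A minimal sketch of the ring buffer change, for anyone who wants to try it. `ethtool -g` prints the pre-set maximums and the current settings, and `ethtool -G IFACE rx N` sets the RX ring size. The helper names `rx_ring_max` and `set_rx_ring` are made up, and the parsing assumes the usual `ethtool -g` layout with a "Pre-set maximums:" section followed by an "RX:" line.

```shell
#!/bin/sh
# Extract the pre-set maximum RX ring size from `ethtool -g` output on stdin.
rx_ring_max() {
    awk '
        /^Pre-set maximums:/          { in_max = 1; next }
        /^Current hardware settings:/ { in_max = 0 }
        in_max && $1 == "RX:"         { print $2; exit }
    '
}

# Set the RX ring of interface $1 to size $2, refusing values above the maximum.
set_rx_ring() {
    iface=$1
    want=$2
    max=$(ethtool -g "$iface" | rx_ring_max)
    if [ -n "$max" ] && [ "$want" -gt "$max" ]; then
        echo "requested $want exceeds maximum $max for $iface" >&2
        return 1
    fi
    ethtool -G "$iface" rx "$want"
}
```

For example, `set_rx_ring eth0 2048` (as root). The setting does not survive a reboot, so it has to be reapplied from whatever network hook or startup script the distribution uses.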
@anonymous-one Thank you for your feedback.
I had a massive RX error count. Trying your suggestion now.
But since the fw upgrade yesterday I have seen no increase in the RX error count so far. Usually they crept up fairly quickly. So far, I'm getting consistently 75% performance, which is a huge improvement, though there is still room to improve.
Did your board show overrun packets even after the latest fw upgrade?
Edit: "overruns" are distinct from RX errors
I had small bursts (100-200 at a time?) of overrun packets when pinning the link (eg: 950mbit).
And yep, rx errors vs overruns are different. Regardless, my understanding is the overruns are not desirable either although not as bad as the straight up rx errors.
It seems that I have the same issue as you guys, so let me share my setup.
My main computer has a 10G NIC set to autonegotiation/full duplex (10 Gb/s). It is connected, along with my apu2d4, to a Mikrotik switch with 10G ports.
Running an iperf test from my computer to my apu2d4:
From my desktop to apu2d4 :
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 41.0 MBytes 344 Mbits/sec 587 15.6 KBytes
[ 5] 1.00-2.00 sec 40.4 MBytes 339 Mbits/sec 565 15.6 KBytes
[ 5] 2.00-3.00 sec 42.9 MBytes 360 Mbits/sec 714 14.1 KBytes
[ 5] 3.00-4.00 sec 40.3 MBytes 338 Mbits/sec 668 14.1 KBytes
[ 5] 4.00-5.00 sec 39.6 MBytes 333 Mbits/sec 540 14.1 KBytes
[ 5] 5.00-6.00 sec 42.3 MBytes 354 Mbits/sec 728 14.1 KBytes
[ 5] 6.00-7.00 sec 40.0 MBytes 336 Mbits/sec 647 12.7 KBytes
[ 5] 7.00-8.00 sec 43.1 MBytes 362 Mbits/sec 622 21.2 KBytes
[ 5] 8.00-9.00 sec 45.1 MBytes 378 Mbits/sec 780 14.1 KBytes
[ 5] 9.00-10.00 sec 45.2 MBytes 379 Mbits/sec 892 19.8 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 420 MBytes 352 Mbits/sec 6743 sender
[ 5] 0.00-10.01 sec 420 MBytes 352 Mbits/sec receiver
On my apu2d4, I see the RX error count increase:
green0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
ether 00:0d:b9:52:d6:39 txqueuelen 1000 (Ethernet)
RX packets 3069435 bytes 3895959607 (3.6 GiB)
RX errors 14286 dropped 0 overruns 0 frame 7143
TX packets 2720069 bytes 3726867247 (3.4 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device memory 0xd0400000-d041ffff
If I force my desktop NIC to 1 Gb/s and retry:
From my desktop to my apu2d4 :
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 115 MBytes 962 Mbits/sec 0 471 KBytes
[ 5] 1.00-2.00 sec 112 MBytes 938 Mbits/sec 0 516 KBytes
[ 5] 2.00-3.00 sec 112 MBytes 941 Mbits/sec 0 539 KBytes
[ 5] 3.00-4.00 sec 113 MBytes 944 Mbits/sec 0 539 KBytes
[ 5] 4.00-5.00 sec 113 MBytes 948 Mbits/sec 0 539 KBytes
[ 5] 5.00-6.00 sec 112 MBytes 937 Mbits/sec 0 566 KBytes
[ 5] 6.00-7.00 sec 113 MBytes 944 Mbits/sec 0 566 KBytes
[ 5] 7.00-8.00 sec 112 MBytes 941 Mbits/sec 0 566 KBytes
[ 5] 8.00-9.00 sec 113 MBytes 944 Mbits/sec 0 566 KBytes
[ 5] 9.00-10.00 sec 113 MBytes 945 Mbits/sec 0 618 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.10 GBytes 944 Mbits/sec 0 sender
[ 5] 0.00-10.01 sec 1.10 GBytes 941 Mbits/sec receiver
Regarding network errors (the RX error count is unchanged at 14286):
[root@i264 ~]# ifconfig -a green0
green0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
ether 00:0d:b9:52:d6:39 txqueuelen 1000 (Ethernet)
RX packets 3884065 bytes 5127179752 (4.7 GiB)
RX errors 14286 dropped 0 overruns 0 frame 7143
TX packets 2783145 bytes 3736949748 (3.4 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device memory 0xd0400000-d041ffff
I am still not sure I understand this, and it may be unrelated to this issue: the switch port the apu2d4 is connected to is set to autonegotiation. When I tried forcing that port to 1G, the apu2d4 became unreachable as soon as I applied the setting.
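To see what the apu2d4 side actually negotiated, a small sketch that pulls the speed, duplex and autonegotiation state out of plain `ethtool IFACE` output. The function name `link_summary` is made up. As far as I know, 1000BASE-T relies on autonegotiation, so forcing 1G on only one end can leave the link down; that is only a guess for this case and not something confirmed here.

```shell
#!/bin/sh
# Reduce `ethtool IFACE` output on stdin to the negotiated link parameters.
link_summary() {
    awk -F': *' '
        # Match only the lines that begin (after indentation) with these keys,
        # so "Supports auto-negotiation" and similar lines are ignored.
        /^[ \t]*(Speed|Duplex|Auto-negotiation):/ {
            gsub(/^[ \t]+/, "", $1)
            printf "%s=%s\n", $1, $2
        }
    '
}
```

Usage: `ethtool green0 | link_summary`, run once with the switch port on autonegotiation and once with it forced, from a console that does not depend on that link.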
NIkos