pcengines/apu2-documentation

APU2D4: NIC: high RX error count

Opened this issue · 8 comments

Hi,

  • APU2D4, using bridge mode with two ports: enp1s0, enp2s0
  • Debian Linux 5.4.0-4-amd64 #1 SMP Debian 5.4.19-1 (2020-02-13) x86_64 GNU/Linux
  • TX speed dropping to tens of KiB/s
  • previous kernel versions also exhibit a high RX error count, though IIRC one kernel version last summer (?) delivered good throughput. I cannot recall the version or the exact date. I do not want to pin the system to a certain kernel version (old -> security issues)
  • different hosts involved, no single point of error
enp1s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether   txqueuelen 1000  (Ethernet)
        RX packets 193596201  bytes 57195194543 (53.2 GiB)
        RX errors 260371  dropped 0  overruns 0  frame 260371
        TX packets 297157067  bytes 327247392153 (304.7 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether   txqueuelen 1000  (Ethernet)
        RX packets 287683044  bytes 312139372971 (290.7 GiB)
        RX errors 3294066  dropped 0  overruns 0  frame 3293890
        TX packets 190733929  bytes 33493083338 (31.1 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet   netmask 255.255.255.0  broadcast 
        inet6   prefixlen 64  scopeid 0x0<global>
        inet6   prefixlen 64  scopeid 0x20<link>
        ether   txqueuelen 1000  (Ethernet)
        RX packets 117361281  bytes 73685355706 (68.6 GiB)
        RX errors 0  dropped 567  overruns 0  frame 0
        TX packets 105770826  bytes 65548634666 (61.0 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

bridge name     bridge id               STP enabled     interfaces
br0             8000.f641f32e0889       no              enp1s0
                                                        enp2s0
[Sun Jun 14 15:48:31 2020] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Down
[Sun Jun 14 15:48:31 2020] br0: port 1(enp2s0) entered disabled state
[Sun Jun 14 15:49:08 2020] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[Sun Jun 14 15:49:08 2020] br0: port 1(enp2s0) entered blocking state
[Sun Jun 14 15:49:08 2020] br0: port 1(enp2s0) entered forwarding state

On Sun Jun 14:

for i in {1..3}; do
  ethtool -s "enp${i}s0" autoneg off port tp speed 100 duplex full;
done;
  • Since then no more links went down.
  • RX losses continue to happen.
  • Changed: patch cabling, switches. To no avail, RX errors continue to rise.
  • br0 experiences moderate drops.
  • RX errors equally happen w/o bridging.
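
To compare ports, the counters above can be turned into an error rate. The following is a minimal sketch and not part of the original report: the function names `rx_error_ppm` and `iface_rx_error_ppm` are made up for illustration, and it assumes bash on Linux, where the kernel exposes the same counters under /sys/class/net/<iface>/statistics/.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical helpers, not part of any tool.

# Print RX errors per million RX packets (integer division).
rx_error_ppm() {
  local errors=$1 packets=$2
  if [ "$packets" -eq 0 ]; then
    echo 0
    return
  fi
  echo $(( errors * 1000000 / packets ))
}

# Read the live counters for one interface from sysfs (Linux only).
iface_rx_error_ppm() {
  local stats="/sys/class/net/$1/statistics"
  [ -r "$stats/rx_errors" ] || { echo "no such interface: $1" >&2; return 1; }
  rx_error_ppm "$(cat "$stats/rx_errors")" "$(cat "$stats/rx_packets")"
}

# Counters copied from the ifconfig output above.
echo "enp1s0: $(rx_error_ppm 260371 193596201) errors per million packets"
echo "enp2s0: $(rx_error_ppm 3294066 287683044) errors per million packets"
```

On a live system, `iface_rx_error_ppm enp2s0` would read the current values instead. With the numbers above, enp2s0 comes out several times worse than enp1s0.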

Ideas anyone?

Thank you.

Happy to have found this.

Running APU2D4 with OpenWRT on kernel 5.4.36 with the second-newest BIOS (May 2020 release, from what I recall).

I am only able to get about 300-350 Mbit/s inbound on any interface (eth0 - eth2).

I have just spent about 4 hours debugging this and have verified all the usual suspects, like (and others):

Verified it's not my switch (direct link).
Verified it's not firewall rules (iptables -F; also, this machine hosts a VM, and I get about 1-1.5 GBytes/sec between the VM and the host).
Tried disabling all the offloading ethtool params as well as flow control.
Tried multiple clients, same issue.

From my OpenWRT to a client:

[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   111 MBytes   935 Mbits/sec
[  4]   1.00-2.00   sec   112 MBytes   942 Mbits/sec
[  4]   2.00-3.00   sec   112 MBytes   941 Mbits/sec
[  4]   3.00-4.00   sec   112 MBytes   941 Mbits/sec
[  4]   4.00-5.00   sec   112 MBytes   942 Mbits/sec
[  4]   5.00-6.00   sec   112 MBytes   942 Mbits/sec
[  4]   6.00-7.00   sec   112 MBytes   941 Mbits/sec
[  4]   7.00-8.00   sec   112 MBytes   941 Mbits/sec
[  4]   8.00-9.00   sec   112 MBytes   942 Mbits/sec
[  4]   9.00-10.00  sec   112 MBytes   942 Mbits/sec

From client to OpenWRT:

[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  39.5 MBytes   331 Mbits/sec    0   1.83 MBytes
[  4]   1.00-2.00   sec  36.2 MBytes   304 Mbits/sec    0   3.00 MBytes
[  4]   2.00-3.00   sec  35.0 MBytes   294 Mbits/sec    0   3.00 MBytes
[  4]   3.00-4.00   sec  36.2 MBytes   304 Mbits/sec    0   3.00 MBytes
[  4]   4.00-5.00   sec  36.2 MBytes   304 Mbits/sec    0   3.00 MBytes
[  4]   5.00-6.00   sec  35.0 MBytes   294 Mbits/sec    0   3.00 MBytes
[  4]   6.00-7.00   sec  36.2 MBytes   304 Mbits/sec    0   3.00 MBytes
[  4]   7.00-8.00   sec  36.2 MBytes   304 Mbits/sec    0   3.00 MBytes
[  4]   8.00-9.00   sec  36.2 MBytes   304 Mbits/sec    0   3.00 MBytes
[  4]   9.00-10.00  sec  36.2 MBytes   304 Mbits/sec    0   3.00 MBytes

Anyone?

Welp ain't that annoying...

Turns out I had a bios about 5 versions back... Just upgraded to the latest from June 30 2020 and presto, ~950mbit both directions.

@anonymous-one Where did you get a 2020-06-30 release from?

Latest as of https://pcengines.github.io/ is 2020-06-28 apu2 v4.12.0.2.

Holy crap! Just flashed 2020-06-28 apu2 v4.12.0.2. It's up again. Not to 100% throughput but ~75%. That's a substantial improvement. Thanks @anonymous-one for motivating me to just update the fw once more.

Yet there is still room for improvement; I'd rather see 95% throughput or above, as it once was.

I forgot to make a copy, but I believe the BIOS version where I was having the RX issues was something along the lines of v4.11.0.5?

Regardless I am now getting roughly 950mbit TX and RX regardless of client location (direct / via switch) on a 1gbit link.

I had to raise my ring buffers a little BTW (ethtool -G ethX rx XXXX); otherwise, once in a while (frequently) I had some overrun packets.

After I set the rx ring buffers to 2048, zero overrun packets.
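
The ring-buffer change described above could look like the following sketch. `ethtool -g` (show ring sizes) and `ethtool -G ... rx N` (set the RX ring) are standard ethtool options; the wrapper `set_rx_ring`, the `DRY_RUN` switch, and the interface name `eth0` are assumptions made for illustration. By default it only prints the command, since actually changing ring sizes requires root.

```shell
#!/usr/bin/env bash
# Sketch only: set_rx_ring and DRY_RUN are hypothetical, eth0 is a placeholder.

# Set the RX ring size of an interface; prints the command unless DRY_RUN=0.
set_rx_ring() {
  local iface=$1 size=$2
  case $size in
    ''|*[!0-9]*) echo "ring size must be numeric" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "ethtool -G $iface rx $size"
  else
    ethtool -g "$iface"              # show current and maximum ring sizes
    ethtool -G "$iface" rx "$size"   # apply the new RX ring size
  fi
}

set_rx_ring eth0 2048
```

Run it with `DRY_RUN=0` as root to apply the change. ethtool settings do not persist across reboots, so the value would need to be reapplied from the system's network configuration.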

@anonymous-one Thank you for your feedback.

I had a massive RX error count. Trying your suggestion now.

But since the fw upgrade yesterday I got no RX error count increase so far. Usually they crept up fairly quickly. So far, I'm getting consistently 75% performance, which is a huge improvement, though there is still room for more.

Did your board show overrun packets even after the latest fw upgrade?

Edit: "overruns" are distinct from RX errors

I had small bursts (100-200 at a time?) of overrun packets when pinning the link (eg: 950mbit).

And yep, rx errors vs overruns are different. Regardless, my understanding is the overruns are not desirable either although not as bad as the straight up rx errors.

It seems that I have the same issue as you guys; let me share my setup with you.

My main computer has a 10G NIC set to auto-negotiation/full duplex (10 Gb/s speed). It is connected to a Mikrotik switch with 10G ports, and so is my apu2d4:

When running an iperf test between my computer and my apu2d4:

From my desktop to apu2d4 :

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  41.0 MBytes   344 Mbits/sec  587   15.6 KBytes       
[  5]   1.00-2.00   sec  40.4 MBytes   339 Mbits/sec  565   15.6 KBytes       
[  5]   2.00-3.00   sec  42.9 MBytes   360 Mbits/sec  714   14.1 KBytes       
[  5]   3.00-4.00   sec  40.3 MBytes   338 Mbits/sec  668   14.1 KBytes       
[  5]   4.00-5.00   sec  39.6 MBytes   333 Mbits/sec  540   14.1 KBytes       
[  5]   5.00-6.00   sec  42.3 MBytes   354 Mbits/sec  728   14.1 KBytes       
[  5]   6.00-7.00   sec  40.0 MBytes   336 Mbits/sec  647   12.7 KBytes       
[  5]   7.00-8.00   sec  43.1 MBytes   362 Mbits/sec  622   21.2 KBytes       
[  5]   8.00-9.00   sec  45.1 MBytes   378 Mbits/sec  780   14.1 KBytes       
[  5]   9.00-10.00  sec  45.2 MBytes   379 Mbits/sec  892   19.8 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   420 MBytes   352 Mbits/sec  6743             sender
[  5]   0.00-10.01  sec   420 MBytes   352 Mbits/sec                  receiver 

On my apu2d4, the RX error count increases:

green0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 00:0d:b9:52:d6:39  txqueuelen 1000  (Ethernet)
        RX packets 3069435  bytes 3895959607 (3.6 GiB)
        RX errors 14286  dropped 0  overruns 0  frame 7143
        TX packets 2720069  bytes 3726867247 (3.4 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xd0400000-d041ffff

If I force my desktop NIC to 1Gb and retry again :

From my desktop to my apu2d4 :

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   115 MBytes   962 Mbits/sec    0    471 KBytes       
[  5]   1.00-2.00   sec   112 MBytes   938 Mbits/sec    0    516 KBytes       
[  5]   2.00-3.00   sec   112 MBytes   941 Mbits/sec    0    539 KBytes       
[  5]   3.00-4.00   sec   113 MBytes   944 Mbits/sec    0    539 KBytes       
[  5]   4.00-5.00   sec   113 MBytes   948 Mbits/sec    0    539 KBytes       
[  5]   5.00-6.00   sec   112 MBytes   937 Mbits/sec    0    566 KBytes       
[  5]   6.00-7.00   sec   113 MBytes   944 Mbits/sec    0    566 KBytes       
[  5]   7.00-8.00   sec   112 MBytes   941 Mbits/sec    0    566 KBytes       
[  5]   8.00-9.00   sec   113 MBytes   944 Mbits/sec    0    566 KBytes       
[  5]   9.00-10.00  sec   113 MBytes   945 Mbits/sec    0    618 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   944 Mbits/sec    0             sender
[  5]   0.00-10.01  sec  1.10 GBytes   941 Mbits/sec                  receiver

Regarding network errors:

[root@i264 ~]# ifconfig -a green0
green0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 00:0d:b9:52:d6:39  txqueuelen 1000  (Ethernet)
        RX packets 3884065  bytes 5127179752 (4.7 GiB)
        RX errors 14286  dropped 0  overruns 0  frame 7143
        TX packets 2783145  bytes 3736949748 (3.4 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xd0400000-d041ffff 
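
Because these counters are cumulative, only the difference between two readings shows whether a given test produced new errors. A small sketch of that comparison, using a hypothetical helper `counter_delta` and the two `green0` readings quoted above:

```shell
#!/usr/bin/env bash
# Sketch only: counter_delta is a hypothetical helper.

# Print how much a cumulative counter grew between two readings.
counter_delta() {
  echo $(( $2 - $1 ))
}

# green0 readings from the two ifconfig outputs above (10G run, then 1G run).
echo "RX packets added: $(counter_delta 3069435 3884065)"
echo "RX errors added:  $(counter_delta 14286 14286)"
```

The error counter did not move between the two readings while the packet counter grew, which is consistent with the clean iperf result at 1 Gb.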

I am still not sure I understand this, and it may not be related to this issue. The switch port where the apu2d4 is connected is set to auto-negotiation. I tried forcing its speed to 1G; once that was applied, the apu2d4 became unreachable.

NIkos