soypat/cyw43439

invalid csum error, server stops

scottfeldman opened this issue · 9 comments

Running examples/tcpserver, hit this:

sfeldma@nuc:~/work/cyw43439$ tinygo flash -monitor -target pico -stack-size=32kb -size short ./examples/tcpserver/
   code    data     bss |   flash     ram
 428520   17084    3824 |  445604   20908
Connected to /dev/ttyACM0. Press Ctrl-C to exit.
starting program
time=1970-01-01T00:00:02.002Z level=INFO msg=Init:start
time=1970-01-01T00:00:03.556Z level=INFO msg=Init:done took=1.552879s
time=1970-01-01T00:00:03.557Z level=INFO msg=joinWpa2 ssid=test len(pass)=8
time=1970-01-01T00:00:07.002Z level=INFO msg=rxEvent event=AUTH status=0 reason=0 flags=0 dev.linkstate=1
time=1970-01-01T00:00:07.003Z level=INFO msg=rxEvent event=SET_SSID status=0 reason=0 flags=0 dev.linkstate=2



MAC: 28:cd:c1:01:13:7c
dhcp ongoing...
time=1970-01-01T00:00:07.009Z level=INFO msg=DHCP:tx msg=Discover
time=1970-01-01T00:00:07.059Z level=INFO msg=DHCP:rx msg=Offer
time=1970-01-01T00:00:07.060Z level=INFO msg=DHCP:tx msg=Request
time=1970-01-01T00:00:07.110Z level=INFO msg=DHCP:rx msg=Ack
DHCP complete IP: 10.0.0.122
start listening on: 10.0.0.122:1234
time=1970-01-01T00:00:08.028Z level=INFO msg=TCP:rx-statechange port=1234 old=SynSent new=SynRcvd rxflags=[SYN]
time=1970-01-01T00:00:08.079Z level=INFO msg=TCP:rx-statechange port=1234 old=SynRcvd new=Established rxflags=[ACK]
time=1970-01-01T00:10:31.560Z level=ERROR msg=Stack.RecvEth err="invalid TCP/UDP checksum"
time=1970-01-01T00:10:31.561Z level=ERROR msg=tryPoll:rx plen=0 err="invalid TCP/UDP checksum"
poll error: invalid TCP/UDP checksum
time=1970-01-01T00:10:31.818Z level=ERROR msg=Stack.RecvEth err="invalid TCP/UDP checksum"
time=1970-01-01T00:10:31.818Z level=ERROR msg=tryPoll:rx plen=0 err="invalid TCP/UDP checksum"
poll error: invalid TCP/UDP checksum
time=1970-01-01T00:10:32.432Z level=ERROR msg=Stack.RecvEth err="invalid TCP/UDP checksum"
time=1970-01-01T00:10:32.433Z level=ERROR msg=tryPoll:rx plen=0 err="invalid TCP/UDP checksum"
poll error: invalid TCP/UDP checksum
time=1970-01-01T00:10:33.557Z level=ERROR msg=Stack.RecvEth err="invalid TCP/UDP checksum"
time=1970-01-01T00:10:33.558Z level=ERROR msg=tryPoll:rx plen=0 err="invalid TCP/UDP checksum"
poll error: invalid TCP/UDP checksum
time=1970-01-01T00:10:35.908Z level=ERROR msg=Stack.RecvEth err="invalid TCP/UDP checksum"
time=1970-01-01T00:10:35.909Z level=ERROR msg=tryPoll:rx plen=0 err="invalid TCP/UDP checksum"
poll error: invalid TCP/UDP checksum
time=1970-01-01T00:10:40.505Z level=ERROR msg=Stack.RecvEth err="invalid TCP/UDP checksum"
time=1970-01-01T00:10:40.506Z level=ERROR msg=tryPoll:rx plen=0 err="invalid TCP/UDP checksum"
poll error: invalid TCP/UDP checksum
time=1970-01-01T00:10:49.391Z level=ERROR msg=Stack.RecvEth err="invalid TCP/UDP checksum"
time=1970-01-01T00:10:49.392Z level=ERROR msg=tryPoll:rx plen=0 err="invalid TCP/UDP checksum"
poll error: invalid TCP/UDP checksum
time=1970-01-01T00:11:07.315Z level=ERROR msg=Stack.RecvEth err="invalid TCP/UDP checksum"
time=1970-01-01T00:11:07.316Z level=ERROR msg=tryPoll:rx plen=0 err="invalid TCP/UDP checksum"
poll error: invalid TCP/UDP checksum
time=1970-01-01T00:11:44.285Z level=ERROR msg=Stack.RecvEth err="invalid TCP/UDP checksum"
time=1970-01-01T00:11:44.286Z level=ERROR msg=tryPoll:rx plen=0 err="invalid TCP/UDP checksum"
poll error: invalid TCP/UDP checksum

I have a pcap file captured on the client side I'll attach.

soypat commented

So it seems like packet number 3651 is the problematic packet. Will add a test later this week and see if I catch anything. Leaving the packet here as a hexdump:

0000   28 cd c1 01 13 7c 88 ae dd 0a 70 92 08 00 45 00
0010   00 2d b0 3a 40 00 40 06 75 59 0a 00 00 be 0a 00
0020   00 7a c7 ce 04 d2 2a 67 58 17 00 00 0d 53 50 18
0030   fa 4b ff ff 00 00 68 65 6c 6c 6f
soypat commented

@scottfeldman It seems like the mistake is on the client's side, not seqs according to all online packet analyzers I've tried and wireshark.

image

I don't think the client is making a mistake. Wireshark can't see the final csum because the tcp csum is offloaded to the hw NIC on the client side at 10.0.0.190.

sfeldma@nuc:~$ ip a show dev eno1
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 88:ae:dd:0a:70:92 brd ff:ff:ff:ff:ff:ff
    altname enp2s0
    inet 10.0.0.190/24 brd 10.0.0.255 scope global dynamic noprefixroute eno1
       valid_lft 4578sec preferred_lft 4578sec
    inet6 fe80::59c6:7ccb:127b:5b75/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
sfeldma@nuc:~$ sudo ethtool -k eno1 | head -n20
Features for eno1:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on                         <<<<<<<<<<<<<<<<< right here
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: off
        tx-scatter-gather: off
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
        tx-tcp-segmentation: off
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: off
generic-segmentation-offload: off [requested on]
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
soypat commented

OK, so taking a closer look the first TCP checksum error occurs at timestamp=10:31.560Z, which occurs 10 minutes 24 seconds after the DHCP exchange (624s) which corresponds to wireshark timestamp 624-20=604s- there's not much happening at that point- connection proceeds normally up to the 644s wireshark timestamp, where the client's messages stop being acknowledged by the server.

Interestingly, up to this point all previous checksums are correct as marked by Wireshark- which seems to indicate the TCP checksum is not being offloaded. The packet which fails to be acked by the seqs server has the checksum=0xffff. Wireshark tells us that the correct checksum is 0.
image

I've added a test for this package in seqs which does yield the correct checksum.

If the issue is seqs for the program above then it would seem I'm looking at an incorrect packet in the stream- but I can't seem to find any other anomaly in between packets other than the mentioned packet.

also, tx-checksum-ipv4: on sounds like it may pertain to the IP header checksum, not the TCP checksum.

Oh, interesting. I'm not sure what's going on then.

Let's keep this one open for awhile in case it pops up again.

Final period for comments before closing!