GoogleCloudPlatform/compute-virtual-ethernet-linux

XDP TX queues getting stuck when the tx posted packet counter overflows beyond u32 max

Closed this issue · 2 comments

ivpr commented

We are observing 100% CPU usage, with no packets being processed, on some of the CPUs handling XDP TX queues.

Our setup

Instance: GCE n2-standard-32
Configured queues: 4 rx, 4 tx (CPU cores 0-3 handle the RX queues and the XDP program; cores 4-7 handle the XDP_TX work)
Driver Version: 1.3.4
Kernel/OS Version: Linux 6.1.0-17-cloud-amd64 SMP PREEMPT_DYNAMIC Debian 6.1.69-1 (2023-12-30) x86_64 GNU/Linux

We are attaching an eBPF/XDP program in native mode which modifies the packets and mostly returns with the XDP_TX action.
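
For reference, a minimal sketch of the kind of program involved is below; the function name and header rewrite are illustrative only, not our production code:

    // Minimal illustrative XDP program: rewrite the Ethernet header and
    // bounce the frame back out with XDP_TX.
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <bpf/bpf_helpers.h>

    SEC("xdp")
    int xdp_tx_rewrite(struct xdp_md *ctx)
    {
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;
        struct ethhdr *eth = data;

        if ((void *)(eth + 1) > data_end)
            return XDP_PASS;

        /* Swap source and destination MACs so the modified frame goes
         * straight back out of the same interface. */
        unsigned char tmp[ETH_ALEN];
        __builtin_memcpy(tmp, eth->h_dest, ETH_ALEN);
        __builtin_memcpy(eth->h_dest, eth->h_source, ETH_ALEN);
        __builtin_memcpy(eth->h_source, tmp, ETH_ALEN);

        return XDP_TX;
    }

    char _license[] SEC("license") = "GPL";

The program is loaded in native (driver) mode, e.g. with ip link set dev ens4 xdpdrv obj prog.o sec xdp (file name here is just an example).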

Observation

Continuous 100% CPU usage is observed on CPUs 6 and 7, which process XDP_TX packets, while CPUs 4 and 5, which also process XDP_TX packets, show little usage. The ksoftirqd threads for CPUs 6 and 7 are the ones consuming the 100% CPU.

On checking the CPU Flame Graph for these cores, we see that most of the time is spent in gve_xdp_poll and gve_clean_xdp_done.
(attached: bad_cpu_6_perf_next_hop perf-folded data)

On checking the ethtool counters, we see that the tx_posted_desc counter is lower than the tx_completed_desc counter for queues 6 and 7:

# ethtool -S ens4 | grep '\[[4-7]\]' | grep "posted\|completed" | grep tx
     tx_posted_desc[4]: 1622967499
     tx_completed_desc[4]: 1622967499
     tx_posted_desc[5]: 2328007405
     tx_completed_desc[5]: 2328007405
     tx_posted_desc[6]: 154
     tx_completed_desc[6]: 4294967274
     tx_posted_desc[7]: 170
     tx_completed_desc[7]: 4294967292

tx_completed_desc for queues 6 and 7 is very close to the u32 maximum (2^32 - 1 = 4294967295), which indicates that tx_posted_desc has overflowed and wrapped around, explaining its low value.
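
A quick arithmetic check with the queue 6 numbers (a standalone illustrative snippet, not driver code) confirms this: unsigned 32-bit subtraction still recovers a plausible count of outstanding descriptors even though the posted counter has wrapped.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* ethtool counters for queue 6 from the output above. */
        uint32_t posted    = 154u;        /* wrapped past 2^32 */
        uint32_t completed = 4294967274u; /* close to the u32 maximum */

        /* u32 subtraction is modulo 2^32, so posted - completed still
         * yields the outstanding descriptor count: 176. */
        printf("outstanding: %u\n", posted - completed);
        return 0;
    }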

According to the gve_clean_xdp_done code logic, execution never enters the for loop: after the overflow, clean_end is lower than tx->done, so the comparison fails and the queue is repolled indefinitely. This matches our observation that the tx_posted/tx_completed counters are not incrementing even though the CPU flame graph shows time being spent in gve_clean_xdp_done.
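
To illustrate the failure mode (a simplified sketch, not the actual gve_clean_xdp_done source): if the clean bound is computed as done + to_do in u32 arithmetic and the loop uses a plain less-than comparison, the loop body never runs once the bound wraps past zero.

    #include <stdint.h>

    /* Simplified stand-in for the driver's TX ring state. */
    struct sketch_tx_ring {
        uint32_t done;  /* free-running count of cleaned descriptors */
        uint32_t mask;  /* ring size - 1 */
    };

    static void clean_xdp_done_sketch(struct sketch_tx_ring *tx, uint32_t to_do)
    {
        /* Near 2^32, done + to_do wraps to a small value... */
        uint32_t clean_end = tx->done + to_do;

        /* ...so tx->done < clean_end is immediately false, nothing is
         * cleaned, the queue never drains and NAPI keeps repolling. */
        for (; tx->done < clean_end; tx->done++) {
            uint32_t idx = tx->done & tx->mask;
            (void)idx; /* free the completed descriptor at idx */
        }
    }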

The equivalent logic for non-XDP TX (gve_clean_tx_done) handles this scenario by running the for loop from 0 to to_do, which is likely why the problem is not seen in non-XDP flows.
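
A sketch of that shape, reusing the ring struct from the sketch above (again illustrative, not the driver source):

    static void clean_tx_done_sketch(struct sketch_tx_ring *tx, uint32_t to_do)
    {
        /* Counting a separate index from 0 to to_do is immune to the
         * free-running counter wrapping; only the ring index needs the
         * mask, and tx->done wrapping on its own is harmless. */
        for (uint32_t j = 0; j < to_do; j++) {
            uint32_t idx = tx->done & tx->mask;
            (void)idx; /* free the completed descriptor at idx */
            tx->done++;
        }
    }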

Hello, thanks for the report, and sorry for the slow response. We were working on this internally, and this should be fixed in the next version (landing today or tomorrow).

Thanks again for this. I did manage to get the release out today, so this should be fixed in v1.4.2. Please let me know if you run into any issues.