ntop/PF_RING

100% Pkt Drop with Multiple RSS

Closed this issue · 2 comments

I am trying to use n2disk with PF_RING ZC to record traffic from a 100 Gbps link. With RSS set to 1, capture seems to work fine. With RSS set to more than 1 (4 in this case), there is 100% packet loss. I am not sure whether I have something misconfigured or set up incorrectly.

Below is info about the system, environment, and commands/outputs when RSS is set to 4.

Tool Versions:

n2disk v.3.6.240222 (r5303)
PF_RING v.8.6.1
Rocky Linux release 9.3 (Blue Onyx)
5.14.0-362.8.1.el9_3.x86_64

Hardware:

Processor: AMD EPYC 9474F
Nic: Mellanox MCX516A-CCAT ConnectX5 EN

n2disk conf:

--interface=mlx:mlx5_2@[0-3]
--dump-directory=/data/n2disk/pcap4/1
--dump-directory=/data/n2disk/pcap4/2
--dump-directory=/data/n2disk/pcap4/3
--dump-directory=/data/n2disk/pcap4/4
--timeline-dir=/data/n2disk/timeline4/1
--timeline-dir=/data/n2disk/timeline4/2
--timeline-dir=/data/n2disk/timeline4/3
--timeline-dir=/data/n2disk/timeline4/4
--disk-limit=80%
--max-file-len=10240
--buffer-len=8192
--index
--reader-cpu-affinity=1
--reader-cpu-affinity=2
--reader-cpu-affinity=3
--reader-cpu-affinity=4
--writer-cpu-affinity=0
--writer-cpu-affinity=0
--writer-cpu-affinity=0
--writer-cpu-affinity=0
--compressor-cpu-affinity=5,6
--compressor-cpu-affinity=7,8
--compressor-cpu-affinity=9,10
--compressor-cpu-affinity=11,12
--verbose
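
As a quick sanity check on the configuration above, the following is a minimal Python sketch that counts how often each repeated option appears and compares that with the queue range in `--interface`. The helper names `queue_count` and `check_conf`, and the expectation that these options appear once per queue, are assumptions made for illustration; they are not documented n2disk behavior.

```python
import re
from collections import Counter

# Options assumed (for this sketch only) to be given once per RSS queue.
PER_QUEUE_OPTIONS = [
    "--dump-directory",
    "--timeline-dir",
    "--reader-cpu-affinity",
    "--writer-cpu-affinity",
    "--compressor-cpu-affinity",
]

def queue_count(interface: str) -> int:
    """Number of queues in a spec like 'mlx:mlx5_2@[0-3]' (1 if no range is given)."""
    m = re.search(r"@\[(\d+)-(\d+)\]$", interface)
    if not m:
        return 1
    first, last = int(m.group(1)), int(m.group(2))
    return last - first + 1

def check_conf(text: str) -> dict:
    """Count each option and list per-queue options whose count differs from the queue count."""
    counts = Counter()
    interface = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("--"):
            continue
        name, _, value = line.partition("=")
        counts[name] += 1
        if name == "--interface":
            interface = value
    queues = queue_count(interface) if interface else 1
    mismatched = {opt: counts[opt] for opt in PER_QUEUE_OPTIONS if counts[opt] != queues}
    return {"queues": queues, "mismatched": mismatched}
```

For the configuration above this reports 4 queues and no mismatched options, so the number of repeated options is at least consistent with the `@[0-3]` queue range.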

PF_RING configuration:

[root@localhost]# pf_ringcfg --configure-driver mlx --rss-queues 4
[>] Installing PF_RING.ko
[>] Mellanox OFED/EN already installed
[>] Configuring PF_RING
[>] Configuring hugepages
[>] Detecting interfaces using mlx5_core
enp129s0f0np0
enp129s0f1np1
enp65s0f0np0
enp65s0f1np1
[>] Configuring mlx driver with 4 RSS queues
[>] Restarting PF_RING
[>] Configuration completed
[root@localhost]# pf_ringcfg --list-interfaces
Name: enp6s0f0np0          Driver: bnxt_en    RSS:     8    [Linux Driver] 
Name: enp6s0f1np1          Driver: bnxt_en    RSS:     8    [Linux Driver] 
Name: enp129s0f0np0        Driver: mlx5_core  RSS:     4    [Running ZC]   
Name: enp129s0f1np1        Driver: mlx5_core  RSS:     4    [Running ZC]   
Name: enp65s0f0np0         Driver: mlx5_core  RSS:     4    [Running ZC]   
Name: enp65s0f1np1         Driver: mlx5_core  RSS:     4    [Running ZC] 
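
To confirm which interfaces ended up in ZC mode and with how many RSS queues, the listing above can be parsed with a short Python sketch. The function names `parse_interfaces` and `zc_interfaces` are hypothetical, and the regular expression simply assumes the column layout shown in the output above.

```python
import re

# Matches rows such as:
# Name: enp65s0f0np0  Driver: mlx5_core  RSS: 4  [Running ZC]
ROW_RE = re.compile(r"Name:\s+(\S+)\s+Driver:\s+(\S+)\s+RSS:\s+(\d+)\s+\[(.+?)\]")

def parse_interfaces(output: str) -> list:
    """Turn 'pf_ringcfg --list-interfaces' style text into a list of dicts."""
    rows = []
    for line in output.splitlines():
        m = ROW_RE.search(line)
        if m:
            name, driver, rss, mode = m.groups()
            rows.append({"name": name, "driver": driver, "rss": int(rss), "mode": mode})
    return rows

def zc_interfaces(output: str) -> dict:
    """Map interface name to RSS queue count for rows reported as 'Running ZC'."""
    return {r["name"]: r["rss"] for r in parse_interfaces(output) if r["mode"] == "Running ZC"}
```

Applied to the listing above, this would return the four `mlx5_core` interfaces, each with 4 RSS queues, and skip the two `bnxt_en` interfaces that use the Linux driver.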

n2disk output:

[root@localhost ~]# n2disk /etc/n2disk/n2disk-enp65s0f0np0.conf 
###################################################
# You do not seem to have a permanent n2disk license.
# We're now in demo mode limited to 6 day(s) 19:55:10
###################################################
28/Feb/2024 13:49:35 [n2disk.c:6739] Welcome to n2disk v.3.6.240222 (r5303) [CPU A10F1]
28/Feb/2024 13:49:35 [n2disk.c:6772] Running on 1 node(s) system with 96 core(s). NUMA affinity set to node 0.
28/Feb/2024 13:49:35 [n2disk.c:6834] Using PF_RING for packet capture
28/Feb/2024 13:49:35 [n2disk.c:6859] Multithread support enabled
28/Feb/2024 13:49:35 [n2disk.c:6967] Reading volume size for /data/n2disk/pcap4/1
28/Feb/2024 13:49:35 [n2disk.c:6977] Reading blk id for /data/n2disk/pcap4/1
28/Feb/2024 13:49:35 [n2disk.c:6967] Reading volume size for /data/n2disk/pcap4/2
28/Feb/2024 13:49:35 [n2disk.c:6977] Reading blk id for /data/n2disk/pcap4/2
28/Feb/2024 13:49:35 [n2disk.c:6967] Reading volume size for /data/n2disk/pcap4/3
28/Feb/2024 13:49:35 [n2disk.c:6977] Reading blk id for /data/n2disk/pcap4/3
28/Feb/2024 13:49:35 [n2disk.c:6967] Reading volume size for /data/n2disk/pcap4/4
28/Feb/2024 13:49:35 [n2disk.c:6977] Reading blk id for /data/n2disk/pcap4/4
28/Feb/2024 13:49:35 [n2disk.c:7012] Checking shared volumes
28/Feb/2024 13:49:35 [n2disk.c:7032] Computing storage limit
28/Feb/2024 13:49:35 [n2disk.c:7054] Storage /data/n2disk/pcap4/1 limit set to 1489.91 GB, total volume size is 7449.57 GB
28/Feb/2024 13:49:35 [n2disk.c:7054] Storage /data/n2disk/pcap4/2 limit set to 1489.91 GB, total volume size is 7449.57 GB
28/Feb/2024 13:49:35 [n2disk.c:7054] Storage /data/n2disk/pcap4/3 limit set to 1489.91 GB, total volume size is 7449.57 GB
28/Feb/2024 13:49:35 [n2disk.c:7054] Storage /data/n2disk/pcap4/4 limit set to 1489.91 GB, total volume size is 7449.57 GB
28/Feb/2024 13:49:35 [n2disk.c:7060] Disk limits set
28/Feb/2024 13:49:35 [n2disk.c:7101] Dump files max size is set to 10 GB
28/Feb/2024 13:49:35 [n2disk.c:7124] Buffer memory is set to 20 GB (x 2 pcap files)
28/Feb/2024 13:49:35 [n2disk.c:7158] Storage #0 directory: /data/n2disk/pcap4/1
28/Feb/2024 13:49:35 [n2disk.c:7158] Storage #1 directory: /data/n2disk/pcap4/2
28/Feb/2024 13:49:35 [n2disk.c:7158] Storage #2 directory: /data/n2disk/pcap4/3
28/Feb/2024 13:49:35 [n2disk.c:7158] Storage #3 directory: /data/n2disk/pcap4/4
28/Feb/2024 13:49:35 [n2disk.c:7176] Up to 100 files will be written per folder
28/Feb/2024 13:49:35 [n2disk.c:7182] Dump files max duration is set to 600 sec
28/Feb/2024 13:49:35 [n2disk.c:7214] Dumping data in 0.1 MB chunks
28/Feb/2024 13:49:35 [n2disk.c:7225] Index processing memory is set to 4 GB (x 2 x 2 index files)
28/Feb/2024 13:49:35 [n2disk.c:7228] Index len = 4444964864 [bloom=155680][digest=4444805184]
28/Feb/2024 13:49:39 [n2disk.c:7354] Memory allocated successfully
28/Feb/2024 13:49:39 [n2disk.c:4533] Using packet timestamps from pf_ring
28/Feb/2024 13:49:39 [n2disk.c:4607] Using PF_RING v.8.6.1
28/Feb/2024 13:49:39 [n2disk.c:4615] Dumping traffic statistics on /proc/net/pf_ring/stats/2331884-none.1
28/Feb/2024 13:49:39 [n2disk.c:4627] Started PF_RING packet reader thread for device mlx:mlx5_2@[0-3]
28/Feb/2024 13:49:39 [n2disk.c:7381] Starting compression thread #0...
28/Feb/2024 13:49:39 [n2disk.c:7381] Starting compression thread #1...
28/Feb/2024 13:49:39 [n2disk.c:7393] Starting dump file writer thread #0...
28/Feb/2024 13:49:39 [n2disk.c:7393] Starting dump file writer thread #1...
28/Feb/2024 13:49:39 [n2disk.c:7393] Starting dump file writer thread #2...
28/Feb/2024 13:49:39 [n2disk.c:7393] Starting dump file writer thread #3...
28/Feb/2024 13:49:39 [n2disk.c:7434] Starting pcap packet reader thread...
28/Feb/2024 13:49:39 [n2disk.c:669] Binding thread on CPU core 4 (NUMA node 0)
28/Feb/2024 13:49:39 [n2disk.c:669] Binding thread on CPU core 0 (NUMA node 0)
28/Feb/2024 13:49:39 [n2disk.c:669] Binding thread on CPU core 11 (NUMA node 0)
28/Feb/2024 13:49:39 [n2disk.c:669] Binding thread on CPU core 12 (NUMA node 0)
28/Feb/2024 13:49:39 [n2disk.c:5589] [reader] Packet capture started
28/Feb/2024 13:49:39 [n2disk.c:634] n2disk changed user to n2disk
28/Feb/2024 13:49:39 [n2disk.c:2910] Storage /data/n2disk/pcap4/1: 0.00 GB in use
28/Feb/2024 13:49:39 [n2disk.c:2910] Storage /data/n2disk/pcap4/4: 0.00 GB in use
28/Feb/2024 13:49:39 [n2disk.c:2910] Storage /data/n2disk/pcap4/3: 0.00 GB in use
28/Feb/2024 13:49:39 [n2disk.c:2910] Storage /data/n2disk/pcap4/2: 0.00 GB in use
28/Feb/2024 13:49:39 [n2disk.c:2799] Storage /data/n2disk/pcap4/4 space check: 0.00 GB in use by n2disk out of 1489.91 GB
28/Feb/2024 13:49:39 [n2disk.c:2799] Storage /data/n2disk/pcap4/1 space check: 0.00 GB in use by n2disk out of 1489.91 GB
28/Feb/2024 13:49:39 [n2disk.c:2799] Storage /data/n2disk/pcap4/3 space check: 0.00 GB in use by n2disk out of 1489.91 GB
28/Feb/2024 13:49:39 [n2disk.c:2799] Storage /data/n2disk/pcap4/2 space check: 0.00 GB in use by n2disk out of 1489.91 GB
^C28/Feb/2024 13:49:53 [n2disk.c:1340] Caught termination signal 2...
28/Feb/2024 13:49:53 [n2disk.c:455] Changing shutdown stage to 1 [shutdown_started]
28/Feb/2024 13:49:53 [n2disk.c:1176] [PF_RING] Total stats: 0 pkts rcvd/0 pkts filtered/1013773 pkts dropped [100.0%]
28/Feb/2024 13:49:55 [n2disk.c:5619] Packet capture thread terminated
28/Feb/2024 13:49:55 [n2disk.c:455] Changing shutdown stage to 2 [reader_terminated]
28/Feb/2024 13:49:55 [n2disk.c:7490] Reader thread terminated
28/Feb/2024 13:49:55 [n2disk.c:2637] 		[compressor][thread #1] Thread terminated
28/Feb/2024 13:49:55 [n2disk.c:2637] 		[compressor][thread #0] Thread terminated
28/Feb/2024 13:49:55 [n2disk.c:455] Changing shutdown stage to 3 [compressor_terminated]
28/Feb/2024 13:49:55 [n2disk.c:3291] 		[writer][#3] Thread terminated
28/Feb/2024 13:49:55 [n2disk.c:3291] 		[writer][#0] Thread terminated
28/Feb/2024 13:49:55 [n2disk.c:7510] Writer thread #0 terminated
28/Feb/2024 13:49:55 [n2disk.c:3291] 		[writer][#1] Thread terminated
28/Feb/2024 13:49:55 [n2disk.c:7510] Writer thread #1 terminated
28/Feb/2024 13:49:55 [n2disk.c:3291] 		[writer][#2] Thread terminated
28/Feb/2024 13:49:55 [n2disk.c:7510] Writer thread #2 terminated
28/Feb/2024 13:49:55 [n2disk.c:7510] Writer thread #3 terminated
28/Feb/2024 13:49:55 [n2disk.c:7520] Compression thread #0 terminated
28/Feb/2024 13:49:55 [n2disk.c:7520] Compression thread #1 terminated
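
The key line in the output above is the `[PF_RING] Total stats` summary. A minimal Python sketch for pulling the counters out of such a line and recomputing the drop rate is shown here. The function name `drop_rate` is hypothetical, and the formula `dropped / (rcvd + dropped)` is an assumption about how the percentage is derived; it happens to agree with the `[100.0%]` printed above.

```python
import re

# Matches the counters in a line such as:
# [PF_RING] Total stats: 0 pkts rcvd/0 pkts filtered/1013773 pkts dropped [100.0%]
STATS_RE = re.compile(
    r"Total stats: (\d+) pkts rcvd/(\d+) pkts filtered/(\d+) pkts dropped"
)

def drop_rate(log_line: str):
    """Return the drop percentage for a 'Total stats' line, or None if the line does not match."""
    m = STATS_RE.search(log_line)
    if m is None:
        return None
    rcvd, _filtered, dropped = map(int, m.groups())
    total = rcvd + dropped  # assumed denominator: packets seen = received + dropped
    return 100.0 * dropped / total if total else 0.0
```

With 0 packets received and 1013773 dropped, this gives 100.0, which matches the summary line: the NIC is seeing traffic, but none of it reaches the reader.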

@jmtobin-uh please try running "pfcount_multichannel -i mlx:mlx5_2" for a few seconds and paste the output here.

Sorry, I just realized you are using the stable branch. Multithreaded capture is a dev feature; please use the nightly builds.