OpenVisualCloud/Media-Transport-Library

Cannot run sample RxSt20PipelineSample, mlx5_common: No Verbs device matches PCI device

Closed this issue · 17 comments

Dear Media-library team,

I have set up a Debian system with DPDK and the Media Transport Library installed.
I followed the build and run guides and can successfully create VFs and bind them to vfio-pci.
NIC - Mellanox Connect X6 - Dx (MLNX_OFED 23.04-1.1.3.0)
OS - Debian 11, (kernel 5.15.55)
CPU - Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz, 16 cores, no numa, hyperthreading disabled in bios
Except core 0 all other cores are isolated from linux kernel scheduler. NOTE: no vmx flag seen in lscpu command

  1. I installed the MLNX_OFED-23.04-1.1.3.0 with the following command.
    $ ./mlnxofedinstall --dpdk --upstream-libs --skip-distro-check
  2. Activated the Intel VT-d setting in the BIOS, added iommu to the kernel command line, enabled hugepages, etc.
    NOTE: SR-IOV support in the BIOS is disabled. Do I need this?
  3. Then followed the steps to clone and install both media library and dpdk
  4. with the nicctl.sh script I could create VFs and bind vfio-pci to them.

root@KC200-24-FLEX-AIC:~/Media-Transport-Library# ./script/nicctl.sh create_vf 0000:c3:00.0
0000:c3:00.0 'MT2892 Family [ConnectX-6 Dx] 101d' if=mlnx1 drv=mlx5_core unused= Active
Bind 0000:c3:00.2(eth0) to vfio-pci success
Bind 0000:c3:00.3(eth1) to vfio-pci success
Bind 0000:c3:00.4(eth2) to vfio-pci success
Bind 0000:c3:00.5(eth3) to vfio-pci success
Bind 0000:c3:00.6(eth4) to vfio-pci success
Bind 0000:c3:00.7(eth5) to vfio-pci success
Create VFs on PF bdf: 0000:c3:00.0 mlnx1 succ

root@KC200-24-FLEX-AIC:~/Media-Transport-Library# ./script/nicctl.sh status 0000:c3:00.0
0000:c3:00.0 'MT2892 Family [ConnectX-6 Dx] 101d' if=mlnx1 drv=mlx5_core unused=vfio-pci Active

root@KC200-24-FLEX-AIC:~/Media-Transport-Library# ./script/nicctl.sh status 0000:c3:00.2
0000:c3:00.2 'ConnectX Family mlx5Gen Virtual Function 101e' drv=vfio-pci unused=mlx5_core
Bind bdf: 0000:c3:00.2 to kernel eth0 succ

root@KC200-24-FLEX-AIC:~/Media-Transport-Library# lspci | grep Mel
c3:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
c3:00.1 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
c3:00.2 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
c3:00.3 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
c3:00.4 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
c3:00.5 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
c3:00.6 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
c3:00.7 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function

root@KC200-24-FLEX-AIC:~/Media-Transport-Library# lsmod | grep -e ib -e mlx
ib_ipoib 151552 0
ib_cm 139264 2 rdma_cm,ib_ipoib
ib_umad 40960 0
mlx5_ib 454656 0
ib_uverbs 151552 2 rdma_ucm,mlx5_ib
ib_core 458752 8 rdma_cm,ib_ipoib,iw_cm,ib_umad,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm
mlx5_core 2019328 1 mlx5_ib
mlxfw 36864 1 mlx5_core
mlxdevm 176128 1 mlx5_core
mlx_compat 24576 11 rdma_cm,ib_ipoib,mlxdevm,iw_cm,ib_umad,ib_core,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,mlx5_core
ptp 32768 4 igb,mlx5_core
pci_hyperv_intf 16384 1 mlx5_core

$ dmesg
...
[ 369.223349] VFIO - User Level meta-driver version: 0.3
[ 369.293820] mlx5_core 0000:c3:00.0: E-Switch: Enable: mode(LEGACY), nvfs(6), active vports(7)
[ 369.408209] pci 0000:c3:00.2: [15b3:101e] type 00 class 0x020000
[ 369.414514] pci 0000:c3:00.2: enabling Extended Tags
[ 369.420836] pci 0000:c3:00.2: Adding to iommu group 144
[ 369.427207] mlx5_core 0000:c3:00.2: enabling device (0000 -> 0002)
[ 369.434092] mlx5_core 0000:c3:00.2: firmware version: 22.37.1014
[ 369.619426] mlx5_core 0000:c3:00.2: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 369.644794] mlx5_core 0000:c3:00.2: Assigned random MAC address d6:52:af:af:f9:4d
[ 369.652371] mlx5_core 0000:c3:00.2: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[ 369.778229] mlx5_core 0000:c3:00.2: Supported tc offload range - chains: 1, prios: 1
....

  5. But when running the RxSt20PipelineSample I get the following error.
    NOTE : I have ST2110 source generating signal at mcast address 239.0.90.1 and src ip is 192.168.1.90
    root@KC200-24-FLEX-AIC:~/Media-Transport-Library# ./build/app/RxSt20PipelineSample --p_port 0000:c3:00.2 --p_sip 192.168.1.90 --p_rx_ip 239.0.90.1
    MT: dev_eal_init(0), port_param: 0000:c3:00.2
    MT: dev_eal_init, wait eal_init_thread done
    EAL: Detected CPU lcores: 16
    EAL: Detected NUMA nodes: 1
    EAL: Detected shared linkage of DPDK
    EAL: Selected IOVA mode 'VA'
    EAL: No free 1048576 kB hugepages reported on node 0
    EAL: VFIO support initialized
    EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:c3:00.2 (socket -1)
    mlx5_common: No Verbs device matches PCI device 0000:c3:00.2, are kernel drivers loaded?
    mlx5_common: Verbs device not found: 0000:c3:00.2
    mlx5_common: Failed to initialize device context.
    EAL: Requested device 0000:c3:00.2 cannot be used
    EAL: Bus (pci) probe failed.
    TELEMETRY: No legacy callbacks, legacy socket not created
    MT: st version: 23.12.0 Fri Aug 4 09:02:04 2023 a578ad6 gcc-10.2.1, dpdk version: DPDK 23.03.0
    MT: Error: mt_dev_get_socket, failed to locate 0000:c3:00.2. Please run nicctl.sh
    MT: Error: mtl_init, get socket fail -19
    main: mtl_init fail

Have you seen this issue before?

Cheers
Prankur

It seems to fail at DPDK startup. Can you run a DPDK sample application (testpmd) to confirm that the MLX PMD is working? BTW, the NICs we have verified are the Intel E810/E710 series; the status on other NICs is not known.

Dear Mr. Du,
Thanks for your reply.
Please see the following output from the dpdk-testpmd command.

./build/app/dpdk-testpmd -l 1-4 -n 4 -- -i
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:101d) device: 0000:c3:00.0 (socket -1)
EAL: Probe PCI driver: mlx5_pci (15b3:101d) device: 0000:c3:00.1 (socket -1)
EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:c3:00.2 (socket -1)
mlx5_common: No Verbs device matches PCI device 0000:c3:00.2, are kernel drivers loaded?
mlx5_common: Verbs device not found: 0000:c3:00.2
mlx5_common: Failed to initialize device context.
EAL: Requested device 0000:c3:00.2 cannot be used
EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:c3:00.3 (socket -1)
mlx5_common: No Verbs device matches PCI device 0000:c3:00.3, are kernel drivers loaded?
mlx5_common: Verbs device not found: 0000:c3:00.3
mlx5_common: Failed to initialize device context.
EAL: Requested device 0000:c3:00.3 cannot be used
EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:c3:00.4 (socket -1)
mlx5_common: No Verbs device matches PCI device 0000:c3:00.4, are kernel drivers loaded?
mlx5_common: Verbs device not found: 0000:c3:00.4
mlx5_common: Failed to initialize device context.
EAL: Requested device 0000:c3:00.4 cannot be used
EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:c3:00.5 (socket -1)
mlx5_common: No Verbs device matches PCI device 0000:c3:00.5, are kernel drivers loaded?
mlx5_common: Verbs device not found: 0000:c3:00.5
mlx5_common: Failed to initialize device context.
EAL: Requested device 0000:c3:00.5 cannot be used
EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:c3:00.6 (socket -1)
mlx5_common: No Verbs device matches PCI device 0000:c3:00.6, are kernel drivers loaded?
mlx5_common: Verbs device not found: 0000:c3:00.6
mlx5_common: Failed to initialize device context.
EAL: Requested device 0000:c3:00.6 cannot be used
EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:c3:00.7 (socket -1)
mlx5_common: No Verbs device matches PCI device 0000:c3:00.7, are kernel drivers loaded?
mlx5_common: Verbs device not found: 0000:c3:00.7
mlx5_common: Failed to initialize device context.
EAL: Requested device 0000:c3:00.7 cannot be used
TELEMETRY: No legacy callbacks, legacy socket not created
Interactive-mode selected
Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa.
testpmd: create a new mbuf pool <mb_pool_0>: n=171456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: E8:EB:D3:6C:60:FA
Configuring Port 1 (socket 0)
Port 1: E8:EB:D3:6C:60:FB
Checking link statuses...
Done
testpmd>

For the physical functions 0000:c3:00.0 and .1 there is no such message, but the virtual functions 0000:c3:00.2 to .7 all show the same error.

Do you believe I should enable PCIe Extended Tags in the bios ?

I also looked at your youtube video regarding the "Real Time low latency media transport stack based on dpdk" and decided to try our Mellanox cards CX6-Dx to check if I can find an alternative to the rivermax sdk and kernel drivers.

Cheers
Prankur

Not sure. The most likely cause is that DPDK has no VF PMD support for this NIC; the PF PMD cannot be used for a VF.

Also, check http://doc.dpdk.org/guides/nics/mlx5.html for more information.

Dear Mr. Du,

Thanks for your prompt reply.
I checked their documentation and it looks like the DPDK PMD does support the ConnectX-6 Dx. I will try with their tested hardware platform / operating system combination.

I would like to ask some questions about the DPDK user summit 2022 video. If you can share your email address with prankur.chauhan89@gmail.com, I will take it up there.

By the way a stupid question:
I am not running any VM and directly running Debian on the hardware. So there are no hypervisor child partition / parent partition redirect of calls to read/write data from PCIe NIC card.

I assume the Media Transport Library also works on a native operating system without any virtual machine, or does it not?

Cheers
Prankur

Certainly, the media transport library operates efficiently on Virtual Functions (VFs) with the assistance of VFIO for bare metal setup. These VFs are created by Single Root I/O Virtualization (SR-IOV). From a user space perspective, there's no discernible difference between these VFs and the Physical Functions (PFs).

Dear Mr. Du,
I have made some progress in terms of testing dpdk-testpmd software.
The issue was that, unlike other PMDs, the NVIDIA PMD uses the mlx5_core kernel driver and NOT the vfio-pci driver.

“PMDs which use the bifurcated driver co-exists with the device kernel driver. On such model the NIC is controlled by the kernel, while the data path is performed by the PMD directly on top of the device.”
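
The quoted passage implies that a VF meant for the mlx5 PMD should stay bound to mlx5_core rather than vfio-pci. A minimal sketch of checking the current binding from C through the standard sysfs `driver` symlink follows; the helper names `driver_from_link`, `pci_bound_driver` and `is_ready_for_mlx5_pmd` are hypothetical and are not part of DPDK or the Media Transport Library:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return the last path component of a sysfs driver link target,
 * e.g. "../../../bus/pci/drivers/mlx5_core" -> "mlx5_core". */
const char *driver_from_link(const char *target) {
    const char *slash = strrchr(target, '/');
    return slash ? slash + 1 : target;
}

/* Look up the kernel driver bound to a PCI device (BDF such as "0000:c3:00.2").
 * Returns 0 and fills `out` on success, -1 if no driver is bound or the
 * sysfs entry cannot be read. */
int pci_bound_driver(const char *bdf, char *out, size_t out_len) {
    char path[256];
    char target[256];
    assert(bdf != NULL && out != NULL && out_len > 0);
    snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/driver", bdf);
    ssize_t n = readlink(path, target, sizeof(target) - 1);
    if (n < 0) return -1;
    target[n] = '\0'; /* readlink does not terminate the string */
    snprintf(out, out_len, "%s", driver_from_link(target));
    return 0;
}

/* Assumption from the thread: the bifurcated mlx5 PMD wants mlx5_core, not vfio-pci. */
int is_ready_for_mlx5_pmd(const char *driver) {
    return strcmp(driver, "mlx5_core") == 0;
}
```

Calling `pci_bound_driver("0000:c3:00.2", buf, sizeof(buf))` and passing the result to `is_ready_for_mlx5_pmd` would flag a VF that is still bound to vfio-pci.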

Unfortunately the Media Transport Library test application now hits a different issue (it is also seen on the PFs, i.e. the physical functions):

MT: Error: parse_driver_info, unknown nic driver mlx5_pci

$ ./build/app/RxSt20PipelineSample --p_port 0000:c3:00.3 --p_sip 192.168.1.90 --p_rx_ip 239.0.90.1
MT: dev_eal_init(0), port_param: 0000:c3:00.3
MT: dev_eal_init, wait eal_init_thread done
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Selected IOVA mode 'VA'
EAL: No free 1048576 kB hugepages reported on node 0
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:c3:00.3 (socket -1)
TELEMETRY: No legacy callbacks, legacy socket not created
MT: st version: 23.12.0 Wed Aug 9 12:31:50 2023 5ddeda7 gcc-10.2.1, dpdk version: DPDK 23.03.0
MT: mt_dev_get_socket, direct soc_id from SOCKET_ID_ANY to 0 for 0000:c3:00.3
MT: mtl_init(0), socket_id 0
MT: Error: parse_driver_info, unknown nic driver mlx5_pci
MT: Error: mt_dev_if_init, parse_driver_info fail(-5) for 0000:c3:00.3
MT: dev_close_port(0), port not started
MT: Error: mtl_init, st dev if init fail -5
MT: Warn: mt_stat_unregister, cb 0x7f2faaf2c530 priv 0x118082abc0 not found
MT: Warn: mt_stat_unregister, cb 0x7f2faaee1360 priv 0x1180832378 not found
MT: mt_cni_uinit, succ
MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x1180832728 not found
MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x1180838b10 not found
MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x118083eef8 not found
MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x11808452e0 not found
MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x118084b6c8 not found
MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x1180851ab0 not found
MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x1180857e98 not found
MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x118085e280 not found
MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x1180864668 not found
MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x118086aa50 not found
MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x1180870e38 not found
MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x1180877220 not found
MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x118087d608 not found
MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x11808839f0 not found
MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x1180889dd8 not found
MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x11808901c0 not found
MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x11808965a8 not found
MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x118089c990 not found
MT: Error: dev_uinit_lcores, no lcore shm attached
MT: Warn: mt_stat_unregister, cb 0x7f2faaedd7b0 priv 0x118082abc0 not found
MT: dev_stop_port(0), port not started
MT: mt_dev_free, succ
MT: mt_main_free, succ
MT: dev_close_port(0), port not started
MT: mt_dev_uinit, succ
MT: mtl_uninit, succ
main: mtl_init fail

===================================================
With dpdk-testpmd the interface seems to work:

~/dpdk# ./build/app/dpdk-testpmd -l 1-5 -n 4 -a 0000:c3:00.3 -- -i
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:c3:00.3 (socket -1)
TELEMETRY: No legacy callbacks, legacy socket not created
Interactive-mode selected
Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa.
testpmd: create a new mbuf pool <mb_pool_0>: n=179456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc

Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.

Configuring Port 0 (socket 0)
Port 0: F2:E2:55:9E:30:D8
Checking link statuses...
Done
testpmd> start
io packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support enabled, MP allocation mode: native
Logical Core 2 (socket 0) forwards packets on 1 streams:
RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00

io packet forwarding packets/burst=32
nb forwarding cores=1 - nb forwarding ports=1
port 0: RX queue number: 1 Tx queue number: 1
Rx offloads=0x0 Tx offloads=0x10000
RX queue: 0
RX desc=256 - RX free threshold=64
RX threshold registers: pthresh=0 hthresh=0 wthresh=0
RX Offloads=0x0
TX queue: 0
TX desc=256 - TX free threshold=0
TX threshold registers: pthresh=0 hthresh=0 wthresh=0
TX offloads=0x10000 - TX RS bit threshold=0
testpmd> stop
Telling cores to stop...
Waiting for lcores to finish...

---------------------- Forward statistics for port 0 ----------------------
RX-packets: 4 RX-dropped: 0 RX-total: 4
TX-packets: 4 TX-dropped: 0 TX-total: 4

+++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
RX-packets: 4 RX-dropped: 0 RX-total: 4
TX-packets: 4 TX-dropped: 0 TX-total: 4
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Done.
testpmd> quit

Stopping port 0...
Stopping ports...
Done

Shutting down port 0...
Closing ports...
Port 0 is closed
Done

Bye...

Question: have you encountered this issue before ?

Cheers
Prankur

Hi Prankur,

You have to add this new device to the supported driver list: https://github.com/OpenVisualCloud/Media-Transport-Library/blob/main/lib/src/mt_dev.c#L29

Something like the following:

    {
        .name = "mlx5_pci",
        .port_type = MT_PORT_PF,
        .drv_type = MT_DRV_MLX5, /* add a new enum in the header file */
        .flow_type = MT_FLOW_ALL,
    },

Please note that the rate limit feature is only available on the Intel E810; for other devices, TSC pacing is used.
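
To illustrate why the missing entry produced "unknown nic driver mlx5_pci", here is a minimal sketch of a name-based driver table lookup. The enum, struct and function names are simplified stand-ins invented for this example, and the two Intel driver names are only sample entries; the real definitions live in the library sources:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins; not the library's actual enums or struct. */
enum drv_type { DRV_ICE, DRV_IAVF, DRV_MLX5 };

struct drv_info {
    const char *name;   /* driver name reported by DPDK for the port */
    enum drv_type type; /* internal driver class */
};

static const struct drv_info drv_table[] = {
    {"net_ice", DRV_ICE},
    {"net_iavf", DRV_IAVF},
    {"mlx5_pci", DRV_MLX5}, /* the newly added entry */
};

/* Return the matching table entry, or NULL for an unknown driver name. */
const struct drv_info *lookup_driver(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(drv_table) / sizeof(drv_table[0]); i++) {
        if (strcmp(drv_table[i].name, name) == 0) return &drv_table[i];
    }
    return NULL;
}
```

Without the `mlx5_pci` row, `lookup_driver("mlx5_pci")` returns NULL, which corresponds to the failure path seen in the log above.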

Dear Mr. Du,

Thanks for the patch, I added the support for mlx5_pci driver as you suggested.
The RxSt20PipelineSample works and I can receive 1 x ST2110-20 stream from my source.
I also verified the pcie bandwidth with intel pcm tool (pcm-iio)
I did not test performance extensively, but at least it is a starting point.

By the way, I saw that the pipeline sample did not use the isolated cores. Is there a command-line option, like in DPDK (-l x-y), where I can specify which cores to use for the rte workers?

I do not quite understand what you mean by rate limit ?
Also by TSC pacing, you mean the TSC timer is used for tx traffic shaping (ST2110-21) and not the HPET timer ?

Cheers
Prankur

Hi Prankur,

Great to hear it works. Can you help create a PR to upstream the patch?

Yes, the TSC time source is used for pacing when no rate limit is available. In this case we use TSC time to decide when to put each packet into the NIC queue. Please note that TSC pacing cannot fully comply with narrow gapping, since the actual transmit time is not controllable.

Rate limit is a hardware feature of the E810; we combine it with some software techniques to achieve the strict narrow gapping mode.
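
The idea of software pacing can be sketched as follows. This is only an illustration under simplifying assumptions: packets are spaced evenly across one frame time, `CLOCK_MONOTONIC` stands in for a calibrated TSC read, and all function names are hypothetical rather than taken from the library:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock in nanoseconds; a stand-in for a calibrated TSC read. */
uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Target enqueue time of packet pkt_idx, assuming even spacing across the frame. */
uint64_t pkt_target_ns(uint64_t frame_start_ns, uint64_t frame_time_ns,
                       uint32_t pkts_per_frame, uint32_t pkt_idx) {
    assert(pkts_per_frame > 0 && pkt_idx < pkts_per_frame);
    return frame_start_ns + (frame_time_ns * pkt_idx) / pkts_per_frame;
}

/* Busy-wait until the target time; the caller would then enqueue the packet. */
void wait_until_ns(uint64_t target_ns) {
    while (now_ns() < target_ns) {
        /* spin: software controls only the enqueue time, not the wire time */
    }
}
```

Because the loop only decides when a packet is handed to the NIC queue, the time it actually leaves the wire can still drift, which is the limitation described above.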

For isolated cores, the library already supports this through lcores in struct mtl_init_params. The pipeline sample does not expose this as a command-line argument, but it can easily be added. The RxTxApp already supports --lcores, see https://github.com/OpenVisualCloud/Media-Transport-Library/blob/main/app/src/args.c#L501.

Dear Mr. Du,

The patch is mlx5.txt

Please check the test results.

$ ~/Media-Transport-Library# ./script/nicctl.sh create_vf 0000:c3:00.0 2
0000:c3:00.0 'MT2892 Family [ConnectX-6 Dx] 101d' if=mlnx1 drv=mlx5_core unused=vfio-pci Active
PMD uses bifurcated driver, No need to bind the 0000:c3:00.2(eth0) to vfio-pci
PMD uses bifurcated driver, No need to bind the 0000:c3:00.3(eth1) to vfio-pci
Create VFs on PF bdf: 0000:c3:00.0 mlnx1 succ

$ ~/Media-Transport-Library# ./build/app/RxSt20PipelineSample --p_port 0000:c3:00.3 --p_sip 192.168.1.90 --p_rx_ip 239.0.90.1
MT: dev_eal_init(0), port_param: 0000:c3:00.3
MT: dev_eal_init, wait eal_init_thread done
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Selected IOVA mode 'VA'
EAL: No free 1048576 kB hugepages reported on node 0
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:c3:00.3 (socket -1)
TELEMETRY: No legacy callbacks, legacy socket not created
MT: st version: 23.12.0 Thu Aug 10 07:47:10 2023 b926213-dirty gcc-10.2.1, dpdk version: DPDK 23.03.0
MT: mt_dev_get_socket, direct soc_id from SOCKET_ID_ANY to 0 for 0000:c3:00.3
MT: mtl_init(0), socket_id 0
MT: mt_dev_if_init(0), use mt ptp source
MT: mt_dev_if_init(0), user request queues tx 0 rx 1, deprecated sessions tx 0 rx 0
MT: Warn: dev_config_port(0), failed to setup all ptype, only 0 supported
MT: dev_config_port(0), tx_q(1 with 512 desc) rx_q (2 with 2048 desc)
MT: mt_mempool_create_by_ops(0), succ at 0x1180afde40 size 2.156250m n 1024 d 2048 for T_P0_SYS
MT: mt_mempool_create_by_ops(0), succ at 0x1180e8d2c0 size 6.468750m n 3072 d 2048 for R_P0Q0_MBUF
MT: mt_mempool_create_by_ops(0), succ at 0x1181afdf00 size 4.968750m n 3072 d 1536 for R_P0Q1_MBUF
MT: mt_dev_if_init(0), port_id 0 port_type 2 drv_type 8
MT: mt_dev_if_init(0), dev_capa 0x14, offload 0x196af:0x18621f queue offload 0x0:0x18601f, rss : 0xf00000000803afbc
MT: mt_dev_if_init(0), system_rx_queues_end 1 hdr_split_rx_queues_end 1
MT: mt_dev_if_init(0), sip: 192.168.1.90
MT: mt_dev_if_init(0), netmask: 255.255.255.0
MT: mt_dev_if_init(0), gateway: 0.0.0.0
MT: mt_dev_if_init(0), mac: ce:31:82:3d:90:d0
MT: dev_init_lcores, shared memory attached at 0x7f6c52fe0000 nattch 1
MT: dev_start_port(0), rx_defer 0
MT: mt_eth_link_dump(0), link_speed 100g link_status 1 link_duplex 1 link_autoneg 1
MT: Error: dev_rl_shaper_add(0), shaper add error: (-38)Function not implemented
MT: Error: dev_tx_queue_set_rl_rate(0), rl shaper get fail for q 0
MT: Warn: dev_if_init_pacing(0), fallback to tsc as rl init fail
MT: mt_dev_create(0), feature 0x70, tx pacing tsc
MT: mt_sch_mrg_init, succ with data quota 31068 M, nb_tasklets 16
MT: mt_sch_add_quota(0:0), quota 0 total now 0
MT: dev_stat_thread, start
MT: mt_dev_create, succ, stat period 10s
MT: mt_dev_get_tx_queue(0), q 0 without rl
MT: mt_mcast_init, report every 10 seconds
MT: mt_dev_get_rx_queue(0), q 0 ip 0.0.0.0 port 0
MT: cni_queues_init(0), rxq 0
MT: mt_sch_register_tasklet(0), tasklet cni registered into slot 0
MT: cni_traffic_thread, start
MT: st_plugins_init, succ
MT: admin_thread, start
MT: config_parse_json, parse kahawai.json with json-c version: 0.15
MT: st22_decoder_register(0), st22_decoder_sample registered, device 1 cap(0x300000000000000:0x70000002b)
MT: st22_encoder_register(0), st22_encoder_sample registered, device 1 cap(0x70000002b:0x300000000000000)
st_plugin_create, succ with st22 sample plugin
MT: st_plugin_register(0), /usr/local/lib/x86_64-linux-gnu/libst_plugin_st22_sample.so registered, version 1
MT: Warn: st_plugin_register, dlopen /usr/local/lib64/libst_plugin_st22_sample.so fail
MT: mt_main_create, succ
MT: mtl_init, succ, tsc_hz 2400000000
MT: mtl_init, simd level avx512_vbmi, flags 0x1
MT: rx_st20p_init_dst_fbs(0), size 5184000 fmt 5 with 3 frames
MT: mt_sch_add_quota(0:0), quota 2589 total now 2589
MT: mt_sch_get(0), succ with quota_mbs 2589
MT: mt_sch_register_tasklet(0), tasklet rvs_pkt_rx registered into slot 1
MT: mt_sch_register_tasklet(0), tasklet rvs_ctl registered into slot 2
MT: rvs_mgr_init(0), succ
MT: dev_rx_queue_create_flow(0), queue 1 succ, ip 239.0.90.1 port 20000
MT: mt_dev_get_rx_queue(0), q 1 ip 239.0.90.1 port 20000
MT: rv_init_hw(0), port(l:0,p:0), queue 1 udp 20000
MT: mt_mcast_join(0), new group 239.0.90.1
MT: rv_attach(0), 3 frames with size 5184000(810,0), type 0, progressive
MT: rv_attach(0), w 1920 h 1080 fmt ST20_FMT_YUV_422_10BIT packing 0 pt 112 flags 0x0 frame time 16.683333ms
MT: mt_sch_add_quota(0:0), quota 1294 total now 3883
MT: st20_rx_create_with_mask, succ on sch 0 session 0
MT: st20p_rx_create(0), transport fmt ST20_FMT_YUV_422_10BIT, output fmt YUV422RFC4175PG2BE10
rx_st20p_frame_thread(0), start
MT: mt_calibrate_tsc, tscHz 2400009156
MT: mt_dev_get_lcore, available lcore 7
MT: sch_tasklet_func(0), start with 3 tasklets
MT: sch_start(0), succ on lcore 7
MT: mt_dev_start, succ
MT: _mt_start, succ, avail ports 1
MT: cni_traffic_thread, stop
MT: rvs_ctl_tasklet_start(0), succ
MT: * * M T D E V S T A T E * *
MT: DEV(0): Avr rate, tx: 0.000040 Mb/s, rx: 2350.522345 Mb/s, pkts, tx: 1, rx: 2220362
MT: Error: DEV(0): Status: imissed 239762 ierrors 0 oerrors 0 rx_nombuf 0
MT: Error: rx_good_packets: 559
MT: Error: rx_good_bytes: 739838
MT: Error: rx_q1_packets: 559
MT: Error: rx_q1_bytes: 739838
MT: Error: rx_multicast_packets: 2460541
MT: Error: rx_multicast_bytes: 3265818645
MT: Error: tx_multicast_packets: 8
MT: Error: tx_multicast_bytes: 763
MT: Error: rx_out_of_buffer: 239762
MT: CNI(0): eth_rx_rate 0 Mb/s, eth_rx_cnt 7
MT: PTP(0): time 1691656569194256528, 2023-08-10 08:36:09
MT: RX_VIDEO_SESSION(0,0:st20p_test): fps 0.000000 frames 0 pkts 0
MT: RX_VIDEO_SESSION(0,0:st20p_test): throughput 0 Mb/s, cpu busy 4.796298
MT: RX_VIDEO_SESSION(0,0): wrong hdr dropped pkts 2221485
MT: * * E N D S T A T E * *

^Csample_sig_handler, signal 2
rx_st20p_frame_thread(0), stop
main(0), received frames 0
MT: sch_tasklet_func(0), end with 3 tasklets
MT: cni_traffic_thread, start
MT: mt_dev_put_lcore, lcore 7
MT: sch_stop(0), succ
MT: mt_sch_stop_all, succ
MT: _mt_stop, succ
main(0), error, no received frames 0
MT: RX_VIDEO_SESSION(0,0:st20p_test): fps 0.000000 frames 0 pkts 0
MT: RX_VIDEO_SESSION(0,0:st20p_test): throughput 0 Mb/s, cpu busy 4.796298
MT: RX_VIDEO_SESSION(0,0): wrong hdr dropped pkts 327595
MT: mt_mcast_leave(0), group 239.0.90.1 ref cnt 0
MT: mt_dev_put_rx_queue(0), q 1
MT: sch_free_quota(0), quota 3883 total now 0
MT: st20_rx_free, succ on sch 0 session 0
MT: st22_decoder_unregister(0), unregister st22_decoder_sample
MT: st22_encoder_unregister(0), unregister st22_encoder_sample
st_plugin_free, succ with st22 sample plugin
MT: admin_thread, stop
MT: mt_sch_unregister_tasklet(0), tasklet cni(0) unregistered
MT: cni_traffic_thread, stop
MT: mt_dev_put_rx_queue(0), q 0
MT: mt_cni_uinit, succ
MT: sch_free_quota(0), quota 0 total now 0
MT: mt_sch_put(0), ref_cnt now zero
MT: Warn: sch_stop(0), not started
MT: mt_sch_unregister_tasklet(0), tasklet rvs_ctl(2) unregistered
MT: mt_sch_unregister_tasklet(0), tasklet rvs_pkt_rx(1) unregistered
MT: rvs_mgr_uinit(0), succ
MT: mt_dev_put_tx_queue(0), q 0
MT: dev_stat_thread, stop
MT: dev_stop_port(0), succ
MT: mt_dev_free, succ
MT: mt_main_free, succ
MT: mt_mempool_free, free mempool R_P0Q0_MBUF
MT: mt_mempool_free, free mempool R_P0Q1_MBUF
MT: mt_mempool_free, free mempool T_P0_SYS
MT: dev_close_port(0), succ
MT: mt_dev_uinit, succ
MT: mtl_uninit, succ

============ IP info ===============

$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether a8:a1:59:c2:80:ba brd ff:ff:ff:ff:ff:ff
3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether a8:a1:59:c2:80:bb brd ff:ff:ff:ff:ff:ff
inet 192.168.30.24/24 brd 192.168.30.255 scope global eno1
valid_lft forever preferred_lft forever
inet6 fe80::aaa1:59ff:fec2:80bb/64 scope link
valid_lft forever preferred_lft forever
4: mlnx1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether e8:eb:d3:6c:60:fa brd ff:ff:ff:ff:ff:ff
inet 192.168.1.24/24 brd 192.168.1.255 scope global mlnx1
valid_lft forever preferred_lft forever
inet6 fe80::eaeb:d3ff:fe6c:60fa/64 scope link
valid_lft forever preferred_lft forever
5: mlnx2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether e8:eb:d3:6c:60:fb brd ff:ff:ff:ff:ff:ff
inet 192.168.2.24/24 brd 192.168.2.255 scope global mlnx2
valid_lft forever preferred_lft forever
inet6 fe80::eaeb:d3ff:fe6c:60fb/64 scope link
valid_lft forever preferred_lft forever
6: usb0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 9e:b7:9f:38:54:e8 brd ff:ff:ff:ff:ff:ff
14: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 6e:cf:9f:bf:c8:49 brd ff:ff:ff:ff:ff:ff
15: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether ce:31:82:3d:90:d0 brd ff:ff:ff:ff:ff:ff
inet6 fe80::cc31:82ff:fe3d:90d0/64 scope link
valid_lft forever preferred_lft forever

Cheers

I see many errors in the log.
MT: RX_VIDEO_SESSION(0,0): wrong hdr dropped pkts 2221485
It is caused by the payload type (left at the default) not being set correctly; you can customize it with --payload_type xxx.

And the patch looks good to me. Can you create a PR so that we can proceed with the upstream process?

Dear Mr. Du,

Thanks for approving the pull-request.

I set the payload type to 112 (video) but I still see errors.
For information, I tried configuring the source for both BPM and GPM; both show the wrong header error,
even though data is transferred over the PCIe bus, as seen with the pcm-iio tool.

Please see the logs
media transport
media_log.txt

pcm -iio
media_pcm_iio.txt

Can you add a log at the line below to print the payload type it expects and the one the TX is actually sending?

s->stat_pkts_wrong_hdr_dropped++;

This is the only place that causes this error in frame mode.

You can also use the API mt_mbuf_dump_hdr to dump the received mbuf and check whether there is any mismatch in the RTP header.
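
For reference, the payload type sits in the low seven bits of the second byte of the RTP header (RFC 3550). A minimal sketch of the comparison that such a debug log would report is below; the function names are hypothetical and this is not the library's actual check:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RTP version: top two bits of the first header byte. */
unsigned rtp_version(const uint8_t *rtp) { return rtp[0] >> 6; }

/* RTP payload type: low seven bits of the second byte; the top bit is the marker. */
unsigned rtp_payload_type(const uint8_t *rtp) { return rtp[1] & 0x7Fu; }

/* Return 1 if the packet has a version 2 header with the expected payload type. */
int rtp_hdr_matches(const uint8_t *rtp, size_t len, unsigned expected_pt) {
    assert(rtp != NULL);
    if (len < 12) return 0; /* the fixed RTP header is 12 bytes */
    return rtp_version(rtp) == 2 && rtp_payload_type(rtp) == expected_pt;
}
```

A packet whose second byte is 0xE0 (marker bit set, payload type 96) matches an expected payload type of 96 but not 112, so a receiver configured for 112 would count it as a wrong header.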

Dear Mr. Du,
Please check my comments

Can you add a log at the line below to print the payload type it expects and the one the TX is actually sending?
[Prankur] You mean the RX is receiving ?

Indeed when I set the --payload_type to 96 then I can receive the video without any wrong header format errors.
There are still some error messages like
...
MT: Error: dev_rl_shaper_add(0), shaper add error: (-38)Function not implemented
...
MT: Error: DEV(0): Status: imissed 239964 ierrors 0 oerrors 0 rx_nombuf 0
MT: Error: rx_good_packets: 593
MT: Error: rx_good_bytes: 784834
MT: Error: rx_q1_packets: 593
MT: Error: rx_q1_bytes: 784834
MT: Error: rx_multicast_packets: 2460738
MT: Error: rx_multicast_bytes: 3266080246
MT: Error: tx_multicast_packets: 2
MT: Error: tx_multicast_bytes: 120
MT: Error: rx_out_of_buffer: 239964
...

which do not make much sense to me. Should I just ignore them?

Please check the complete logs here :
media_log.txt

Cheers
Prankur

Yes, ignore them. dev_rl_shaper_add probes for the rate limit feature; a NIC without this feature reports this error. The imissed count only appears at startup, when the system is busy with its initial routines and has no time to retrieve packets from the NIC.

Dear Mr. Du,

Thanks for your comments. I will be closing this issue.
Thanks for supporting us for the ConnectX6-Dx. Your help is greatly appreciated.

Cheers
Prankur