nv-morpheus/Morpheus

[BUG]: `DocaSourceStage` missing and duplicate packets

Closed this issue · 0 comments

Version

24.10

Which installation method(s) does this occur on?

Source

Describe the bug.

When multiple packets are received in quick succession, the output of the DocaSourceStage contains duplicated packets while other packets are missing. For every N packets received, the stage still emits N packets in total, but some payloads appear more than once and others never appear.

Minimum reproducible example

The repro steps use scripts from https://github.com/dagardner-nv/Morpheus/tree/david-dup-packets-repro

On the receiving machine run:

python examples/doca/vdb_realtime/test_pipe_min.py --nic_addr=<nic_addr> --gpu_addr=<gpu_addr> --net_type=tcp --log_level=DEBUG --out_file=tcp_sender.csv --convert

On the sending machine run:

sudo MORPHEUS_ROOT=$(pwd) python3 examples/doca/vdb_realtime/sender/send.py --dst_ip="<ip of receiving machine>" --net_type=tcp

Relevant log output


,src_ip,data
0,192.168.2.28,"3: v It is presumed that CUDA Toolkit and NVIDIA driver are installed on the system (host x86 or DPU Arm) where the DOCA GPUNetIO is built and executed. Internal hardware topology of the system should be GPUDirect-RDMA-friendly to maximize the internal throughput between the GPU and the NIC. As DOCA GPUNetIO is present in both DOCA for host and DOCA BFB (for DPU Arm), a GPUNetIO application can be executed either on the host CPU or on the Arm cores of the DPU. The following subsections provide a description of both scenarios. Note DOCA GPUNetIO has been tested on bare-metal and in docker but never in a virtualized environment. Using KVM is discouraged for now. Application on Host CPU Assuming the DOCA GPUNetIO application is running on the host x86 CPU cores, it is highly recommended to have a dedicated PCIe connection between the GPU and the NIC. This topology can be realized in two ways: Adding an additional PCIe switch to one of the PCIe root complex slots and attaching to this switch a GPU and a ConnectX "
1,192.168.2.28,"5: o CPU mode after flashing the right BFB image (refer to NVIDIA DOCA Installation Guide for Linux for details). From the x86 host, configure the DPU as detailed in the following steps: PCIe Configuration On some x86 systems, the Access Control Services (ACS) must be disabled to ensure direct communication between the NIC and GPU, whether they reside on the same converged accelerator DPU or on different PCIe slots in the system. The recommended solution is to disable ACS control via BIOS (e.g., Supermicro or HPE). Alternatively, it is also possible to disable it via command line, but it may not be as effective as the BIOS option. Assuming system topology Option 2, with a converged accelerator DPU as follows: $ lspci -tvvv...+-[0000:b0]-+-00.0 Intel Corporation Device 09a2 | +-00.1 Intel Corporation Device 09a4 | +-00.2 Intel Corporation Device 09a3 | +-00.4 Intel Corporation Device 0998 | -02.0-[b1-b6]----00.0-[b2-b6]--+-00.0-[b3]--+-00.0 Mellanox Technologi"
2,192.168.2.28,"27: mory buffers without using the CUDA memory API. Graphic depicting NVIDIA DOCA GPUNetIO configuration requiring a GPU and CUDA drivers and libraries installed on the same platform. Figure 3. NVIDIA DOCA GPUNetIO is a new DOCA library requiring a GPU and CUDA drivers and libraries installed on the same platform As shown in Figure 4, the typical DOCA GPUNetIO application steps are: Initial configuration phase on CPU Use DOCA to identify and initialize a GPU device and a network device Use DOCA GPUNetIO to create receive or send queues manageable from a CUDA kernel Use DOCA Flow to determine which type of packet should land in each receive queue (for example, subset of IP addresses, TCP or UDP protocol, and so on) Launch one or more CUDA kernels (to execute packet processing/filtering/analysis) Runtime control and data path on GPU within CUDA kernel Use DOCA GPUNetIO CUDA device functions to send or receive packets Use DOCA GPUNetIO CUDA device functions "
3,192.168.2.28,"3: v It is presumed that CUDA Toolkit and NVIDIA driver are installed on the system (host x86 or DPU Arm) where the DOCA GPUNetIO is built and executed. Internal hardware topology of the system should be GPUDirect-RDMA-friendly to maximize the internal throughput between the GPU and the NIC. As DOCA GPUNetIO is present in both DOCA for host and DOCA BFB (for DPU Arm), a GPUNetIO application can be executed either on the host CPU or on the Arm cores of the DPU. The following subsections provide a description of both scenarios. Note DOCA GPUNetIO has been tested on bare-metal and in docker but never in a virtualized environment. Using KVM is discouraged for now. Application on Host CPU Assuming the DOCA GPUNetIO application is running on the host x86 CPU cores, it is highly recommended to have a dedicated PCIe connection between the GPU and the NIC. This topology can be realized in two ways: Adding an additional PCIe switch to one of the PCIe root complex slots and attaching to this switch a GPU and a ConnectX "
4,192.168.2.28,"4: adapter Connecting an NVIDIA Converged Accelerator DPU to the PCIe root complex and setting it to NIC mode (i.e., exposing the GPU and NIC devices to the host) You may check the topology of your system using lspci -tvvv or nvidia-smi topo -m. Option 1: ConnectX Adapter in Ethernet Mode NVIDIA ConnectX firmware must be 22.36.1010 or later. It is highly recommended to only use NVIDIA adapter from ConnectX-6 Dx and later. DOCA GPUNetIO allows a CUDA kernel to control the NIC when working with Ethernet protocol. For this reason, the ConnectX must be set to Ethernet mode Option 2: DPU Converged Accelerator in NIC mode DPU firmware must be 24.35.2000 or newer. To expose and use the GPU and the NIC on the converged accelerator DPU to an application running on the Host x86, configure the DPU to operate in NIC mode. Application on DPU Converged Arm CPU In this scenario, the DOCA GPUNetIO is running on the CPU Arm cores of the DPU using the GPU and NIC on the same DPU . The converged accelerator DPU must be set t"
5,192.168.2.28,"5: o CPU mode after flashing the right BFB image (refer to NVIDIA DOCA Installation Guide for Linux for details). From the x86 host, configure the DPU as detailed in the following steps: PCIe Configuration On some x86 systems, the Access Control Services (ACS) must be disabled to ensure direct communication between the NIC and GPU, whether they reside on the same converged accelerator DPU or on different PCIe slots in the system. The recommended solution is to disable ACS control via BIOS (e.g., Supermicro or HPE). Alternatively, it is also possible to disable it via command line, but it may not be as effective as the BIOS option. Assuming system topology Option 2, with a converged accelerator DPU as follows: $ lspci -tvvv...+-[0000:b0]-+-00.0 Intel Corporation Device 09a2 | +-00.1 Intel Corporation Device 09a4 | +-00.2 Intel Corporation Device 09a3 | +-00.4 Intel Corporation Device 0998 | -02.0-[b1-b6]----00.0-[b2-b6]--+-00.0-[b3]--+-00.0 Mellanox Technologi"
6,192.168.2.28,"8: 4 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 1048576 kB Hugetlb: 4194304 kB GPU Configuration CUDA Toolkit 12.1 or newer must be installed on the host. It is also recommended to enable persistence mode to decrease initial application latency nvidia-smi -pm 1. To allow the CPU to access the GPU memory directly without the need for CUDA API, DPDK and DOCA require the GDRCopy kernel module to be installed on the system: # Run nvidia-peermem kernel module sudo modprobe nvidia-peermem # Install GDRCopy sudo apt install -y check kmod git clone https://github.com/NVIDIA/gdrcopy.git /opt/mellanox/gdrcopy cd /opt/mellanox/gdrcopy make # Run gdrdrv kernel module ./insmod.sh # Double check nvidia-peermem and gdrdrv module are running $ lsmod | egrep gdrdrv gdrdrv 24576 0 nvidia 55726080 4 nvidia_uvm,nvidia_peermem,gdrdrv,nvidia_modeset # Export library path export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/mellanox/gdrcopy/src # Ensure CUD"
7,192.168.2.28,"9: A library path is in the env var export PATH=""/usr/local/cuda/bin:${PATH}"" export LD_LIBRARY_PATH=""/usr/local/cuda/lib:/usr/local/cuda/lib64:${LD_LIBRARY_PATH}"" export CPATH=""$(echo /usr/local/cuda/targets/{x86_64,sbsa}-linux/include | sed 's/ /:/'):${CPATH}"" BlueField-3 Specific Configuration To run a DOCA GPUNetIO application on the Arm DPU cores in a BlueField-3 converged card (section ""Application on DPU Converged Arm CPU""), it is mandatory to set an NVIDIA driver option at the end of the driver configuration file: Set NVIDIA driver option cat <<EOF | sudo tee /etc/modprobe.d/nvidia.conf options nvidia NVreg_RegistryDwords=""RmDmaAdjustPeerMmioBF3=1;"" EOF To make sure the option has been detected by the NVIDIA driver, run: Check NVIDIA driver option $ grep RegistryDwords /proc/driver/nvidia/params RegistryDwords: ""RmDmaAdjustPeerMmioBF3=1;"" RegistryDwordsPerDevice: """" GPU Memory Mapping (nvidia-peermem vs. dmabuf) To allow the NIC to send and receive packets using GPU memory, it is required to launc"
8,192.168.2.28,"10: h the NVIDIA kernel module nvidia-peermem (using modprobe nvidia-peermem). It is shipped by default with the CUDA Toolkit installation. Mapping buffers through the nvidia-peermem module is the legacy mapping mode. Alternatively, DOCA offers the ability to map GPU memory through the dmabuf providing a set high-level functions. Prerequisites are DOCA installed on a system with: Linux Kernel ≥ 6.2 libibverbs ≥ 1.14.44 CUDA Toolkit installed with the -m=kernel-open flag (which implies the NVIDIA driver in Open Source mode) Installing DOCA on kernel 6.2 to enable the dmabuf is experimental. An example can be found in the DOCA GPU Packet Processing application: DMABuf functions /* Get from CUDA the dmabuf file-descriptor for the GPU memory buffer / result = doca_gpu_dmabuf_fd(gpu_dev, gpu_buffer_addr, gpu_buffer_size, &(dmabuf_fd)); if (result != DOCA_SUCCESS) { / If it fails, create a DOCA mmap for the GPU memory buffer with the nvidia-peermem legacy method / doca_mmap_set_memrange(gpu_buffer_m"
9,192.168.2.28,"13: enabled. For further information, refer to this NVIDIA forum post. Architecture A GPU packet processing network application can be split into two fundamental phases: Setup on the CPU (devices configuration, memory allocation, launch of CUDA kernels, etc.) Main data path where GPU and NIC interact to exercise their functions DOCA GPUNetIO provides different building blocks, some of them in combination with the DOCA Ethernet library, to create a full pipeline running entirely on the GPU. During the setup phase on the CPU, applications must: Prepare all the objects on the CPU. Export a GPU handler for them. Launch a CUDA kernel passing the object's GPU handler to work with the object during the data path. For this reason, DOCA GPUNetIO is composed of two libraries: libdoca_gpunetio with functions invoked by CPU to prepare the GPU, allocate memory and objects libdoca_gpunetio_device with functions invoked by GPU within CUDA kernels during the data path The pkgconfig file for the DOCA G"
10,192.168.2.28,"25: the NVIDIA hardware can expose on host systems and DPU. NVIDIA DOCA GPUNetIO is a new library developed on top of the NVIDIA DOCA 1.5 release to introduce the notion of a GPU device in the DOCA ecosystem (Figure 3). To facilitate the creation of a DOCA GPU-centric real-time packet processing application, DOCA GPUNetIO combines GPUDirect RDMA for data-path acceleration, smart GPU memory management, low-latency message passing techniques between CPU and GPU (through GDRCopy features) and GDAKIN communications. This enables a CUDA kernel to directly control an NVIDIA ConnectX network card. To maximize the performance, DOCA GPUNetIO Library must be used on platforms considered GPUDirect-friendly, where the GPU and the network card are directly connected through a dedicated PCIe bridge. The DPU converged card is an example but the same topology can be realized on host systems as well. DOCA GPUNetIO targets are GPU packet processing network applications using the Ethernet protocol to exchange packets in a netw"
11,192.168.2.28,"6: es MT42822 BlueField-2 integrated ConnectX-6 Dx network controller | | +-00.1 Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller | | -00.2 Mellanox Technologies MT42822 BlueField-2 SoC Management Interface | -01.0-[b4-b6]----00.0-[b5-b6]----08.0-[b6]----00.0 NVIDIA Corporation Device 20b8 The PCIe switch address to consider is b2:00.0 (entry point of the DPU). ACSCtl must have all negative values: PCIe set: setpci -s b2:00.0 ECAP_ACS+6.w=0:fc To verify that the setting has been applied correctly: PCIe check $ sudo lspci -s b2:00.0 -vvvv | grep -i ACSCtl ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans- If the application still does not report any received packets, try to disable IOMMU. On some systems, it can be done from the BIOS looking for the the VT-d or IOMMU from t"
12,192.168.2.28,"7: he NorthBridge configuration and change that setting to Disable and save it. The system may also require adding intel_iommu=off or amd_iommu=off to the kernel options. That can be done through the grub command line as follows: IOMMU $ sudo vim /etc/default/grub # GRUB_CMDLINE_LINUX_DEFAULT=""iommu=off intel_iommu=off "" $ sudo update-grub $ sudo reboot Hugepages A DOCA GPUNetIO application over Ethernet uses typically DOCA Flow to set flow steering rules to the Ethernet receive queues. Flow-based programs require an allocation of huge pages and it can be done temporarily as explained in the DOCA Flow or permanently via grub command line: IOMMU $ sudo vim /etc/default/grub # GRUB_CMDLINE_LINUX_DEFAULT=""default_hugepagesz=1G hugepagesz=1G hugepages=4 "" $ sudo update-grub $ sudo reboot # After rebooting, check huge pages info $ grep -i huge /proc/meminfo AnonHugePages: 0 kB ShmemHugePages: 0 kB FileHugePages: 0 kB HugePages_Total: 4 HugePages_Free: "
13,192.168.2.28,"11: map, gpu_buffer_addr, gpu_buffer_size); } else { /
If it succeeds, create a DOCA mmap for the GPU memory buffer using the dmabuf method */ doca_mmap_set_dmabuf_memrange(gpu_buffer_mmap, dmabuf_fd, gpu_buffer_addr, 0, gpu_buffer_size); } If the function doca_gpu_dmabuf_fd fails, it probably means the NVIDIA driver is not installed with the open-source mode. Later, when calling the doca_mmap_start, the DOCA library tries to map the GPU memory buffer using the dmabuf file descriptor. If it fails (something incorrectly set on the Linux system), it fallbacks trying to map the GPU buffer with the legacy mode (nvidia-peermem ). If it fails, an informative error is returned. GPU BAR1 Size Every time a GPU buffer is mapped to the NIC (e.g., buffers associated with send or receive queues), a portion of the GPU BAR1 mapping space is used. Therefore, it is important to check that the BAR1 mapping is large enough to hold all the bytes the DOCA GPUNetIO application is trying to map. To verify the BAR1 mapping space o"
14,192.168.2.28,"12: f a GPU you can use nvidia-smi: $ nvidia-smi -q ==============NVSMI LOG============== ..... Attached GPUs : 1 GPU 00000000:CA:00.0 Product Name : NVIDIA A100 80GB PCIe Product Architecture : Ampere Persistence Mode : Enabled ..... BAR1 Memory Usage Total : 131072 MiB Used : 1 MiB Free : 131071 MiB By default, some GPUs (e.g. RTX models) may have a very small BAR1 size: BAR1 mapping $ nvidia-smi -q | grep -i bar -A 3 BAR1 Memory Usage Total : 256 MiB Used : 6 MiB Free : 250 MiB If the BAR1 size is not enough, DOCA GPUNetIO applications may exit with errors because DOCA mmap fails to map the GPU memory buffers to the NIC (e.g., Failed to start mmap DOCA Driver call failure). To overcome this issue, the GPU BAR1 must be increased from the BIOS. The system should have ""Resizable BAR"" option"
15,192.168.2.28,"14: PUNetIO shared library is doca-gpu.pc. However, there is no pkgconfig file for the DOCA GPUNetIO CUDA device's static library /opt/mellanox/doca/lib/x86_64-linux-gnu/libdoca_gpunetio_device.a, so it must be explicitly linked to the CUDA application if DOCA GPUNetIO CUDA device functions are required. Refer to the NVIDIA DOCA GPU Packet Processing Application Guide for an example of using DOCA GPUNetIO to send and receive Ethernet packets. "
16,192.168.2.28,"15: This is an overview of the structure of NVIDIA DOCA documentation. It walks you through DOCA's developer zone portal which contains all the information about the DOCA toolkit from NVIDIA, providing everything you need to develop BlueField-accelerated applications. The NVIDIA DOCA SDK enables developers to rapidly create applications and services on top of NVIDIA® BlueField® networking platform, leveraging industry-standard APIs. With DOCA, developers can deliver breakthrough networking, security, and storage performance by harnessing the power of NVIDIA's BlueField data-processing units (DPUs) and SuperNICs. Installation DOCA contains a runtime and development environment for both the host and as part of a BlueField device image. The full installation instructions for both can be found in the NVIDIA DOCA Installation Guide for Linux. Whether DOCA has been installed on the host or on the BlueField networking platform, one can find the different DOCA components under the /opt/mellanox/doca directory. These i"
17,192.168.2.28,"16: nclude the traditional SDK-related components (libraries, header files, etc.) as well as the DOCA samples, applications, tools and more, as described in this document. API The DOCA SDK is built around the different DOCA libraries designed to leverage the capabilities of BlueField. Under the Programming Guides section, one can find a detailed description of each DOCA library, its goals, and API. These guides document DOCA's API, aiming to help developers wishing to develop DOCA-based programs. The API References section holds the Doxygen-generated documentation of DOCA's official API. See NVIDIA DOCA Library APIs. Please note that, as explained in the NVIDIA DOCA gRPC Infrastructure User Guide, some of DOCA's libraries also support a gRPC-based API. More information about these extended programming interfaces can be found in detail in the programming guides of the respective libraries. Programming Guides DOCA programming guides provide the full picture of DOCA libraries and their APIs. Each guide includes an"
18,192.168.2.28,"17: introduction, architecture, API overview, and other library-specific information. Each library's programming guide includes code snippets for achieving basic DOCA-based tasks. It is recommended to review these samples while going over the programming guide of the relevant DOCA library to learn about its API. The samples provide an implementation example of a single feature of a given DOCA library. For a more detailed reference of full DOCA-based programs that make use of multiple DOCA libraries, please refer to the Reference Applications. Applications Applications are a higher-level reference code than the samples and demonstrate how a full DOCA-based program can be built. In addition to the supplied source code and compilation definitions, the applications are also shipped in their compiled binary form. This is to allow users an out-of-the-box interaction with DOCA-based programs without the hassle of a developer-oriented compilation process. Many DOCA applications combine the functionality of more than o"
19,192.168.2.28,"18: ne DOCA library and offer an example implementation for common scenarios of interest to users such as application recognition according to incoming/outgoing traffic, scanning files using the hardware RegEx acceleration, and much more. For more information about DOCA applications, refer to DOCA Applications. Tools Some of the DOCA libraries are shipped alongside helper tools for both runtime and development. These tools are often an extension to the library's own API and bridge the gap between the library's expected input format and the input available to the users. An example for one such DOCA tool is the doca_dpi_compiler, responsible for converting Suricata-based rules to their matching .cdo definition files which are then used by the DOCA DPI library. For more information about DOCA tools, refer to DOCA Tools. Services DOCA services are containerized DOCA-based programs that provide an end-to-end solution for a given use case. DOCA services are accessible as part of NVIDIA's container catalog (NGC) fro"
20,192.168.2.28,"19: m which they can be easily deployed directly to BlueField, and sometimes also to the host. For more information about container-based deployment to the BlueField DPU or SmartNIC, refer to the NVIDIA BlueField DPU Container Deployment Guide. For more information about DOCA services, refer to the DOCA Services. Note For questions, comments, and feedback, please contact us at DOCA-Feedback@exchange.nvidia.com"
21,192.168.2.28,"20: A growing number of network applications need to exercise GPU real-time packet processing in order to implement high data rate solutions: data filtering, data placement, network analysis, sensors’ signal processing, and more. One primary motivation is the high degree of parallelism that the GPU can enable to process in parallel multiple packets while offering scalability and programmability. For an overview of the basic concepts of these techniques and an initial solution based on the DPDK gpudev library, see Boosting Inline Packet Processing Using DPDK and GPUdev with GPUs. This post explains how the new NVIDIA DOCA GPUNetIO Library can overcome some of the limitations found in the previous DPDK solution, moving a step closer to GPU-centric packet processing applications. Introduction Real-time GPU processing of network packets is a technique useful to several different application domains, including signal processing, network security, information gathering, and input reconstruction. The goal of these a"
22,192.168.2.28,"21: pplications is to realize an inline packet processing pipeline to receive packets in GPU memory (without staging copies through CPU memory); process them in parallel with one or more CUDA kernels; and then run inference, evaluate, or send over the network the result of the calculation. Typically, in this pipeline, the CPU is the intermediary because it has to synchronize network card (NIC) receive activity with the GPU processing. This wakes up the CUDA kernel as soon as new packets have been received in GPU memory. Similar considerations can be applied to the send side of the pipeline. Graphic showing a CPU-centric application wherein the CPU has to wake up the network card to receive packets (that will be transferred directly in GPU memory through DMA), unblock the CUDA kernel waiting for those packets to arrive in GPU to actually start the packet processing. Figure 1. CPU-centric application with the CPU orchestrating the GPU and network card work The Data Plane Development Kit (DPDK) framework introduce"
23,192.168.2.28,"22: d the gpudev library to provide a solution for this kind of application: receive or send using GPU memory (GPUDirect RDMA technology) in combination with low-latency CPU synchronization. For more information about different approaches to coordinating CPU and GPU activity, see Boosting Inline Packet Processing Using DPDK and GPUdev with GPUs. GPUDirect Async Kernel-Initiated Network communications Looking at Figure 1, it is clear that the CPU is the main bottleneck. It has too many responsibilities in synchronizing NIC and GPU tasks and managing multiple network queues. As an example, consider an application with many receive queues and incoming traffic of 100 Gbps. A CPU-centric solution would have: CPU invoking the network function on each receive queue to receive packets in GPU memory using one or multiple CPU cores CPU collecting packets’ info (packets addresses, number) CPU notifying the GPU about new received packets GPU processing the packets This CPU-centric approach is: Resourc"
24,192.168.2.28,"23: e consuming: To deal with high-rate network throughput (100 Gbps or more) the application may have to dedicate an entire CPU physical core to receive or send packets. Not scalable: To receive or send in parallel with different queues, the application may have to use multiple CPU cores, even on systems where the total number of CPU cores may be limited to a low number (depending on the platform). Platform-dependent: The same application on a low-power CPU decreases the performance. The next natural step for GPU inline packet processing applications is to remove the CPU from the critical path. Moving to a GPU-centric solution, the GPU can directly interact with the NIC to receive packets so the processing can start as soon as packets arrive in GPU memory. The same considerations can be applied to the send operation. The capability of a GPU to control the NIC activity from a CUDA kernel is called GPUDirect Async Kernel-Initiated Network (GDAKIN) communications. Assuming the use of an NVIDIA GPU and an "
25,192.168.2.28,"24: NVIDIA NIC, it is possible to expose the NIC registers to the direct access of the GPU. In this way, a CUDA kernel can directly configure and update these registers to orchestrate a send or a receive network operation without the intervention of the CPU. Graphic showing a GPU-centric application, with the GPU controlling the network card and packet processing without the need of the CPU. Figure 2. GPU-centric application with the GPU controlling the network card and packet processing without the need of the CPU DPDK is, by definition, a CPU framework. To enable GDAKIN communications, it would be necessary to move the whole control path on the GPU, which is not applicable. For this reason, this feature is enabled by creating a new NVIDIA DOCA library. NVIDIA DOCA GPUNetIO Library NVIDIA DOCA SDK is the new NVIDIA framework composed of drivers, libraries, tools, documentation, and example applications. These resources are needed to leverage your application with the network, security, and computation features"
26,192.168.2.28,"26: ork. With these applications, there is no need for a pre-synchronization phase across peers through an OOB mechanism, as for RDMA-based applications. There is also no need to assume other peers use DOCA GPUNetIO to communicate and no need to be topology-aware. In future releases, the RDMA option will be enabled to cover more use cases. Here are the DOCA GPUNetIO features enabled in the current release: GDAKIN communications: A CUDA kernel can invoke the CUDA device functions in the DOCA GPUNetIO Library to instruct the network card to send or receive packets. Accurate Send Scheduling: It is possible to schedule packets’ transmission in the future according to some user-provided timestamp. GPUDirect RDMA: Receive or send packets in contiguous fixed-size GPU memory strides without CPU memory staging copies. Semaphores: Provide a standardized low-latency message passing protocol between CPU and GPU or between different GPU CUDA kernels. CPU direct access to GPU memory: CPU can modify GPU me"
27,192.168.2.28,"27: mory buffers without using the CUDA memory API. Graphic depicting NVIDIA DOCA GPUNetIO configuration requiring a GPU and CUDA drivers and libraries installed on the same platform. Figure 3. NVIDIA DOCA GPUNetIO is a new DOCA library requiring a GPU and CUDA drivers and libraries installed on the same platform As shown in Figure 4, the typical DOCA GPUNetIO application steps are: Initial configuration phase on CPU Use DOCA to identify and initialize a GPU device and a network device Use DOCA GPUNetIO to create receive or send queues manageable from a CUDA kernel Use DOCA Flow to determine which type of packet should land in each receive queue (for example, subset of IP addresses, TCP or UDP protocol, and so on) Launch one or more CUDA kernels (to execute packet processing/filtering/analysis) Runtime control and data path on GPU within CUDA kernel Use DOCA GPUNetIO CUDA device functions to send or receive packets Use DOCA GPUNetIO CUDA device functions "
28,192.168.2.28,"28: to interact with the semaphores to synchronize the work with other CUDA kernels or with the CPU Flow chart showing generic GPU packet processing pipeline data flow composed by several building blocks: receive packets in GPU memory, first staging GPU packet processing or filtering, additional GPU processing (AI inference, for example), processing output stored in the GPU memory. Figure 4. Generic GPU packet processing pipeline data flow composed by several building blocks The following sections present an overview of possible GPU packet processing pipeline application layouts combining DOCA GPUNetIO building blocks. "
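
Note that in the capture above, rows 0 and 3, rows 1 and 5, and rows 2 and 27 carry identical payloads, while other chunks (going by the leading number in each payload, which appears to be a sender-side chunk index) never show up. The duplicate counts can be confirmed with a minimal sketch like the following; it is not part of the repro scripts and assumes pandas is installed and the tcp_sender.csv layout shown above (unnamed index column, src_ip, data):

# Minimal sketch (not part of the repro scripts): count duplicated payloads
# in the captured CSV. Assumes the layout shown above: index column, src_ip, data.
import pandas as pd

df = pd.read_csv("tcp_sender.csv", index_col=0)

total = len(df)
unique = df["data"].nunique()
print(f"rows emitted: {total}, unique payloads: {unique}, duplicates: {total - unique}")

# List payloads that were emitted more than once (truncated for readability).
counts = df["data"].value_counts()
for payload, n in counts[counts > 1].items():
    print(f"x{n}: {payload[:60]}...")

With the current behavior this reports fewer unique payloads than rows, matching the description above: N packets in, N messages out, but with duplicates standing in for the missing packets.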

Full env printout


[Paste the results of print_env.sh here, it will be hidden by default]

Other/Misc.

No response

Code of Conduct

  • I agree to follow Morpheus' Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report