NVIDIA/gds-nvidia-fs

nvidia-fs driver fails to load on Ubuntu 22.04: Unknown symbol nvidia_p2p_dma_map_pages (err -2)

gaowayne opened this issue · 3 comments

I installed MOFED 5.8, then installed nvidia-fs with sudo apt install nvidia-fs.

When I tried to load the nvidia-fs driver, I got the error below.

---error loading the nvidia-fs driver after install---

root@oq1:/usr/local/cuda-12/gds/tools# apt install nvidia-fs
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages were automatically installed and are no longer required:
  alsa-topology-conf alsa-ucm-conf ca-certificates-java cuda-cccl-12-3 cuda-command-line-tools-12-3 cuda-compiler-12-3 cuda-crt-12-3 cuda-cudart-12-3 cuda-cudart-dev-12-3 cuda-cuobjdump-12-3 cuda-cupti-12-3
  cuda-cupti-dev-12-3 cuda-cuxxfilt-12-3 cuda-documentation-12-3 cuda-driver-dev-12-3 cuda-gdb-12-3 cuda-nsight-12-3 cuda-nsight-compute-12-3 cuda-nsight-systems-12-3 cuda-nvcc-12-3 cuda-nvdisasm-12-3
  cuda-nvml-dev-12-3 cuda-nvprof-12-3 cuda-nvprune-12-3 cuda-nvrtc-12-3 cuda-nvrtc-dev-12-3 cuda-nvtx-12-3 cuda-nvvm-12-3 cuda-nvvp-12-3 cuda-opencl-12-3 cuda-opencl-dev-12-3 cuda-profiler-api-12-3
  cuda-sanitizer-12-3 cuda-toolkit-12-3-config-common cuda-toolkit-12-config-common cuda-toolkit-config-common default-jre default-jre-headless fonts-dejavu-extra java-common libasound2 libasound2-data
  libatk-wrapper-java libatk-wrapper-java-jni libboost-iostreams1.74.0 libboost-program-options1.74.0 libboost-thread1.74.0 libcublas-12-3 libcublas-dev-12-3 libcufft-12-3 libcufft-dev-12-3 libcurand-12-3
  libcurand-dev-12-3 libcusolver-12-3 libcusolver-dev-12-3 libcusparse-12-3 libcusparse-dev-12-3 libevent-pthreads-2.1-7 libgfapi0 libgfrpc0 libgfxdr0 libgif7 libglusterfs0 libhwloc-plugins libhwloc15 libnpp-12-3
  libnpp-dev-12-3 libnvjitlink-12-3 libnvjitlink-dev-12-3 libnvjpeg-12-3 libnvjpeg-dev-12-3 libpcsclite1 libpmix2 libpsm-infinipath1 libpsm2-2 libtinfo5 libxcb-icccm4 libxcb-image0 libxcb-keysyms1
  libxcb-render-util0 libxcb-util1 libxcb-xinerama0 libxcb-xinput0 libxcb-xkb1 libxkbcommon-x11-0 nsight-compute-2023.3.1 nsight-systems-2023.3.3 nvidia-firmware-535-535.146.02 ocl-icd-libopencl1 openjdk-11-jre
  openjdk-11-jre-headless
Use 'apt autoremove' to remove them.
The following additional packages will be installed:
  nvidia-fs-dkms
Suggested packages:
  mlnx-ofed-all
The following NEW packages will be installed:
  nvidia-fs nvidia-fs-dkms
0 upgraded, 2 newly installed, 0 to remove and 23 not upgraded.
Need to get 64.9 kB of archives.
After this operation, 357 kB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  nvidia-fs-dkms 2.18.3-1 [62.4 kB]
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  nvidia-fs 2.18.3-1 [2,530 B]
Fetched 64.9 kB in 0s (313 kB/s)     
Selecting previously unselected package nvidia-fs-dkms.
(Reading database ... 177240 files and directories currently installed.)
Preparing to unpack .../nvidia-fs-dkms_2.18.3-1_amd64.deb ...
Unpacking nvidia-fs-dkms (2.18.3-1) ...
Selecting previously unselected package nvidia-fs.
Preparing to unpack .../nvidia-fs_2.18.3-1_amd64.deb ...
Unpacking nvidia-fs (2.18.3-1) ...
Setting up nvidia-fs-dkms (2.18.3-1) ...
Creating symlink /var/lib/dkms/nvidia-fs/2.18.3/source -> /usr/src/nvidia-fs-2.18.3

Kernel preparation unnecessary for this kernel. Skipping...

Building module:
cleaning build area...
'make' -j32 KVER=5.15.0-91-generic IGNORE_CC_MISMATCH='1'...........
Signing module:
 - /var/lib/dkms/nvidia-fs/2.18.3/5.15.0-91-generic/x86_64/module/nvidia-fs.ko
Secure Boot not enabled on this system.
cleaning build area...

nvidia-fs.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.15.0-91-generic/updates/dkms/

depmod...

Backing up initrd.img-5.15.0-91-generic to /boot/initrd.img-5.15.0-91-generic.old-dkms
Making new initrd.img-5.15.0-91-generic
(If next boot fails, revert to initrd.img-5.15.0-91-generic.old-dkms image)
update-initramfs.........
modprobe: ERROR: could not insert 'nvidia_fs': Unknown symbol in module, or unknown parameter (see dmesg)

dmesg log

[Jan18 08:23] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[  +0.000031] nvidia_fs: Unknown symbol nvidia_p2p_dma_unmap_pages (err -2)
[  +0.000320] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[  +0.000016] nvidia_fs: Unknown symbol nvidia_p2p_get_pages (err -2)
[  +0.000285] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[  +0.000017] nvidia_fs: Unknown symbol nvidia_p2p_put_pages (err -2)
[  +0.000284] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[  +0.000016] nvidia_fs: Unknown symbol nvidia_p2p_dma_map_pages (err -2)
[  +0.000284] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[  +0.000015] nvidia_fs: Unknown symbol nvidia_p2p_free_dma_mapping (err -2)
[  +0.000276] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[  +0.000016] nvidia_fs: Unknown symbol nvidia_p2p_free_page_table (err -2)
[Jan18 08:29] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[  +0.000035] nvidia_fs: Unknown symbol nvidia_p2p_dma_unmap_pages (err -2)
[  +0.000325] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[  +0.000019] nvidia_fs: Unknown symbol nvidia_p2p_get_pages (err -2)
[  +0.000287] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[  +0.000017] nvidia_fs: Unknown symbol nvidia_p2p_put_pages (err -2)
[  +0.000285] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[  +0.000017] nvidia_fs: Unknown symbol nvidia_p2p_dma_map_pages (err -2)
[  +0.000286] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[  +0.000026] nvidia_fs: Unknown symbol nvidia_p2p_free_dma_mapping (err -2)
[  +0.000271] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[  +0.000016] nvidia_fs: Unknown symbol nvidia_p2p_free_page_table (err -2)
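The dmesg output above repeats one pattern: a line naming the proprietary module that nvidia_fs depends on, followed by an "Unknown symbol" line. As a rough aid for reading logs like this, here is a minimal Python sketch that collapses the repeats into the unique module names and symbols. The function name `summarize_dmesg` and the sample text are illustrative only, and the regular expressions are assumptions based on the exact wording of the lines pasted here, so other kernel versions may word them differently.

```python
import re

# Patterns modeled on the dmesg lines pasted in this issue (assumption:
# other kernels may phrase these messages differently).
PROPRIETARY_RE = re.compile(
    r"(\w+): module using GPL-only symbols uses symbols from proprietary module (\w+)\."
)
UNKNOWN_RE = re.compile(r"(\w+): Unknown symbol (\w+) \(err (-?\d+)\)")


def summarize_dmesg(text):
    """Return (proprietary_modules, unknown_symbols), de-duplicated, in log order."""
    proprietary = []
    symbols = []
    for line in text.splitlines():
        m = PROPRIETARY_RE.search(line)
        if m:
            if m.group(2) not in proprietary:
                proprietary.append(m.group(2))
            continue
        m = UNKNOWN_RE.search(line)
        if m and m.group(2) not in symbols:
            symbols.append(m.group(2))
    return proprietary, symbols


# Two pairs of lines copied from the log above, used as sample input.
SAMPLE = """\
[Jan18 08:23] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[  +0.000031] nvidia_fs: Unknown symbol nvidia_p2p_dma_unmap_pages (err -2)
[  +0.000320] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[  +0.000016] nvidia_fs: Unknown symbol nvidia_p2p_get_pages (err -2)
"""

if __name__ == "__main__":
    modules, missing = summarize_dmesg(SAMPLE)
    print("proprietary modules:", modules)
    print("unresolved symbols:", missing)
```

Piping the full dmesg text through the same function should list all six nvidia_p2p_* symbols once each, which makes it easier to see that every failure points at the same module, nvidia.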

dcg@oq1:/usr/src/nvidia-fs-2.18.3$ nvidia-smi
Thu Jan 18 09:17:04 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P100-PCIE-16GB           On  | 00000000:3B:00.0 Off |                    0 |
| N/A   37C    P0              28W / 250W |      0MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

same issue

Please install the NVIDIA open kernel (open RM) driver to resolve the proprietary-symbol errors.

Starting with CUDA toolkit 12.2.2, GDS kernel driver package nvidia-gds version 12.2.2-1 (provided by nvidia-fs-dkms 2.17.5-1) and above is only supported with the NVIDIA open kernel driver. Follow the instructions in Removing CUDA Toolkit and Driver to remove existing NVIDIA driver packages and then follow instructions in NVIDIA Open GPU Kernel Modules to install NVIDIA open kernel driver packages.
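Before removing and reinstalling drivers, it may help to confirm which flavor of the nvidia module is currently installed. The sketch below is one possible check, not an official procedure: it reads the module's license field with `modinfo -F license nvidia` and treats a GPL-compatible string as a sign of the open kernel modules. The helper names are hypothetical. The assumption that the proprietary module reports `NVIDIA` while the open one reports `Dual MIT/GPL` is mine and should be verified on your system.

```python
import shutil
import subprocess


def is_open_kernel_module(license_field):
    """Heuristic (assumption): a license string containing 'GPL' indicates
    the open kernel modules; the proprietary module is expected to report
    'NVIDIA' instead."""
    return "GPL" in license_field.upper()


def nvidia_module_license():
    """Return the license field of the installed nvidia module, or None
    if modinfo is unavailable or the module cannot be found."""
    if shutil.which("modinfo") is None:
        return None
    result = subprocess.run(
        ["modinfo", "-F", "license", "nvidia"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    lic = nvidia_module_license()
    if lic is None:
        print("could not query the nvidia module with modinfo")
    elif is_open_kernel_module(lic):
        print(f"license '{lic}': looks like the open kernel driver")
    else:
        print(f"license '{lic}': looks like the proprietary driver")
```

If this reports the proprietary driver, that would match the dmesg messages in this issue, where nvidia_fs is refused symbols from the proprietary module nvidia.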

Has anybody solved this problem?