nvidia_peer_memory-1.0-8 modprobe: ERROR: could not insert 'nv_peer_mem': Invalid argument
yug0slav opened this issue · 3 comments
CentOS Linux release 7.7.1908 (Core)
uname -r
3.10.0-1062.9.1.el7.x86_64
lspci |grep mellanox -i
5e:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
5e:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
ofed_info -s
MLNX_OFED_LINUX-4.7-1.0.0.1:
nvidia-smi
Sat Dec 7 00:07:38 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
...
./build_module.sh
Building source rpm for nvidia_peer_memory...
Built: /tmp/nvidia_peer_memory-1.0-8.src.rpm
To install run on RPM based OS:
# rpmbuild --rebuild /tmp/nvidia_peer_memory-1.0-8.src.rpm
# rpm -ivh
[root@bmlp-c08006:/tmp/nv_peer_memory]# rpmbuild --rebuild /tmp/nvidia_peer_memory-1.0-8.src.rpm
Installing /tmp/nvidia_peer_memory-1.0-8.src.rpm
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.SBGi1I
- umask 022
- cd /root/rpmbuild/BUILD
- cd /root/rpmbuild/BUILD
- rm -rf nvidia_peer_memory-1.0
- /usr/bin/gzip -dc /root/rpmbuild/SOURCES/nvidia_peer_memory-1.0.tar.gz
- /usr/bin/tar -xvvf -
drwxr-xr-x root/root 0 2019-12-07 00:04 nvidia_peer_memory-1.0/
drwxr-xr-x root/root 0 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/
drwxr-xr-x root/root 0 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/patches/
-rw-r--r-- root/root 369 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/patches/dkms_name.patch
-rw-r--r-- root/root 16 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/patches/series
drwxr-xr-x root/root 0 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/source/
-rw-r--r-- root/root 12 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/source/format
-rw-r--r-- root/root 1791 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/changelog
-rw-r--r-- root/root 2 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/compat
-rw-r--r-- root/root 910 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/control
-rw-r--r-- root/root 10 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/nvidia-peer-memory-dkms.dkms
-rw-r--r-- root/root 245 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/nvidia-peer-memory-dkms.postinst
-rwxr-xr-x root/root 198 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/nvidia-peer-memory.postinst
-rwxr-xr-x root/root 199 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/nvidia-peer-memory.prerm
-rwxr-xr-x root/root 1362 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/rules
-rwxr-xr-x root/root 431 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/updateInit.sh
-rw-r--r-- root/root 3707 2019-12-07 00:04 nvidia_peer_memory-1.0/Makefile
-rw-r--r-- root/root 3415 2019-12-07 00:04 nvidia_peer_memory-1.0/README.md
-rwxr-xr-x root/root 2276 2019-12-07 00:04 nvidia_peer_memory-1.0/build_module.sh
-rw-r--r-- root/root 5817 2019-12-07 00:04 nvidia_peer_memory-1.0/compat_nv-p2p.h
-rwxr-xr-x root/root 4031 2019-12-07 00:04 nvidia_peer_memory-1.0/create_nv.symvers.sh
-rw-r--r-- root/root 614 2019-12-07 00:04 nvidia_peer_memory-1.0/dkms.conf
-rwxr-xr-x root/root 2756 2019-12-07 00:04 nvidia_peer_memory-1.0/nv_peer_mem
-rwxr-xr-x root/root 13013 2019-12-07 00:04 nvidia_peer_memory-1.0/nv_peer_mem.c
-rw-r--r-- root/root 47 2019-12-07 00:04 nvidia_peer_memory-1.0/nv_peer_mem.conf
-rwxr-xr-x root/root 241 2019-12-07 00:04 nvidia_peer_memory-1.0/nv_peer_mem.upstart
-rw-r--r-- root/root 3299 2019-12-07 00:04 nvidia_peer_memory-1.0/nvidia_peer_memory.spec
- STATUS=0
- '[' 0 -ne 0 ']'
- cd nvidia_peer_memory-1.0
- /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .
- exit 0
Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.wIhXAm
- umask 022
- cd /root/rpmbuild/BUILD
- cd nvidia_peer_memory-1.0
- export KVER=3.10.0-1062.9.1.el7.x86_64
- KVER=3.10.0-1062.9.1.el7.x86_64
- make KVER=3.10.0-1062.9.1.el7.x86_64 all
/root/rpmbuild/BUILD/nvidia_peer_memory-1.0/create_nv.symvers.sh 3.10.0-1062.9.1.el7.x86_64
'/lib/modules/3.10.0-1062.9.1.el7.x86_64/extra/nvidia.ko.xz' -> './nvidia.ko.xz'
Getting symbol versions from nvidia.ko ...
Created: /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv.symvers
Found /usr/src/nvidia-440.33.01//nvidia/nv-p2p.h
/bin/cp -f /usr/src/nvidia-440.33.01//nvidia/nv-p2p.h /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv-p2p.h
cp -rf /usr/src/ofa_kernel/default/Module.symvers .
cat nv.symvers >> Module.symvers
make -C /lib/modules/3.10.0-1062.9.1.el7.x86_64/build M=/root/rpmbuild/BUILD/nvidia_peer_memory-1.0 modules
make[1]: Entering directory `/usr/src/kernels/3.10.0-1062.9.1.el7.x86_64'
CC [M] /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem.o
/root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem.c:80:9: note: #pragma message: Enable nvidia_p2p_dma_map_pages support
#pragma message("Enable nvidia_p2p_dma_map_pages support")
^
Building modules, stage 2.
MODPOST 1 modules
CC /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem.mod.o
LD [M] /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem.ko
make[1]: Leaving directory `/usr/src/kernels/3.10.0-1062.9.1.el7.x86_64'
- exit 0
Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.gQcTGj
- umask 022
- cd /root/rpmbuild/BUILD
- '[' /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64 '!=' / ']'
- rm -rf /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64
++ dirname /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64
- mkdir -p /root/rpmbuild/BUILDROOT
- mkdir /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64
- cd nvidia_peer_memory-1.0
- export KVER=3.10.0-1062.9.1.el7.x86_64
- KVER=3.10.0-1062.9.1.el7.x86_64
- make DESTDIR=/root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64 KVER=3.10.0-1062.9.1.el7.x86_64 install
mkdir -p /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64//lib/modules/3.10.0-1062.9.1.el7.x86_64/extra/;
cp -f /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem.ko /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64//lib/modules/3.10.0-1062.9.1.el7.x86_64/extra/;
if [ ! -n "/root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64" ]; then /sbin/depmod -r -ae 3.10.0-1062.9.1.el7.x86_64;fi;
- install -d /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64/etc/infiniband
- install -m 0644 /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem.conf /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64/etc/infiniband
- install -d /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64/etc/init.d
- install -m 0755 /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64/etc/init.d
- /usr/lib/rpm/check-buildroot
- /usr/lib/rpm/redhat/brp-compress
- /usr/lib/rpm/redhat/brp-strip /usr/bin/strip
- /usr/lib/rpm/redhat/brp-strip-comment-note /usr/bin/strip /usr/bin/objdump
- /usr/lib/rpm/redhat/brp-strip-static-archive /usr/bin/strip
- /usr/lib/rpm/brp-python-bytecompile /usr/bin/python 1
- /usr/lib/rpm/redhat/brp-python-hardlink
- /usr/lib/rpm/redhat/brp-java-repack-jars
Processing files: nvidia_peer_memory-1.0-8.x86_64
Provides: nvidia_peer_memory = 1.0-8 nvidia_peer_memory(x86-64) = 1.0-8
Requires(interp): /bin/sh /bin/sh
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Requires(post): /bin/sh
Requires(preun): /bin/sh
Requires: /bin/bash
Checking for unpackaged file(s): /usr/lib/rpm/check-files /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64
Wrote: /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-8.x86_64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.Cqwtul
- umask 022
- cd /root/rpmbuild/BUILD
- cd nvidia_peer_memory-1.0
- cd /tmp
- chmod -R o+w /root/rpmbuild/BUILD/nvidia_peer_memory-1.0
- rm -rf /root/rpmbuild/BUILD/nvidia_peer_memory-1.0
- test x/root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64 '!=' x
- rm -rf /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64
- exit 0
Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.ShKItm
- umask 022
- cd /root/rpmbuild/BUILD
- rm -rf nvidia_peer_memory-1.0
- exit 0
yum install /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-8.x86_64.rpm
Loaded plugins: enabled_repos_upload, fastestmirror, langpacks, nvidia, package_upload, product-id, search-disabled-repos, subscription-manager
Examining /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-8.x86_64.rpm: nvidia_peer_memory-1.0-8.x86_64
Marking /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-8.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package nvidia_peer_memory.x86_64 0:1.0-8 will be installed
--> Finished Dependency Resolution
...
Dependencies Resolved
Package Arch Version Repository Size
Installing:
nvidia_peer_memory x86_64 1.0-8 /nvidia_peer_memory-1.0-8.x86_64 291 k
Transaction Summary
Install 1 Package
Total size: 291 k
Installed size: 291 k
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : nvidia_peer_memory-1.0-8.x86_64 1/1
modprobe: ERROR: could not insert 'nv_peer_mem': Invalid argument
/etc/init.d/nv_peer_mem restart
stopping... OK
starting... modprobe: ERROR: could not insert 'nv_peer_mem': Invalid argument
Failed to load nv_peer_mem
#dmesg
...
[ 2072.534744] nv_peer_mem: disagrees about version of symbol nvidia_p2p_dma_unmap_pages
[ 2072.534750] nv_peer_mem: Unknown symbol nvidia_p2p_dma_unmap_pages (err -22)
[ 2072.534767] nv_peer_mem: disagrees about version of symbol nvidia_p2p_get_pages
[ 2072.534768] nv_peer_mem: Unknown symbol nvidia_p2p_get_pages (err -22)
[ 2072.534779] nv_peer_mem: disagrees about version of symbol nvidia_p2p_put_pages
[ 2072.534780] nv_peer_mem: Unknown symbol nvidia_p2p_put_pages (err -22)
[ 2072.534801] nv_peer_mem: disagrees about version of symbol nvidia_p2p_dma_map_pages
[ 2072.534802] nv_peer_mem: Unknown symbol nvidia_p2p_dma_map_pages (err -22)
[ 2072.534810] nv_peer_mem: disagrees about version of symbol nvidia_p2p_free_dma_mapping
[ 2072.534811] nv_peer_mem: Unknown symbol nvidia_p2p_free_dma_mapping (err -22)
[ 2072.534819] nv_peer_mem: disagrees about version of symbol nvidia_p2p_free_page_table
[ 2072.534820] nv_peer_mem: Unknown symbol nvidia_p2p_free_page_table (err -22)
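For anyone hitting the same thing: `err -22` is EINVAL, which is why modprobe reports "Invalid argument". The "disagrees about version of symbol" lines mean the symbol CRCs recorded in nv_peer_mem.ko at build time do not match what the loaded nvidia module exports. A rough sketch for pulling the affected symbol names out of the kernel log follows; the function name `mismatched_symbols` is made up for illustration and is not part of nv_peer_memory or any driver tooling.

```shell
# List the symbols the kernel reported as version-mismatched for a module.
# Reads dmesg-style text on stdin; the module name defaults to nv_peer_mem.
# (Hypothetical helper, only for illustration.)
mismatched_symbols() {
    module="${1:-nv_peer_mem}"
    # Keep only the "disagrees about version" lines and print the symbol name,
    # then de-duplicate in case the load was attempted more than once.
    sed -n "s/.*${module}: disagrees about version of symbol \([A-Za-z0-9_]*\).*/\1/p" | sort -u
}

# Usage (reading the kernel log may require root on some systems):
#   dmesg | mismatched_symbols nv_peer_mem
```

On the log above this should list the six `nvidia_p2p_*` symbols. The CRCs the built module expects can probably be inspected with `modprobe --dump-modversions` pointed at the installed nv_peer_mem.ko, which may help confirm that the nv.symvers generated during the build does not match the running nvidia driver.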
Can confirm that #60 fixed this problem for me.
CentOS 8 / RHEL8 (4.18.0-80.11.2.el8_0.x86_64), Nvidia Driver 440.33.01