Mellanox/nv_peer_memory

nvidia_peer_memory-1.0-8 modprobe: ERROR: could not insert 'nv_peer_mem': Invalid argument

yug0slav opened this issue · 3 comments

CentOS Linux release 7.7.1908 (Core)

uname -r

3.10.0-1062.9.1.el7.x86_64

lspci |grep mellanox -i

5e:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
5e:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]

ofed_info -s

MLNX_OFED_LINUX-4.7-1.0.0.1:

nvidia-smi

Sat Dec 7 00:07:38 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
...

./build_module.sh

Building source rpm for nvidia_peer_memory...

Built: /tmp/nvidia_peer_memory-1.0-8.src.rpm

To install run on RPM based OS:
# rpmbuild --rebuild /tmp/nvidia_peer_memory-1.0-8.src.rpm
# rpm -ivh

[root@bmlp-c08006:/tmp/nv_peer_memory]# rpmbuild --rebuild /tmp/nvidia_peer_memory-1.0-8.src.rpm
Installing /tmp/nvidia_peer_memory-1.0-8.src.rpm
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.SBGi1I

+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd /root/rpmbuild/BUILD
+ rm -rf nvidia_peer_memory-1.0
+ /usr/bin/gzip -dc /root/rpmbuild/SOURCES/nvidia_peer_memory-1.0.tar.gz
+ /usr/bin/tar -xvvf -
drwxr-xr-x root/root 0 2019-12-07 00:04 nvidia_peer_memory-1.0/
drwxr-xr-x root/root 0 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/
drwxr-xr-x root/root 0 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/patches/
-rw-r--r-- root/root 369 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/patches/dkms_name.patch
-rw-r--r-- root/root 16 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/patches/series
drwxr-xr-x root/root 0 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/source/
-rw-r--r-- root/root 12 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/source/format
-rw-r--r-- root/root 1791 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/changelog
-rw-r--r-- root/root 2 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/compat
-rw-r--r-- root/root 910 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/control
-rw-r--r-- root/root 10 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/nvidia-peer-memory-dkms.dkms
-rw-r--r-- root/root 245 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/nvidia-peer-memory-dkms.postinst
-rwxr-xr-x root/root 198 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/nvidia-peer-memory.postinst
-rwxr-xr-x root/root 199 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/nvidia-peer-memory.prerm
-rwxr-xr-x root/root 1362 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/rules
-rwxr-xr-x root/root 431 2019-12-07 00:04 nvidia_peer_memory-1.0/debian/updateInit.sh
-rw-r--r-- root/root 3707 2019-12-07 00:04 nvidia_peer_memory-1.0/Makefile
-rw-r--r-- root/root 3415 2019-12-07 00:04 nvidia_peer_memory-1.0/README.md
-rwxr-xr-x root/root 2276 2019-12-07 00:04 nvidia_peer_memory-1.0/build_module.sh
-rw-r--r-- root/root 5817 2019-12-07 00:04 nvidia_peer_memory-1.0/compat_nv-p2p.h
-rwxr-xr-x root/root 4031 2019-12-07 00:04 nvidia_peer_memory-1.0/create_nv.symvers.sh
-rw-r--r-- root/root 614 2019-12-07 00:04 nvidia_peer_memory-1.0/dkms.conf
-rwxr-xr-x root/root 2756 2019-12-07 00:04 nvidia_peer_memory-1.0/nv_peer_mem
-rwxr-xr-x root/root 13013 2019-12-07 00:04 nvidia_peer_memory-1.0/nv_peer_mem.c
-rw-r--r-- root/root 47 2019-12-07 00:04 nvidia_peer_memory-1.0/nv_peer_mem.conf
-rwxr-xr-x root/root 241 2019-12-07 00:04 nvidia_peer_memory-1.0/nv_peer_mem.upstart
-rw-r--r-- root/root 3299 2019-12-07 00:04 nvidia_peer_memory-1.0/nvidia_peer_memory.spec
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd nvidia_peer_memory-1.0
+ /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ exit 0
Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.wIhXAm
+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd nvidia_peer_memory-1.0
+ export KVER=3.10.0-1062.9.1.el7.x86_64
+ KVER=3.10.0-1062.9.1.el7.x86_64
+ make KVER=3.10.0-1062.9.1.el7.x86_64 all
/root/rpmbuild/BUILD/nvidia_peer_memory-1.0/create_nv.symvers.sh 3.10.0-1062.9.1.el7.x86_64
'/lib/modules/3.10.0-1062.9.1.el7.x86_64/extra/nvidia.ko.xz' -> './nvidia.ko.xz'
Getting symbol versions from nvidia.ko ...
Created: /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv.symvers
Found /usr/src/nvidia-440.33.01//nvidia/nv-p2p.h
/bin/cp -f /usr/src/nvidia-440.33.01//nvidia/nv-p2p.h /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv-p2p.h
cp -rf /usr/src/ofa_kernel/default/Module.symvers .
cat nv.symvers >> Module.symvers
make -C /lib/modules/3.10.0-1062.9.1.el7.x86_64/build M=/root/rpmbuild/BUILD/nvidia_peer_memory-1.0 modules
make[1]: Entering directory `/usr/src/kernels/3.10.0-1062.9.1.el7.x86_64'
  CC [M]  /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem.o
/root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem.c:80:9: note: #pragma message: Enable nvidia_p2p_dma_map_pages support
 #pragma message("Enable nvidia_p2p_dma_map_pages support")
         ^
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem.mod.o
  LD [M]  /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem.ko
make[1]: Leaving directory `/usr/src/kernels/3.10.0-1062.9.1.el7.x86_64'
+ exit 0
Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.gQcTGj
+ umask 022
+ cd /root/rpmbuild/BUILD
+ '[' /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64 '!=' / ']'
+ rm -rf /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64
++ dirname /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64
+ mkdir -p /root/rpmbuild/BUILDROOT
+ mkdir /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64
+ cd nvidia_peer_memory-1.0
+ export KVER=3.10.0-1062.9.1.el7.x86_64
+ KVER=3.10.0-1062.9.1.el7.x86_64
+ make DESTDIR=/root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64 KVER=3.10.0-1062.9.1.el7.x86_64 install
mkdir -p /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64//lib/modules/3.10.0-1062.9.1.el7.x86_64/extra/;
cp -f /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem.ko /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64//lib/modules/3.10.0-1062.9.1.el7.x86_64/extra/;
if [ ! -n "/root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64" ]; then /sbin/depmod -r -ae 3.10.0-1062.9.1.el7.x86_64;fi;
+ install -d /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64/etc/infiniband
+ install -m 0644 /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem.conf /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64/etc/infiniband
+ install -d /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64/etc/init.d
+ install -m 0755 /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64/etc/init.d
+ /usr/lib/rpm/check-buildroot
+ /usr/lib/rpm/redhat/brp-compress
+ /usr/lib/rpm/redhat/brp-strip /usr/bin/strip
+ /usr/lib/rpm/redhat/brp-strip-comment-note /usr/bin/strip /usr/bin/objdump
+ /usr/lib/rpm/redhat/brp-strip-static-archive /usr/bin/strip
+ /usr/lib/rpm/brp-python-bytecompile /usr/bin/python 1
+ /usr/lib/rpm/redhat/brp-python-hardlink
+ /usr/lib/rpm/redhat/brp-java-repack-jars
Processing files: nvidia_peer_memory-1.0-8.x86_64
Provides: nvidia_peer_memory = 1.0-8 nvidia_peer_memory(x86-64) = 1.0-8
Requires(interp): /bin/sh /bin/sh
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Requires(post): /bin/sh
Requires(preun): /bin/sh
Requires: /bin/bash
Checking for unpackaged file(s): /usr/lib/rpm/check-files /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64
Wrote: /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-8.x86_64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.Cqwtul
+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd nvidia_peer_memory-1.0
+ cd /tmp
+ chmod -R o+w /root/rpmbuild/BUILD/nvidia_peer_memory-1.0
+ rm -rf /root/rpmbuild/BUILD/nvidia_peer_memory-1.0
+ test x/root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64 '!=' x
+ rm -rf /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-8.x86_64
+ exit 0
Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.ShKItm
+ umask 022
+ cd /root/rpmbuild/BUILD
+ rm -rf nvidia_peer_memory-1.0
+ exit 0
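
The %build step in the log takes its symbol versions from nvidia.ko.xz: create_nv.symvers.sh writes nv.symvers, which is then appended to Module.symvers. One way to check which CRCs a built module expects against such a listing might look like the sketch below. The function name `crc_diff` and the file names in the usage note are hypothetical. It assumes both inputs are whitespace-separated with the CRC in the first column and the symbol name in the second, which is the layout of Module.symvers and of the output of `modprobe --dump-modversions`.

```shell
#!/bin/sh
# Hypothetical helper: report symbols whose CRC differs between two listings.
# $1: CRCs the module expects (first column CRC, second column symbol)
# $2: CRCs that were exported   (same two leading columns; extra columns ignored)
# Prints "symbol expected_crc exported_crc" for every mismatch.
crc_diff() {
    awk 'NR == FNR { crc[$2] = $1; next }                     # first file: remember expected CRCs
         ($2 in crc) && crc[$2] != $1 { print $2, crc[$2], $1 }  # second file: report differences
        ' "$1" "$2"
}
```

A possible use, assuming the built nv_peer_mem.ko and nv.symvers are still on disk: run `modprobe --dump-modversions nv_peer_mem.ko > wanted.txt`, then `crc_diff wanted.txt nv.symvers`. No output would mean the module agrees with that listing, though not necessarily with the nvidia module that is actually loaded.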

yum install /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-8.x86_64.rpm

Loaded plugins: enabled_repos_upload, fastestmirror, langpacks, nvidia, package_upload, product-id, search-disabled-repos, subscription-manager
Examining /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-8.x86_64.rpm: nvidia_peer_memory-1.0-8.x86_64
Marking /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-8.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package nvidia_peer_memory.x86_64 0:1.0-8 will be installed
--> Finished Dependency Resolution
...
Dependencies Resolved
 Package              Arch     Version   Repository                          Size
Installing:
 nvidia_peer_memory   x86_64   1.0-8     /nvidia_peer_memory-1.0-8.x86_64    291 k

Transaction Summary

Install 1 Package

Total size: 291 k
Installed size: 291 k
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : nvidia_peer_memory-1.0-8.x86_64 1/1
modprobe: ERROR: could not insert 'nv_peer_mem': Invalid argument

/etc/init.d/nv_peer_mem restart

stopping... OK
starting... modprobe: ERROR: could not insert 'nv_peer_mem': Invalid argument
Failed to load nv_peer_mem

#dmesg
...
[ 2072.534744] nv_peer_mem: disagrees about version of symbol nvidia_p2p_dma_unmap_pages
[ 2072.534750] nv_peer_mem: Unknown symbol nvidia_p2p_dma_unmap_pages (err -22)
[ 2072.534767] nv_peer_mem: disagrees about version of symbol nvidia_p2p_get_pages
[ 2072.534768] nv_peer_mem: Unknown symbol nvidia_p2p_get_pages (err -22)
[ 2072.534779] nv_peer_mem: disagrees about version of symbol nvidia_p2p_put_pages
[ 2072.534780] nv_peer_mem: Unknown symbol nvidia_p2p_put_pages (err -22)
[ 2072.534801] nv_peer_mem: disagrees about version of symbol nvidia_p2p_dma_map_pages
[ 2072.534802] nv_peer_mem: Unknown symbol nvidia_p2p_dma_map_pages (err -22)
[ 2072.534810] nv_peer_mem: disagrees about version of symbol nvidia_p2p_free_dma_mapping
[ 2072.534811] nv_peer_mem: Unknown symbol nvidia_p2p_free_dma_mapping (err -22)
[ 2072.534819] nv_peer_mem: disagrees about version of symbol nvidia_p2p_free_page_table
[ 2072.534820] nv_peer_mem: Unknown symbol nvidia_p2p_free_page_table (err -22)
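
Every failing symbol in this dmesg output appears twice, once as "disagrees about version of symbol" and once as "Unknown symbol". To reduce a log like this to a plain list of the affected symbols, a small filter along these lines could help. This is only a sketch: the function name `mismatched_symbols` is made up for illustration, and it assumes the kernel message wording is exactly as shown in the log here.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print each symbol
# that was reported with "disagrees about version of symbol", once, sorted.
mismatched_symbols() {
    sed -n 's/.*disagrees about version of symbol \([A-Za-z0-9_]*\).*/\1/p' | sort -u
}
```

It would be used as `dmesg | mismatched_symbols`. For the log in this issue it would list the six nvidia_p2p_* symbols, one per line.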

#60 might be a fix. Testing...

Can confirm that #60 fixed this problem for me.
CentOS 8 / RHEL8 (4.18.0-80.11.2.el8_0.x86_64), Nvidia Driver 440.33.01

Fixed by #60; closing.