Mellanox/nv_peer_memory

install failed after kernel upgrade

tingweiwu opened this issue · 1 comment

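After I upgraded the kernel from h142 to h193 (3.10.0-514.44.5.10.h142.x86_64 to 3.10.0-514.44.5.10.h193.x86_64), I rebuilt the module and tried to install it again: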

[nvidia-peer-memory-1.0]# ./build_module.sh 

Building source rpm for nvidia_peer_memory...

Built: /tmp/nvidia_peer_memory-1.0-7.src.rpm

To install run on RPM based OS:
    # rpmbuild --rebuild /tmp/nvidia_peer_memory-1.0-7.src.rpm
    # rpm -ivh <path to generated binary rpm file>



[nvidia-peer-memory-1.0]# rpmbuild --rebuild /tmp/nvidia_peer_memory-1.0-7.src.rpm
Installing /tmp/nvidia_peer_memory-1.0-7.src.rpm
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.fdt3MB
+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd /root/rpmbuild/BUILD
+ rm -rf nvidia_peer_memory-1.0
+ /usr/bin/gzip -dc /root/rpmbuild/SOURCES/nvidia_peer_memory-1.0.tar.gz
+ /usr/bin/tar -xvvf -
drwx------ root/root         0 2019-07-22 15:49 nvidia_peer_memory-1.0/
-rw------- root/root      5817 2019-07-22 15:49 nvidia_peer_memory-1.0/compat_nv-p2p.h
drwx------ root/root         0 2019-07-22 15:49 nvidia_peer_memory-1.0/debian/
drwx------ root/root         0 2019-07-22 15:49 nvidia_peer_memory-1.0/debian/source/
-rw------- root/root        12 2019-07-22 15:49 nvidia_peer_memory-1.0/debian/source/format
-rwx------ root/root       199 2019-07-22 15:49 nvidia_peer_memory-1.0/debian/nvidia-peer-memory.prerm
-rwx------ root/root       231 2019-07-22 15:49 nvidia_peer_memory-1.0/debian/nvidia-peer-memory-dkms.prerm
-rw------- root/root         2 2019-07-22 15:49 nvidia_peer_memory-1.0/debian/compat
-rw------- root/root      1613 2019-07-22 15:49 nvidia_peer_memory-1.0/debian/changelog
-rw------- root/root       912 2019-07-22 15:49 nvidia_peer_memory-1.0/debian/control
-rwx------ root/root      1362 2019-07-22 15:49 nvidia_peer_memory-1.0/debian/rules
-rwx------ root/root       506 2019-07-22 15:49 nvidia_peer_memory-1.0/debian/nvidia-peer-memory-dkms.postinst
-rwx------ root/root       431 2019-07-22 15:49 nvidia_peer_memory-1.0/debian/updateInit.sh
-rwx------ root/root       198 2019-07-22 15:49 nvidia_peer_memory-1.0/debian/nvidia-peer-memory.postinst
-rw------- root/root       614 2019-07-22 15:49 nvidia_peer_memory-1.0/dkms.conf
-rw------- root/root        47 2019-07-22 15:49 nvidia_peer_memory-1.0/nv_peer_mem.conf
-rwx------ root/root      2276 2019-07-22 15:49 nvidia_peer_memory-1.0/build_module.sh
-rwx------ root/root     13013 2019-07-22 15:49 nvidia_peer_memory-1.0/nv_peer_mem.c
-rw------- root/root      3415 2019-07-22 15:49 nvidia_peer_memory-1.0/README.md
-rwx------ root/root      3765 2019-07-22 15:49 nvidia_peer_memory-1.0/create_nv.symvers.sh
-rwx------ root/root       241 2019-07-22 15:49 nvidia_peer_memory-1.0/nv_peer_mem.upstart
-rw------- root/root      3299 2019-07-22 15:49 nvidia_peer_memory-1.0/nvidia_peer_memory.spec
-rw------- root/root      3707 2019-07-22 15:49 nvidia_peer_memory-1.0/Makefile
-rwx------ root/root      2756 2019-07-22 15:49 nvidia_peer_memory-1.0/nv_peer_mem
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd nvidia_peer_memory-1.0
+ /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ exit 0
Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.yPvOeO
+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd nvidia_peer_memory-1.0
+ export KVER=3.10.0-514.44.5.10.h193.x86_64
+ KVER=3.10.0-514.44.5.10.h193.x86_64
+ make KVER=3.10.0-514.44.5.10.h193.x86_64 all
/root/rpmbuild/BUILD/nvidia_peer_memory-1.0/create_nv.symvers.sh 3.10.0-514.44.5.10.h193.x86_64
Getting symbol versions from /lib/modules/3.10.0-514.44.5.10.h193.x86_64/kernel/drivers/video/nvidia.ko ...
Created: /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv.symvers
Found /usr/src/nvidia-418.39//nvidia/nv-p2p.h
/bin/cp -f /usr/src/nvidia-418.39//nvidia/nv-p2p.h /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv-p2p.h
cp -rf /usr/src/ofa_kernel/default/Module.symvers .
cat nv.symvers >> Module.symvers
make -C /lib/modules/3.10.0-514.44.5.10.h193.x86_64/build  M=/root/rpmbuild/BUILD/nvidia_peer_memory-1.0 modules
make[1]: Entering directory `/usr/src/kernels/3.10.0-514.44.5.10.h193.x86_64'
  CC [M]  /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem.o
/root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem.c:80:9: note: #pragma message: Enable nvidia_p2p_dma_map_pages support
 #pragma message("Enable nvidia_p2p_dma_map_pages support")
         ^
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem.mod.o
  LD [M]  /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem.ko
make[1]: Leaving directory `/usr/src/kernels/3.10.0-514.44.5.10.h193.x86_64'
+ exit 0
Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.hdfOm7
+ umask 022
+ cd /root/rpmbuild/BUILD
+ '[' /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-7.x86_64 '!=' / ']'
+ rm -rf /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-7.x86_64
++ dirname /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-7.x86_64
+ mkdir -p /root/rpmbuild/BUILDROOT
+ mkdir /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-7.x86_64
+ cd nvidia_peer_memory-1.0
+ export KVER=3.10.0-514.44.5.10.h193.x86_64
+ KVER=3.10.0-514.44.5.10.h193.x86_64
+ make DESTDIR=/root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-7.x86_64 KVER=3.10.0-514.44.5.10.h193.x86_64 install
mkdir -p /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-7.x86_64//lib/modules/3.10.0-514.44.5.10.h193.x86_64/extra/;
cp -f /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem.ko /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-7.x86_64//lib/modules/3.10.0-514.44.5.10.h193.x86_64/extra/;
if [ ! -n "/root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-7.x86_64" ]; then /sbin/depmod -r -ae 3.10.0-514.44.5.10.h193.x86_64;fi;
+ install -d /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-7.x86_64/etc/infiniband
+ install -m 0644 /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem.conf /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-7.x86_64/etc/infiniband
+ install -d /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-7.x86_64/etc/init.d
+ install -m 0755 /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv_peer_mem /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-7.x86_64/etc/init.d
+ /usr/lib/rpm/check-buildroot
+ /usr/lib/rpm/redhat/brp-compress
+ /usr/lib/rpm/redhat/brp-strip /usr/bin/strip
+ /usr/lib/rpm/redhat/brp-strip-comment-note /usr/bin/strip /usr/bin/objdump
+ /usr/lib/rpm/redhat/brp-strip-static-archive /usr/bin/strip
+ /usr/lib/rpm/brp-python-bytecompile /usr/bin/python 1
+ /usr/lib/rpm/redhat/brp-python-hardlink
+ /usr/lib/rpm/redhat/brp-java-repack-jars
Processing files: nvidia_peer_memory-1.0-7.x86_64
Provides: nvidia_peer_memory = 1.0-7 nvidia_peer_memory(x86-64) = 1.0-7
Requires(interp): /bin/sh /bin/sh
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Requires(post): /bin/sh
Requires(preun): /bin/sh
Requires: /bin/bash
Checking for unpackaged file(s): /usr/lib/rpm/check-files /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-7.x86_64
Wrote: /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-7.x86_64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.9VAnUK
+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd nvidia_peer_memory-1.0
+ cd /tmp
+ chmod -R o+w /root/rpmbuild/BUILD/nvidia_peer_memory-1.0
+ rm -rf /root/rpmbuild/BUILD/nvidia_peer_memory-1.0
+ test x/root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-7.x86_64 '!=' x
+ rm -rf /root/rpmbuild/BUILDROOT/nvidia_peer_memory-1.0-7.x86_64
+ exit 0
Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.SmzrJ4
+ umask 022
+ cd /root/rpmbuild/BUILD
+ rm -rf nvidia_peer_memory-1.0
+ exit 0



[nvidia-peer-memory-1.0]# rpm -ivh /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-7.x86_64.rpm
Preparing...                          ################################# [100%]
	package nvidia_peer_memory-1.0-7.x86_64 is already installed

But the service status still reports "ERROR: Module nv_peer_mem not found":

● nv_peer_mem.service - LSB: Activates/Deactivates nv_peer_mem module to start at boot time.
   Loaded: loaded (/etc/rc.d/init.d/nv_peer_mem; bad; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2019-07-22 15:42:28 CST; 10min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 29524 ExecStart=/etc/rc.d/init.d/nv_peer_mem start (code=exited, status=1/FAILURE)

Jul 22 15:42:28  systemd[1]: Starting LSB: Activates/Deactivates nv_peer_mem module to start at boot time....
Jul 22 15:42:28  nv_peer_mem[29524]: starting... modinfo: ERROR: Module nv_peer_mem not found.
Jul 22 15:42:28  nv_peer_mem[29524]: Module nv_peer_mem does not exist
Jul 22 15:42:28  nv_peer_mem[29524]: Failed to load nv_peer_mem
Jul 22 15:42:28  systemd[1]: nv_peer_mem.service: control process exited, code=exited status=1
Jul 22 15:42:28  systemd[1]: Failed to start LSB: Activates/Deactivates nv_peer_mem module to start at boot time..
Jul 22 15:42:28  systemd[1]: Unit nv_peer_mem.service entered failed state.
Jul 22 15:42:28  systemd[1]: nv_peer_mem.service failed.

And locate nv_peer_mem.ko still shows only the path for the old kernel version:

[nvidia-peer-memory-1.0]# locate nv_peer_mem.ko
/usr/lib/modules/3.10.0-514.44.5.10.h142.x86_64/extra/nv_peer_mem.ko
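The mismatch can also be checked mechanically by comparing the kernel version embedded in the module path with the kernel that is running. The following is only a minimal sketch: the helper names module_kernel_from_path and check_module_path are made up for illustration and are not part of nv_peer_memory, and the example call uses the two kernel versions from this report.

```shell
#!/bin/sh
# Hypothetical helper: extract the kernel version component from a module
# path of the form .../lib/modules/<kver>/... (prints nothing if no match).
module_kernel_from_path() {
    printf '%s\n' "$1" | sed -n 's#^.*/lib/modules/\([^/]*\)/.*$#\1#p'
}

# Hypothetical helper: report whether a module path belongs to the given
# kernel version. Returns non-zero on a mismatch.
check_module_path() {
    path=$1
    running=$2
    built_for=$(module_kernel_from_path "$path")
    if [ "$built_for" = "$running" ]; then
        echo "OK: $path matches kernel $running"
    else
        echo "MISMATCH: $path is for $built_for, running kernel is $running"
        return 1
    fi
}

# Example with the values from this report (h193 is the upgraded kernel).
check_module_path \
    /usr/lib/modules/3.10.0-514.44.5.10.h142.x86_64/extra/nv_peer_mem.ko \
    3.10.0-514.44.5.10.h193.x86_64 || true
```

On a live system the second argument would normally be "$(uname -r)" rather than a hard-coded version.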

Hi,

As you can see, the installation of the new rpm (the one built for the new kernel) failed:
[nvidia-peer-memory-1.0]# rpm -ivh /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-7.x86_64.rpm
Preparing... ################################# [100%]
package nvidia_peer_memory-1.0-7.x86_64 is already installed
^^^^^^^^^^^^^^^^^^^

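The rebuilt package has the same name, version and release (nvidia_peer_memory-1.0-7) as the one already installed, so rpm refuses to install it and the module for the new kernel never reaches /lib/modules. You can remove the already installed rpm (the one that provided the module for the old kernel), then install the new one that you built for the new kernel: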

# rpm -e nvidia_peer_memory
# rpm -ivh /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-7.x86_64.rpm
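
For reference, the whole sequence including a verification step could be scripted roughly as below. This is a sketch under assumptions rather than a documented procedure: the run and reinstall_nv_peer_mem helpers and the DRY_RUN variable are made up for illustration, the explicit depmod -a may be redundant if the package scripts already refresh module dependencies, and the service is assumed to be started through the init script shown in the status output above. By default the script only prints the commands; set DRY_RUN=0 and run it as root to execute them.

```shell
#!/bin/sh
# Hypothetical helper: print the command in dry-run mode (the default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: remove the old package, install the rebuilt one,
# then confirm the module is visible for the running kernel.
reinstall_nv_peer_mem() {
    rpm_file=$1
    run rpm -e nvidia_peer_memory      # drop the package built for the old kernel
    run rpm -ivh "$rpm_file"           # install the package built for the new kernel
    run depmod -a                      # refresh module dependency data (may be redundant)
    run modinfo nv_peer_mem            # should now resolve for the running kernel
    run service nv_peer_mem start      # load the module via the init script
}

reinstall_nv_peer_mem /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-7.x86_64.rpm
```

If modinfo still reports that the module is not found after these steps, check that the installed .ko sits under /lib/modules/$(uname -r)/extra/.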