[BUG] Failed to run mig.sh on MIG dataproc-2.1-ubuntu20
Describe the bug
I observed the following error while running mig.sh on dataproc-2.1-ubuntu20 with runtime version "2.1.72-ubuntu20" and kernel version "5.15.0-1067-gcp".
make -f ./scripts/Makefile.modpost
sed 's/\.ko$/\.o/' /var/lib/dkms/nvidia/495.29.05/build/modules.order | scripts/mod/modpost -m -a -o /var/lib/dkms/nvidia/495.29.05/build/Module.symvers -e -i Module.symvers -T -
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol 'rcu_read_unlock_strict'
make[2]: *** [scripts/Makefile.modpost:133: /var/lib/dkms/nvidia/495.29.05/build/Module.symvers] Error 1
I also tried some older Dataproc runtime versions. mig.sh works with runtime version "2.1.40-ubuntu20" and kernel version "5.15.0-1049-gcp".
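For anyone triaging this on a node, here is a minimal sketch that labels a kernel release against the versions reported in this thread. The function name `classify_kernel` and the labels are hypothetical, and the list only reflects what was observed here; it says nothing about kernels in between.

```shell
#!/usr/bin/env bash
# Label a kernel release string using only the versions reported in this issue.
# classify_kernel and its labels are illustrative names, not part of mig.sh.
classify_kernel() {
  case "$1" in
    5.15.0-1049-gcp) echo "reported-working" ;;   # runtime 2.1.40-ubuntu20
    5.15.0-1067-gcp) echo "reported-failing" ;;   # runtime 2.1.72-ubuntu20
    5.15.0-1070-gcp) echo "reported-failing" ;;   # seen in a later comment in this thread
    *)               echo "unknown" ;;            # not covered by this report
  esac
}

# Check the kernel that is currently running.
classify_kernel "$(uname -r)"
```

An "unknown" result only means the kernel was not tested in this report, so the driver build may or may not succeed on it.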
Steps/Code to reproduce bug
- Create a Dataproc cluster using MIG with an nvidia-tesla-a100 GPU and runtime version "2.1.72-ubuntu20"
- SSH to the GPU node
- Download mig.sh
- Run sudo bash mig.sh
Expected behavior
mig.sh runs successfully.
Environment details (please complete the following information)
- Environment location: Dataproc, version 2.1.72-ubuntu20
Thanks for the investigation!
@sameerz This is the reason why mig-on-dataproc-2.1-ubuntu20 has been failing to initialize recently.
Hello @yinqingh, I think you're using a different version of /gpu/mig.sh
Can you try with /spark-rapids/mig.sh?
I’ll inform the repository maintainers about this inconsistency.
Edit: Created issue GoogleCloudDataproc/initialization-actions#1259
Hi @SurajAralihalli, I tried spark-rapids/mig.sh, but it still failed while installing the NVIDIA driver (535.104.05) with the same error. The Dataproc runtime version is "2.1.73-ubuntu20".
make -f ./scripts/Makefile.modpost
sed 's/\.ko$/\.o/' /var/lib/dkms/nvidia/535.104.05/build/modules.order | scripts/mod/modpost -m -a -o /var/lib/dkms/nvidia/535.104.05/build/Module.symvers -e -i Module.symvers -T -
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol 'rcu_read_unlock_strict'
make[2]: *** [scripts/Makefile.modpost:133: /var/lib/dkms/nvidia/535.104.05/build/Module.symvers] Error 1
make[2]: *** Deleting file '/var/lib/dkms/nvidia/535.104.05/build/Module.symvers'
make[1]: *** [Makefile:1829: modules] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.15.0-1070-gcp'
make: *** [Makefile:82: modules] Error 2
DKMSKernelVersion: 5.15.0-1070-gcp
Date: Fri Nov 8 09:07:43 2024
Package: nvidia-dkms-535 535.104.05-0ubuntu1
PackageVersion: 535.104.05-0ubuntu1
SourcePackage: nvidia-graphics-drivers-535
Title: nvidia-dkms-535 535.104.05-0ubuntu1: nvidia kernel module failed to build
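To pull the relevant line out of a long build log like the one above, a small filter along these lines may help. This is only a sketch: `extract_gpl_symbol_errors` is a hypothetical helper name, and it matches just the modpost message format shown in this thread.

```shell
#!/usr/bin/env bash
# Read a build log on stdin and print "<module> <symbol>" for every
# modpost "GPL-only symbol" error. The helper name is illustrative.
extract_gpl_symbol_errors() {
  sed -n "s/^ERROR: modpost: GPL-incompatible module \([^ ]*\) uses GPL-only symbol '\([^']*\)'.*/\1 \2/p"
}

# Demo using the error line quoted in this issue.
printf '%s\n' \
  "make -f ./scripts/Makefile.modpost" \
  "ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol 'rcu_read_unlock_strict'" \
  | extract_gpl_symbol_errors
# prints: nvidia.ko rcu_read_unlock_strict
```

On a real node the input would be the DKMS build log, assumed here to be make.log under the build directory shown in the output above (for example `extract_gpl_symbol_errors < /var/lib/dkms/nvidia/535.104.05/build/make.log`).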
More context at: GoogleCloudDataproc/initialization-actions#1259