GoogleCloudPlatform/container-engine-accelerators

Installation on Ubuntu 18.04 LTS VM on Google Cloud fails with "Unable to locate package linux-headers-5.3.0-1029-gcp"

Svendegroote91 opened this issue · 0 comments

I deployed Kubernetes with Kubespray on a set of Ubuntu VMs on Google Cloud.
One of the workers has a GPU (Tesla K80).

When running the daemonset to install the Nvidia driver, I get the following error message:

+ NVIDIA_DRIVER_VERSION=384.111
+ NVIDIA_DRIVER_DOWNLOAD_URL_DEFAULT=https://us.download.nvidia.com/tesla/384.111/NVIDIA-Linux-x86_64-384.111.run
+ NVIDIA_DRIVER_DOWNLOAD_URL=https://us.download.nvidia.com/tesla/384.111/NVIDIA-Linux-x86_64-384.111.run
+ NVIDIA_INSTALL_DIR_HOST=/home/kubernetes/bin/nvidia
+ NVIDIA_INSTALL_DIR_CONTAINER=/usr/local/nvidia
++ basename https://us.download.nvidia.com/tesla/384.111/NVIDIA-Linux-x86_64-384.111.run
+ NVIDIA_INSTALLER_RUNFILE=NVIDIA-Linux-x86_64-384.111.run
+ ROOT_MOUNT_DIR=/root
+ CACHE_FILE=/usr/local/nvidia/.cache
++ uname -r
+ KERNEL_VERSION=5.3.0-1029-gcp
+ set +x
Checking cached version
Cache file /usr/local/nvidia/.cache not found.
Downloading kernel sources...
Get:1 http://archive.ubuntu.com/ubuntu xenial InRelease [247 kB]
Get:2 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [109 kB]
Get:3 http://archive.ubuntu.com/ubuntu xenial-backports InRelease [107 kB]
Get:4 http://security.ubuntu.com/ubuntu xenial-security InRelease [109 kB]
Get:5 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages [1558 kB]
Get:6 http://archive.ubuntu.com/ubuntu xenial/restricted amd64 Packages [14.1 kB]
Get:7 http://archive.ubuntu.com/ubuntu xenial/universe amd64 Packages [9827 kB]
Get:8 http://archive.ubuntu.com/ubuntu xenial/multiverse amd64 Packages [176 kB]
Get:9 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [1495 kB]
Get:10 http://archive.ubuntu.com/ubuntu xenial-updates/restricted amd64 Packages [13.1 kB]
Get:11 http://archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [1032 kB]
Get:12 http://archive.ubuntu.com/ubuntu xenial-updates/multiverse amd64 Packages [19.7 kB]
Get:13 http://archive.ubuntu.com/ubuntu xenial-backports/main amd64 Packages [7942 B]
Get:14 http://archive.ubuntu.com/ubuntu xenial-backports/universe amd64 Packages [8807 B]
Get:15 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages [1131 kB]
Get:16 http://security.ubuntu.com/ubuntu xenial-security/restricted amd64 Packages [12.7 kB]
Get:17 http://security.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [629 kB]
Get:18 http://security.ubuntu.com/ubuntu xenial-security/multiverse amd64 Packages [6679 B]
Fetched 16.5 MB in 1s (8682 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package linux-headers-5.3.0-1029-gcp
E: Couldn't find any package by glob 'linux-headers-5.3.0-1029-gcp'
E: Couldn't find any package by regex 'linux-headers-5.3.0-1029-gcp'

The GPU-worker node has the following specs:

System Info:
 Machine ID:                 8840fa09cb0c8bf2bb021033c01b5a14
 System UUID:                8840fa09-cb0c-8bf2-bb02-1033c01b5a14
 Boot ID:                    44a9e2ab-6039-4ceb-a6f6-a1372fac0fe5
 Kernel Version:             5.3.0-1029-gcp
 OS Image:                   Ubuntu 18.04.4 LTS
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://19.3.11
 Kubelet Version:            v1.18.3
 Kube-Proxy Version:         v1.18.3

When I check on the VM itself whether the package is there, it appears to be installed:

svendegroote@worker-gpu-0:~$ apt search linux-headers | grep installed

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

linux-headers-5.3.0-1029-gcp/bionic-updates,bionic-security,now 5.3.0-1029.31~18.04.1 amd64 [installed]
linux-headers-gcp/bionic-updates,bionic-security,now 5.3.0.1029.23 amd64 [installed]
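
One thing I notice when comparing the two outputs: the installer log fetches its package lists from xenial suites, while the headers installed on the VM come from bionic-updates and bionic-security. Below is a minimal sketch for pulling the suite names out of an `apt-get update` log so they can be compared with the host. The helper name `apt_suites` is hypothetical and not part of the installer script, and it assumes the log uses the `Get:N URL SUITE ...` line format shown above.

```shell
#!/usr/bin/env bash
# Sketch only: list the distinct apt suites named in the "Get:" lines of an
# apt-get update log read from stdin.
# apt_suites is a hypothetical helper name, not part of the installer.
apt_suites() {
  # The third field of a "Get:N URL SUITE ..." line is the suite, for example
  # "xenial" or "xenial-updates/main". Drop the "/component" suffix, then
  # de-duplicate.
  awk '/^Get:[0-9]+ / { split($3, parts, "/"); print parts[1] }' | sort -u
}

# Example: two lines copied from the installer log above.
printf '%s\n' \
  'Get:1 http://archive.ubuntu.com/ubuntu xenial InRelease [247 kB]' \
  'Get:9 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [1495 kB]' \
  | apt_suites
```

For the two sample lines this prints `xenial` and `xenial-updates`, neither of which matches the bionic suites listed for the installed package on the VM.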

Information on the image from the Google Cloud console:

[image]

Any idea why the script does not seem to find the installed linux headers?

(I also logged this issue on the Kubespray repo: kubernetes-sigs/kubespray#6340)