NVIDIA/nvidia-container-runtime

dockerd keeps outputting logs of `unknown output format`

siaimes opened this issue · 11 comments

● docker.service - Docker Application Container Engine
   Loaded: loaded (/etc/systemd/system/docker.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─docker-dns.conf, docker-options.conf
   Active: active (running) since Sat 2022-05-14 12:21:48 UTC; 1 weeks 3 days ago
     Docs: http://docs.docker.com
 Main PID: 2191 (dockerd)
    Tasks: 0
   CGroup: /system.slice/docker.service
           └─2191 /usr/bin/dockerd --data-root=/mnt/docker --log-opt max-size=2g --log-opt max-file=2 --log-driver=json-file --iptables=false --data-root=/mnt/docker --log-opt max-size=2g --log-opt max-file=2 --log-driver=json-file --dns 10.192.0.3 --dns 127.0.0.53 --dns-search default.svc.cluster.local --dns-search svc.cluster.local --dns-opt ndots:2 --dns-opt timeout:2 --dns-opt attempts:2

May 25 08:13:04 ubuntu dockerd[2191]: time="2022-05-25T08:13:04.657183109Z" level=warning msg="failed to retrieve /usr/bin/nvidia-container-runtime version: unknown output format: runc version 1.0.2\ncommit: v1.0.2-0-g52b36a2\nspec: 1.0.2-dev\ngo: go1.16.10\nlibseccomp: 2.5.1\n"
May 25 08:13:34 ubuntu dockerd[2191]: time="2022-05-25T08:13:34.717524328Z" level=warning msg="failed to retrieve /usr/bin/nvidia-container-runtime version: unknown output format: runc version 1.0.2\ncommit: v1.0.2-0-g52b36a2\nspec: 1.0.2-dev\ngo: go1.16.10\nlibseccomp: 2.5.1\n"
May 25 08:14:04 ubuntu dockerd[2191]: time="2022-05-25T08:14:04.782635726Z" level=warning msg="failed to retrieve /usr/bin/nvidia-container-runtime version: unknown output format: runc version 1.0.2\ncommit: v1.0.2-0-g52b36a2\nspec: 1.0.2-dev\ngo: go1.16.10\nlibseccomp: 2.5.1\n"
May 25 08:14:34 ubuntu dockerd[2191]: time="2022-05-25T08:14:34.848038536Z" level=warning msg="failed to retrieve /usr/bin/nvidia-container-runtime version: unknown output format: runc version 1.0.2\ncommit: v1.0.2-0-g52b36a2\nspec: 1.0.2-dev\ngo: go1.16.10\nlibseccomp: 2.5.1\n"
May 25 08:15:04 ubuntu dockerd[2191]: time="2022-05-25T08:15:04.904724411Z" level=warning msg="failed to retrieve /usr/bin/nvidia-container-runtime version: unknown output format: runc version 1.0.2\ncommit: v1.0.2-0-g52b36a2\nspec: 1.0.2-dev\ngo: go1.16.10\nlibseccomp: 2.5.1\n"
May 25 08:15:34 ubuntu dockerd[2191]: time="2022-05-25T08:15:34.966260670Z" level=warning msg="failed to retrieve /usr/bin/nvidia-container-runtime version: unknown output format: runc version 1.0.2\ncommit: v1.0.2-0-g52b36a2\nspec: 1.0.2-dev\ngo: go1.16.10\nlibseccomp: 2.5.1\n"
May 25 08:16:05 ubuntu dockerd[2191]: time="2022-05-25T08:16:05.036384305Z" level=warning msg="failed to retrieve /usr/bin/nvidia-container-runtime version: unknown output format: runc version 1.0.2\ncommit: v1.0.2-0-g52b36a2\nspec: 1.0.2-dev\ngo: go1.16.10\nlibseccomp: 2.5.1\n"
May 25 08:16:35 ubuntu dockerd[2191]: time="2022-05-25T08:16:35.102791653Z" level=warning msg="failed to retrieve /usr/bin/nvidia-container-runtime version: unknown output format: runc version 1.0.2\ncommit: v1.0.2-0-g52b36a2\nspec: 1.0.2-dev\ngo: go1.16.10\nlibseccomp: 2.5.1\n"
May 25 08:17:05 ubuntu dockerd[2191]: time="2022-05-25T08:17:05.163632028Z" level=warning msg="failed to retrieve /usr/bin/nvidia-container-runtime version: unknown output format: runc version 1.0.2\ncommit: v1.0.2-0-g52b36a2\nspec: 1.0.2-dev\ngo: go1.16.10\nlibseccomp: 2.5.1\n"
May 25 08:17:35 ubuntu dockerd[2191]: time="2022-05-25T08:17:35.225188923Z" level=warning msg="failed to retrieve /usr/bin/nvidia-container-runtime version: unknown output format: runc version 1.0.2\ncommit: v1.0.2-0-g52b36a2\nspec: 1.0.2-dev\ngo: go1.16.10\nlibseccomp: 2.5.1\n"

@siaimes which version of the nvidia-container-toolkit are you using?

This seems related to moby/moby#38709 and could be caused by the additional output from the nvidia-container-runtime --version command.

ubuntu@ubuntu:~$ nvidia-container-runtime --version
runc version 1.0.2
commit: v1.0.2-0-g52b36a2
spec: 1.0.2-dev
go: go1.16.10
libseccomp: 2.5.1
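
For readers comparing their own output with the one above, here is a minimal sketch that pulls the version and commit fields out of runc-style `--version` output. The helper name `parse_runtime_version` is hypothetical, and this is not dockerd's own parser; it only shows which fields the output carries.

```shell
# Hypothetical helper (not dockerd's parser): read runc-style `--version`
# output on stdin and print "<version> <commit>".
parse_runtime_version() {
  awk '
    / version / { version = $NF }   # e.g. "runc version 1.0.2"
    /^commit:/  { commit = $2 }     # e.g. "commit: v1.0.2-0-g52b36a2"
    END { print version, commit }
  '
}

# Usage on a host where the binary is installed:
#   nvidia-container-runtime --version | parse_runtime_version
```

Running the same helper on `runc --version` makes it easy to see whether the two binaries report the same fields.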

@siaimes which version of the nvidia-container-toolkit are you using?

As recommended here, I installed nvidia-container-runtime directly.

curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt update
sudo apt install nvidia-container-runtime
sudo nano /etc/docker/daemon.json
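
The thread does not show what was written to /etc/docker/daemon.json. As a sketch only, a configuration that registers the runtime and makes it the default might look like the following. The runtime name "nvidia" and the helper `write_daemon_json` are assumptions; the binary path is the one that appears in the dockerd warnings.

```shell
# Assumption: the runtime name "nvidia" and this layout are illustrative;
# the actual daemon.json used in this issue is not shown.
write_daemon_json() {
  # $1: destination path (try a scratch path before touching /etc/docker/daemon.json)
  cat > "$1" <<'EOF'
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
}

# Example: write_daemon_json /tmp/daemon.json
```

dockerd has to be restarted before a changed daemon.json takes effect.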

For reference, dockerd seems to be generating the error here https://github.com/moby/moby/blob/6b9b445af6c7908992632cff6c30cbf6a4c617ac/daemon/info_unix.go#L352

So this warning has no functional effect, but it is logged about every 30 seconds, so it crowds out other log entries and they cannot be retained.
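
Until the warning itself goes away, one way to read the remaining entries is to filter it out while viewing the logs. This is a sketch with hypothetical helper names; it changes only what is displayed, not what is stored or rotated.

```shell
# Pattern matching the repeated warning shown in the status output above.
WARNING_PATTERN='failed to retrieve .* version: unknown output format'

# Print log lines from stdin, dropping the repeated warning.
drop_version_warning() {
  grep -v "$WARNING_PATTERN"
}

# Count how many times the warning appears in log text on stdin.
count_version_warning() {
  grep -c "$WARNING_PATTERN"
}

# Usage on a systemd host:
#   journalctl -u docker.service --since today | drop_version_warning
#   journalctl -u docker.service --since today | count_version_warning
```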

@siaimes thanks for checking the version. I assume that you also have nvidia-container-runtime set as your default runtime in your /etc/docker/daemon.json file? Since the output of nvidia-container-runtime --version is identical to that of runc --version, I would expect that this message would be the same even if this was not the case.

Would you be able to confirm whether this is the case?

Also for clarification. What is the output of apt list nvidia-container-toolkit nvidia-container-runtime?
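
A minimal sketch for answering the default-runtime question: the helper below (a hypothetical name) reads the `default-runtime` key from a daemon.json file with sed. It assumes the key and its value sit on one line, which holds for typically formatted files but is not a full JSON parse.

```shell
# Hypothetical helper: print the value of "default-runtime" from a daemon.json file.
# Assumes the key and value are on the same line; prints nothing if the key is absent.
default_runtime_from_config() {
  # $1: path to daemon.json
  sed -n 's/.*"default-runtime"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Usage:
#   default_runtime_from_config /etc/docker/daemon.json
# The running daemon can also be asked directly:
#   docker info --format '{{.DefaultRuntime}}'
```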

Yes, I use k8s to run deep learning jobs, so I set nvidia-container-runtime to default.

ubuntu@ubuntu:~$ sudo apt list nvidia-container-toolkit nvidia-container-runtime
Listing... Done
nvidia-container-runtime/bionic 3.9.0-1 all [upgradable from: 3.7.0-1]
nvidia-container-toolkit/bionic 1.9.0-1 amd64 [upgradable from: 1.7.0-1]
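
In `apt list` output, the version after the package name is the candidate and the one after "upgradable from:" is what is currently installed. Here is a small sketch that makes this explicit; the helper name `upgradable_packages` is hypothetical.

```shell
# Hypothetical helper: read `apt list` output on stdin and print
# "<package> <installed> -> <candidate>" for every upgradable package.
upgradable_packages() {
  awk '/\[upgradable from: / {
    split($1, name, "/")        # "pkg/suite" -> "pkg"
    installed = $NF             # last field, e.g. "3.7.0-1]"
    sub(/\]$/, "", installed)   # strip the trailing bracket
    print name[1], installed, "->", $2
  }'
}

# Usage:
#   apt list nvidia-container-toolkit nvidia-container-runtime 2>/dev/null | upgradable_packages
```

For the listing above, this reports 3.7.0-1 and 1.7.0-1 as the installed versions.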

@siaimes as a matter of interest, which version of docker are you using?

Between 20.10.7 and 20.10.12, depending on when each node was added to my cluster.

elezar commented

We have recently revamped the logging for the NVIDIA Container Toolkit.

If this behaviour still occurs with the most recent version, please open a new issue against https://github.com/NVIDIA/nvidia-container-toolkit.