dockerd keeps outputting logs of `unknown output format`
siaimes opened this issue · 11 comments
● docker.service - Docker Application Container Engine
Loaded: loaded (/etc/systemd/system/docker.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/docker.service.d
└─docker-dns.conf, docker-options.conf
Active: active (running) since Sat 2022-05-14 12:21:48 UTC; 1 weeks 3 days ago
Docs: http://docs.docker.com
Main PID: 2191 (dockerd)
Tasks: 0
CGroup: /system.slice/docker.service
└─2191 /usr/bin/dockerd --data-root=/mnt/docker --log-opt max-size=2g --log-opt max-file=2 --log-driver=json-file --iptables=false --data-root=/mnt/docker --log-opt max-size=2g --log-opt max-file=2 --log-driver=json-file --dns 10.192.0.3 --dns 127.0.0.53 --dns-search default.svc.cluster.local --dns-search svc.cluster.local --dns-opt ndots:2 --dns-opt timeout:2 --dns-opt attempts:2
May 25 08:13:04 ubuntu dockerd[2191]: time="2022-05-25T08:13:04.657183109Z" level=warning msg="failed to retrieve /usr/bin/nvidia-container-runtime version: unknown output format: runc version 1.0.2\ncommit: v1.0.2-0-g52b36a2\nspec: 1.0.2-dev\ngo: go1.16.10\nlibseccomp: 2.5.1\n"
May 25 08:13:34 ubuntu dockerd[2191]: time="2022-05-25T08:13:34.717524328Z" level=warning msg="failed to retrieve /usr/bin/nvidia-container-runtime version: unknown output format: runc version 1.0.2\ncommit: v1.0.2-0-g52b36a2\nspec: 1.0.2-dev\ngo: go1.16.10\nlibseccomp: 2.5.1\n"
May 25 08:14:04 ubuntu dockerd[2191]: time="2022-05-25T08:14:04.782635726Z" level=warning msg="failed to retrieve /usr/bin/nvidia-container-runtime version: unknown output format: runc version 1.0.2\ncommit: v1.0.2-0-g52b36a2\nspec: 1.0.2-dev\ngo: go1.16.10\nlibseccomp: 2.5.1\n"
May 25 08:14:34 ubuntu dockerd[2191]: time="2022-05-25T08:14:34.848038536Z" level=warning msg="failed to retrieve /usr/bin/nvidia-container-runtime version: unknown output format: runc version 1.0.2\ncommit: v1.0.2-0-g52b36a2\nspec: 1.0.2-dev\ngo: go1.16.10\nlibseccomp: 2.5.1\n"
May 25 08:15:04 ubuntu dockerd[2191]: time="2022-05-25T08:15:04.904724411Z" level=warning msg="failed to retrieve /usr/bin/nvidia-container-runtime version: unknown output format: runc version 1.0.2\ncommit: v1.0.2-0-g52b36a2\nspec: 1.0.2-dev\ngo: go1.16.10\nlibseccomp: 2.5.1\n"
May 25 08:15:34 ubuntu dockerd[2191]: time="2022-05-25T08:15:34.966260670Z" level=warning msg="failed to retrieve /usr/bin/nvidia-container-runtime version: unknown output format: runc version 1.0.2\ncommit: v1.0.2-0-g52b36a2\nspec: 1.0.2-dev\ngo: go1.16.10\nlibseccomp: 2.5.1\n"
May 25 08:16:05 ubuntu dockerd[2191]: time="2022-05-25T08:16:05.036384305Z" level=warning msg="failed to retrieve /usr/bin/nvidia-container-runtime version: unknown output format: runc version 1.0.2\ncommit: v1.0.2-0-g52b36a2\nspec: 1.0.2-dev\ngo: go1.16.10\nlibseccomp: 2.5.1\n"
May 25 08:16:35 ubuntu dockerd[2191]: time="2022-05-25T08:16:35.102791653Z" level=warning msg="failed to retrieve /usr/bin/nvidia-container-runtime version: unknown output format: runc version 1.0.2\ncommit: v1.0.2-0-g52b36a2\nspec: 1.0.2-dev\ngo: go1.16.10\nlibseccomp: 2.5.1\n"
May 25 08:17:05 ubuntu dockerd[2191]: time="2022-05-25T08:17:05.163632028Z" level=warning msg="failed to retrieve /usr/bin/nvidia-container-runtime version: unknown output format: runc version 1.0.2\ncommit: v1.0.2-0-g52b36a2\nspec: 1.0.2-dev\ngo: go1.16.10\nlibseccomp: 2.5.1\n"
May 25 08:17:35 ubuntu dockerd[2191]: time="2022-05-25T08:17:35.225188923Z" level=warning msg="failed to retrieve /usr/bin/nvidia-container-runtime version: unknown output format: runc version 1.0.2\ncommit: v1.0.2-0-g52b36a2\nspec: 1.0.2-dev\ngo: go1.16.10\nlibseccomp: 2.5.1\n"
This seems related to moby/moby#38709 and could be caused by the additional output from the `nvidia-container-runtime --version` command.
For reference, dockerd seems to be generating the error here https://github.com/moby/moby/blob/6b9b445af6c7908992632cff6c30cbf6a4c617ac/daemon/info_unix.go#L352
ubuntu@ubuntu:~$ nvidia-container-runtime --version
runc version 1.0.2
commit: v1.0.2-0-g52b36a2
spec: 1.0.2-dev
go: go1.16.10
libseccomp: 2.5.1
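For illustration, the fields in this output can be pulled out with a few lines of shell. This is only a minimal sketch of the kind of parsing involved, not dockerd's actual logic; the helper name `parse_runtime_version` and its `version=... commit=...` output format are made up for this example.

```shell
# Hypothetical helper (not part of Docker or the NVIDIA toolkit):
# reads `<runtime> --version` output on stdin and prints the version and commit.
parse_runtime_version() {
    awk '
        /^runc version / { version = $3 }   # e.g. "runc version 1.0.2"
        /^commit: /      { commit = $2 }    # e.g. "commit: v1.0.2-0-g52b36a2"
        END {
            if (version == "" && commit == "") {
                print "unknown output format" > "/dev/stderr"
                exit 1
            }
            printf "version=%s commit=%s\n", version, commit
        }
    '
}

# Usage (on a host where the runtime is installed):
#   nvidia-container-runtime --version | parse_runtime_version
```

If the helper cannot find either field it exits non-zero, which makes it easy to compare how different runtime wrappers format their version output.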
@siaimes which version of the `nvidia-container-toolkit` are you using?
As recommended here, I installed `nvidia-container-runtime` directly.
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt update
sudo apt install nvidia-container-runtime
sudo nano /etc/docker/daemon.json
For reference, dockerd seems to be generating the error here https://github.com/moby/moby/blob/6b9b445af6c7908992632cff6c30cbf6a4c617ac/daemon/info_unix.go#L352
So this warning is harmless in itself, but it is emitted every 30 seconds, so it crowds out the other log entries and they cannot be retained.
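Until the warning itself goes away, one way to keep the remaining entries readable is to filter it out when reading the journal. A minimal sketch, assuming the daemon logs to journald under the `docker.service` unit as in the status output above; `filter_version_warning` is a hypothetical helper name. It only affects what is displayed and does not change what journald retains.

```shell
# Hypothetical helper: drop the repeated runtime-version warning from log lines on stdin.
filter_version_warning() {
    grep -v 'failed to retrieve .* version: unknown output format'
}

# Usage (on the affected host):
#   journalctl -u docker.service --since today --no-pager | filter_version_warning
```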
@siaimes thanks for checking the version. I assume that you also have `nvidia-container-runtime` set as your default runtime in your `/etc/docker/daemon.json` file? Since the output of `nvidia-container-runtime --version` is identical to that of `runc --version`, I would expect this message to be the same even if that were not the case. Would you be able to confirm whether this is the case?
Also, for clarification: what is the output of `apt list nvidia-container-toolkit nvidia-container-runtime`?
Yes, I use k8s to run deep learning jobs, so I set `nvidia-container-runtime` as the default runtime.
ubuntu@ubuntu:~$ sudo apt list nvidia-container-toolkit nvidia-container-runtime
Listing... Done
nvidia-container-runtime/bionic 3.9.0-1 all [upgradable from: 3.7.0-1]
nvidia-container-toolkit/bionic 1.9.0-1 amd64 [upgradable from: 1.7.0-1]
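The default-runtime question above can also be answered from the command line. A rough sketch: `default_runtime_from_config` is a hypothetical helper that pulls the `default-runtime` key out of a daemon.json file with `sed`. It assumes the key and its value sit on one line; a JSON-aware tool would be more robust.

```shell
# Hypothetical helper: print the "default-runtime" value from a daemon.json file.
# Assumes the key and value are on a single line; prints nothing if the key is absent.
default_runtime_from_config() {
    sed -n 's/.*"default-runtime"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Usage (on the affected host):
#   default_runtime_from_config /etc/docker/daemon.json
# The running daemon can be asked directly as well:
#   docker info --format '{{.DefaultRuntime}}'
```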
@siaimes as a matter of interest, which version of docker are you using?
Between 20.10.7 and 20.10.12, depending on when each node was added to my cluster.
We have recently revamped the logging for the NVIDIA Container Toolkit.
If this behaviour persists with the most recent version, please open a new issue against https://github.com/NVIDIA/nvidia-container-toolkit.
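To check whether the behaviour persists after upgrading, the warning can be counted in recent journal output. A minimal sketch, assuming the apt repository configured earlier in the thread and journald logging under `docker.service`; `count_version_warnings` is a hypothetical helper name.

```shell
# Hypothetical helper: count occurrences of the warning in log lines on stdin.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the helper's status clean.
count_version_warnings() {
    grep -c 'unknown output format' || true
}

# Usage (on the affected host), after upgrading and restarting the daemon:
#   sudo apt update
#   sudo apt install --only-upgrade nvidia-container-toolkit nvidia-container-runtime
#   sudo systemctl restart docker
#   journalctl -u docker.service --since "10 minutes ago" --no-pager | count_version_warnings
```

A count of 0 a few minutes after the restart would suggest the warning is gone; a count that keeps growing means the issue is still present.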