rpm-ostreed.service missing proxy configuration
zhouhao3 opened this issue · 1 comments
Description
We use OpenShift 4.12.0-ec.4 for an IPI deployment. The master nodes are deployed successfully, but crio.service on the master fails to start, which causes the IPI deployment to fail.
Our investigation found that crio.service fails because machine-config-daemon-firstboot.service, which it depends on, never completes successfully. The relevant information is as follows:
● machine-config-daemon-firstboot.service - Machine Config Daemon Firstboot
Loaded: loaded (/etc/systemd/system/machine-config-daemon-firstboot.service; enabled; vendor preset: enabled)
Active: activating (start) since Mon 2022-10-17 09:20:48 UTC; 17h ago
Main PID: 3674 (machine-config-)
Tasks: 35 (limit: 406926)
Memory: 47.1M
CPU: 4min 20.173s
CGroup: /system.slice/machine-config-daemon-firstboot.service
└─3674 /run/bin/machine-config-daemon firstboot-complete-machineconfig
Oct 18 03:03:52 master-1 machine-config-daemon[3674]: I1018 03:03:52.904189 3674 rpm-ostree.go:447] Running captured: rpm-ostree --version
Oct 18 03:03:52 master-1 machine-config-daemon[3674]: I1018 03:03:52.929770 3674 rpm-ostree.go:407] Executing rebase to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0daf5c4a35424410e88dde102022fc3581302bc8a98e09e2e4748502c59b3661
Oct 18 03:03:52 master-1 machine-config-daemon[3674]: I1018 03:03:52.929786 3674 update.go:2053] Running: rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0daf5c4a35424410e88dde102022fc3581302bc8a98e09e2e4748502c59b3661
Oct 18 03:03:52 master-1 machine-config-daemon[61425]: Pulling manifest: ostree-unverified-image:docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0daf5c4a35424410e88dde102022fc3581302bc8a98e09e2e4748502c59b3661
Oct 18 03:03:53 master-1 machine-config-daemon[3674]: I1018 03:03:53.074957 3674 update.go:1243] Updating files
Oct 18 03:03:53 master-1 machine-config-daemon[3674]: I1018 03:03:53.074977 3674 update.go:1308] Deleting stale data
Oct 18 03:03:53 master-1 machine-config-daemon[3674]: I1018 03:03:53.074985 3674 update.go:2098] Removing SIGTERM protection
Oct 18 03:03:53 master-1 machine-config-daemon[3674]: W1018 03:03:53.074994 3674 firstboot_complete_machineconfig.go:46] error: failed to update OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0daf5c4a35424410e88dde102022fc3581302bc8a98e09e2e4748502c59b3661 : **error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0daf5c4a35424410e88dde102022fc3581302bc8a98e09e2e4748502c59b3661: error: remote error: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 192.168.30.1:53: no such host**
Oct 18 03:03:53 master-1 machine-config-daemon[3674]: : exit status 1
Oct 18 03:03:53 master-1 machine-config-daemon[3674]: I1018 03:03:53.075000 3674 firstboot_complete_machineconfig.go:47] Sleeping 1 minute for retry
The log shows that the error occurs because rpm-ostree has no proxy configured when it executes the rebase command, so it tries to reach quay.io directly and the DNS lookup fails.
We tried manually configuring the proxy for rpm-ostree, and the rebase then worked.
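For illustration, one common way to give rpm-ostreed a proxy is a systemd drop-in that sets the proxy environment variables for the service. This is only a sketch and not necessarily the exact change described above: the helper name `write_proxy_dropin`, the drop-in file name `10-proxy.conf`, the proxy URL, and the NO_PROXY list are all placeholders chosen for the example.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that exports proxy
# variables to a service. All values passed in are placeholders.
write_proxy_dropin() {
    dropin_dir="$1"   # e.g. /etc/systemd/system/rpm-ostreed.service.d
    proxy_url="$2"    # e.g. http://proxy.example.com:3128 (placeholder)
    no_proxy="$3"     # e.g. localhost,127.0.0.1 (placeholder)

    mkdir -p "$dropin_dir" || return 1
    # Unquoted EOF so that $proxy_url and $no_proxy are expanded.
    cat > "$dropin_dir/10-proxy.conf" <<EOF
[Service]
Environment="HTTP_PROXY=$proxy_url"
Environment="HTTPS_PROXY=$proxy_url"
Environment="NO_PROXY=$no_proxy"
EOF
}

# Example usage on a node, as root (not executed here):
#   write_proxy_dropin /etc/systemd/system/rpm-ostreed.service.d \
#       http://proxy.example.com:3128 localhost,127.0.0.1
#   systemctl daemon-reload
#   systemctl restart rpm-ostreed
```

The drop-in only takes effect after `systemctl daemon-reload` and a restart of the service, which is why those steps appear in the usage comment.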
In addition, we found that in a version that works normally (4.11.1), machine-config-daemon-firstboot.service does not execute the rpm-ostree rebase command at all; it runs oc image extract instead. The specific information is as follows:
● machine-config-daemon-firstboot.service - Machine Config Daemon Firstboot
Loaded: loaded (/etc/systemd/system/machine-config-daemon-firstboot.service; enabled; vendor preset: enabled)
Active: activating (start) since Tue 2022-10-18 07:21:02 UTC; 1min 25s ago
Main PID: 3825 (machine-config-)
Tasks: 54 (limit: 406941)
Memory: 514.7M
CPU: 14.980s
CGroup: /system.slice/machine-config-daemon-firstboot.service
├─3825 /run/bin/machine-config-daemon firstboot-complete-machineconfig
└─3972 oc image extract --path /:/run/mco-machine-os-content/os-content-41001907 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6e7c8e9e407ebab51eac2482d13c07d071c0be1a5755a36a64f0be1b73b3999a
Oct 18 07:21:02 master-0 machine-config-daemon[3825]: I1018 07:21:02.303454 3825 update.go:1976] Running: systemctl start rpm-ostreed
Oct 18 07:21:02 master-0 machine-config-daemon[3825]: I1018 07:21:02.459377 3825 rpm-ostree.go:324] Running captured: rpm-ostree status --json
Oct 18 07:21:02 master-0 machine-config-daemon[3825]: I1018 07:21:02.518682 3825 rpm-ostree.go:324] Running captured: rpm-ostree status --json
Oct 18 07:21:02 master-0 machine-config-daemon[3825]: I1018 07:21:02.562059 3825 daemon.go:236] Booted osImageURL: (411.86.202207150124-0)
Oct 18 07:21:02 master-0 machine-config-daemon[3825]: I1018 07:21:02.563609 3825 update.go:2013] Adding SIGTERM protection
Oct 18 07:21:02 master-0 machine-config-daemon[3825]: I1018 07:21:02.564265 3825 update.go:513] Checking Reconcilable for config mco-empty-mc to rendered-master-e08a90a8cf8f7f4f823348adf310f481
Oct 18 07:21:02 master-0 machine-config-daemon[3825]: I1018 07:21:02.565941 3825 update.go:1991] Starting update from mco-empty-mc to rendered-master-e08a90a8cf8f7f4f823348adf310f481: &{osUpdate:true kargs:false fips:false passwd:false files:false units:false kernelType:false extensions:false}
Oct 18 07:21:02 master-0 machine-config-daemon[3825]: I1018 07:21:02.570660 3825 update.go:1207] Updating files
Oct 18 07:21:02 master-0 machine-config-daemon[3825]: I1018 07:21:02.570677 3825 update.go:1272] Deleting stale data
Oct 18 07:21:02 master-0 machine-config-daemon[3825]: I1018 07:21:02.570802 3825 run.go:19] Running: nice -- ionice -c 3 oc image extract --path /:/run/mco-machine-os-content/os-content-41001907 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6e7c8e9e407ebab51eac2482d13c07d071c0be1a5755a36a64f0be1b73b3999a
Therefore, we think that starting from a certain version, machine-config-daemon-firstboot.service began executing the rpm-ostree rebase command, but the corresponding proxy configuration was not added, which caused the problem.
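A quick way to check this hypothesis on an affected node would be to look at the environment systemd passes to the unit. The sketch below assumes the proxy would be delivered through the usual `HTTP_PROXY`/`HTTPS_PROXY` variables (upper or lower case); `has_proxy_env` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given text contains a proxy
# variable assignment. Intended input is the output of
#   systemctl show <unit> --property=Environment
has_proxy_env() {
    case "$1" in
        *HTTP_PROXY=*|*HTTPS_PROXY=*|*http_proxy=*|*https_proxy=*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage on a node (not executed here):
#   env_line=$(systemctl show rpm-ostreed.service --property=Environment)
#   if has_proxy_env "$env_line"; then
#       echo "rpm-ostreed has a proxy configured"
#   else
#       echo "rpm-ostreed has no proxy configured"
#   fi
```

This only inspects variables set on the unit itself; a proxy configured some other way would not show up in this check.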
Steps to reproduce the issue:
- openshift-baremetal-install --dir ~/clusterconfigs create manifests
- openshift-baremetal-install --dir ~/clusterconfigs --log-level debug create cluster
Describe the results you received:
DEBUG Log bundle written to /var/home/core/log-bundle-20221012071722.tar.gz
WARNING Unable to stat /home/kni/clusterconfigs/serial-log-bundle-20221012071722.tar.gz, skipping
ERROR Bootstrap failed to complete: timed out waiting for the condition
ERROR Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane.
INFO Bootstrap gather logs captured here "/home/kni/clusterconfigs/log-bundle-20221012071722.tar.gz"
Describe the results you expected:
Additional information you deem important (e.g. issue happens only occasionally):
machine-config-operator image info:
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d1650adb41efbe7287997152c74850a410ba0a5eb2d3ab9c7723d144e7985de5
See issue 6482 for more details.
Output of oc adm release info --commits | grep machine-config-operator:
(paste your output here)
Additional environment details (platform, options, etc.):