Cloud-init fails on Ubuntu 20.04 base AMI with cloud-init version '23.3.1-0ubuntu1~20.04.1'
supershal opened this issue · 9 comments
What steps did you take and what happened:
The latest cloud-init version, 23.3.1-0ubuntu1~20.04.1, which ships with the base AMI for Ubuntu 20.04, is unable to run the boothook (https://cloudinit.readthedocs.io/en/latest/explanation/format.html#cloud-boothook) provided by CAPA: https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/0bf78b04b305a77aec37a68c107102231faa7a16/pkg/cloud/services/secretsmanager/secret_fetch_script.go#L20
As a result, the CAPA VMs do not initialize as expected.
Steps to reproduce:
- Create an AMI using image-builder: make build-ami-ubuntu-2004
- Create a CAPA cluster using the AMI created in step 1, following the instructions at: https://cluster-api-aws.sigs.k8s.io/getting-started.html
- Check the logs at /var/log/cloud-init-output.log
What did you expect to happen:
Cloud-init runs successfully on the VM.
Anything else you would like to add:
Log from cloud-init.
[2023-10-24 18:53:21] 2023-10-24 18:53:21,892 - util.py[WARNING]: failed stage init
[2023-10-24 18:53:21] failed run of stage init
[2023-10-24 18:53:21] ------------------------------------------------------------
[2023-10-24 18:53:21] Traceback (most recent call last):
[2023-10-24 18:53:21] File "/usr/lib/python3/dist-packages/cloudinit/url_helper.py", line 78, in read_file_or_url
[2023-10-24 18:53:21] with open(file_path, "rb") as fp:
[2023-10-24 18:53:21] FileNotFoundError: [Errno 2] No such file or directory: '/etc/secret-userdata.txt'
[2023-10-24 18:53:21]
[2023-10-24 18:53:21] The above exception was the direct cause of the following exception:
[2023-10-24 18:53:21]
[2023-10-24 18:53:21] Traceback (most recent call last):
[2023-10-24 18:53:21] File "/usr/lib/python3/dist-packages/cloudinit/user_data.py", line 238, in _do_include
[2023-10-24 18:53:21] resp = read_file_or_url(
[2023-10-24 18:53:21] File "/usr/lib/python3/dist-packages/cloudinit/url_helper.py", line 84, in read_file_or_url
[2023-10-24 18:53:21] raise UrlError(cause=e, code=code, headers=None, url=url) from e
[2023-10-24 18:53:21] cloudinit.url_helper.UrlError: [Errno 2] No such file or directory: '/etc/secret-userdata.txt'
[2023-10-24 18:53:21]
[2023-10-24 18:53:21] The above exception was the direct cause of the following exception:
[2023-10-24 18:53:21]
[2023-10-24 18:53:21] Traceback (most recent call last):
[2023-10-24 18:53:21] File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 766, in status_wrapper
[2023-10-24 18:53:21] ret = functor(name, args)
[2023-10-24 18:53:21] File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 453, in main_init
[2023-10-24 18:53:21] init.update()
[2023-10-24 18:53:21] File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 484, in update
[2023-10-24 18:53:21] self._store_processeddata(self.datasource.get_userdata(), "userdata")
[2023-10-24 18:53:21] File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 599, in get_userdata
[2023-10-24 18:53:21] self.userdata = self.ud_proc.process(self.get_userdata_raw())
[2023-10-24 18:53:21] File "/usr/lib/python3/dist-packages/cloudinit/user_data.py", line 88, in process
[2023-10-24 18:53:21] self._process_msg(convert_string(blob), accumulating_msg)
[2023-10-24 18:53:21] File "/usr/lib/python3/dist-packages/cloudinit/user_data.py", line 159, in _process_msg
[2023-10-24 18:53:21] self._do_include(payload, append_msg)
[2023-10-24 18:53:21] File "/usr/lib/python3/dist-packages/cloudinit/user_data.py", line 264, in _do_include
[2023-10-24 18:53:21] _handle_error(message, urle)
[2023-10-24 18:53:21] File "/usr/lib/python3/dist-packages/cloudinit/user_data.py", line 72, in _handle_error
[2023-10-24 18:53:21] raise RuntimeError(error_message) from source_exception
[2023-10-24 18:53:21] RuntimeError: [Errno 2] No such file or directory: '/etc/secret-userdata.txt' for url: file:///etc/secret-userdata.txt
[2023-10-24 18:53:21] ------------------------------------------------------------
[2023-10-24 18:53:40] Cloud-init v. 23.3.1-0ubuntu1~20.04.1 running 'modules:config' at Tue, 24 Oct 2023 18:53:37 +0000. Up 42.69 seconds.
[2023-10-24 18:53:40] Cloud-init v. 23.3.1-0ubuntu1~20.04.1 running 'modules:final' at Tue, 24 Oct 2023 18:53:40 +0000. Up 46.25 seconds.
[2023-10-24 18:53:40] Cloud-init v. 23.3.1-0ubuntu1~20.04.1 finished at Tue, 24 Oct 2023 18:53:40 +0000. Datasource DataSourceEc2Local. Up 46.42 seconds.
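To check other nodes for the same failure, the log above can be scanned for the two telling lines: the "failed run of stage" marker and the final RuntimeError naming the include URL. The following is a minimal sketch, not part of cloud-init or image-builder; the function name `summarize_failures` and both regular expressions are assumptions based only on the log lines shown in this issue.

```python
import re

# Matches lines such as "[2023-10-24 18:53:21] failed run of stage init".
FAILED_STAGE = re.compile(r"failed run of stage (\S+)")
# Matches the final RuntimeError line and captures the include URL after "for url:".
MISSING_INCLUDE = re.compile(r"RuntimeError: .* for url: (\S+)")


def summarize_failures(lines):
    """Return (failed_stages, include_urls) found in an iterable of log lines."""
    stages, urls = [], []
    for line in lines:
        stage = FAILED_STAGE.search(line)
        if stage:
            stages.append(stage.group(1))
        include = MISSING_INCLUDE.search(line)
        if include:
            urls.append(include.group(1))
    return stages, urls
```

On a node, the lines could come from an open file handle on /var/log/cloud-init-output.log; any iterable of strings works.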
Environment:
Project (Image Builder for Cluster API):
Additional info for Image Builder for Cluster API related issues:
- OS (e.g. from /etc/os-release, or cmd /c ver): ubuntu-20.04
- Packer Version:
- Packer Provider:
- Ansible Version:
- Cluster-api version (if using):
- Kubernetes version: (use kubectl version):
/kind bug
We were able to downgrade cloud-init to 23.2.1-0ubuntu0~20.04.2 and create a cluster successfully. See mesosphere/konvoy-image-builder#938.
cc: @voor @cnmcavoy
We are still not sure of the root cause, or of which change in cloud-init resulted in this issue.
I was able to provide the following override file to image-builder and build an AMI that runs the CAPA cloud-init script successfully.
pin-cloud-init-override.json:
{
  "ansible_extra_vars": "pinned_debs=\"cloud-init=23.1.2-0ubuntu0~20.04.2\""
}
I built the image using the following Makefile target of image-builder:
make build-ami-ubuntu-2004 PACKER_VAR_FILES=pin-cloud-init-override.json
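If the pinned version needs to change between builds, the override file can be generated rather than edited by hand. This is a minimal sketch under the assumption that the only key needed is `ansible_extra_vars` with a single `pinned_debs` entry, as in the file above; the helper names `pin_override` and `write_override` are hypothetical and not part of image-builder.

```python
import json


def pin_override(package, version):
    """Build the var-file content that pins one deb package via pinned_debs."""
    # The inner double quotes are kept literally; json.dump escapes them as \".
    return {"ansible_extra_vars": f'pinned_debs="{package}={version}"'}


def write_override(path, package, version):
    """Write the override as JSON so it can be passed via PACKER_VAR_FILES."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(pin_override(package, version), fp, indent=2)
        fp.write("\n")
```

For example, `write_override("pin-cloud-init-override.json", "cloud-init", "23.1.2-0ubuntu0~20.04.2")` would produce a file equivalent to the one shown above.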
We now have to investigate which changes in 23.3.1-0ubuntu1~20.04.1 broke the CAPA cloud-init script.
Moving over some comments from Slack so they're not lost in the sands of time:
- AWS mirrors do not seem to keep all versions of cloud-init consistently, so we needed to download the Debian package from elsewhere and host it.
- Pinning the version seems to resolve the issue
- This might be related to #406 which historically caused issues with CAPA.
- name: Downgrade cloud init.
  apt:
    deb: http://launchpadlibrarian.net/679992659/cloud-init_23.2.2-0ubuntu0~20.04.1_all.deb
    state: present
    force: true
- name: Pin cloud init to prevent version issues.
  dpkg_selections:
    name: "{{ item }}"
    selection: hold
  loop:
    - cloud-init
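Before applying a downgrade like the one above, it can help to check whether the installed package is at or beyond the release reported as failing. The sketch below compares only the upstream part of a Debian version string. It assumes that every release from 23.3.1 onward is affected, which this issue does not confirm; the names `upstream_version`, `needs_downgrade`, and `KNOWN_BAD` are hypothetical.

```python
import re

# The upstream release reported as failing in this issue.
KNOWN_BAD = (23, 3, 1)


def upstream_version(deb_version):
    """Extract the upstream part of a Debian version as a tuple of ints.

    The epoch (before ":") and the Debian revision (after the last "-")
    are dropped, so "23.3.1-0ubuntu1~20.04.1" becomes (23, 3, 1).
    """
    without_epoch = deb_version.split(":", 1)[-1]
    upstream = without_epoch.rsplit("-", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", upstream))


def needs_downgrade(deb_version):
    """Assumption: anything at or after KNOWN_BAD is treated as affected."""
    return upstream_version(deb_version) >= KNOWN_BAD
```

The version string could be obtained on the node with `dpkg-query -W -f='${Version}' cloud-init`.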
For image-builder users who have hit this bug and are reading this issue:
We believe the root cause to be in cloud-init, and would like to fix it there (see canonical/cloud-init#4572). We prefer this to the alternative, which is to "pin" an older, known-good cloud-init version in image-builder itself.
For now, if you use image-builder to create an Ubuntu 20.04 AMI, please use the workaround described in #1333 (comment).
This might be related to #406 which historically caused issues with CAPA.
@supershal and I found that the feature override mechanism used in #406 does not work in the recent versions of cloud-init in Ubuntu 20.04. This mechanism was removed from cloud-init in canonical/cloud-init#4228.
Patching cloud-init is the officially documented mechanism now:
Currently used upstream values for feature flags are set in cloudinit/features.py. Overrides to these values should be patched directly (e.g., via quilt patch) by downstreams.
I guess modifying the cloud-init Python module to set ERROR_ON_USER_DATA_FAILURE = False is something image-builder can do for now. But once Ubuntu 20.04 is EOL, the feature flag itself will be removed.
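To illustrate why this flag matters here, the following is a simplified model of the behavior seen in the traceback: when an included file is missing, the failure is either raised as a RuntimeError or only logged as a warning. This is not cloud-init's code; `read_include` and its `error_on_failure` parameter are hypothetical stand-ins for the include handling and for ERROR_ON_USER_DATA_FAILURE.

```python
import logging

LOG = logging.getLogger("userdata-sketch")


def read_include(path, error_on_failure):
    """Return the bytes of an included file.

    error_on_failure stands in for ERROR_ON_USER_DATA_FAILURE:
    True  -> a missing file aborts processing with RuntimeError
    False -> a missing file is logged and None is returned
    """
    try:
        with open(path, "rb") as fp:
            return fp.read()
    except OSError as exc:
        message = f"{exc} for url: file://{path}"
        if error_on_failure:
            raise RuntimeError(message) from exc
        LOG.warning(message)
        return None
```

With the flag off, a missing /etc/secret-userdata.txt at the time the include is processed would produce a warning instead of failing the init stage, which matches the behavior the thread describes for the older package versions.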
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/lifecycle rotten
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".