kubernetes-sigs/image-builder

Unable to SSH into RHEL AMI generated with image builder

snehala27 opened this issue · 2 comments

What steps did you take and what happened:
Pulled the latest master branch and installed all the prerequisites.
Ran make build-ami-rhel-8. The AMI is generated.
Launched an instance from the AMI and opened port 22 for SSH. Could not SSH into the instance:

ssh: connect to host ec2-1234.us-east-2.compute.amazonaws.com port 22: Connection refused

What did you expect to happen:
I should be able to SSH into the instance. It looks like the SSH server is not running.
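A "Connection refused" (rather than a timeout) is itself a useful signal. EC2 security groups drop disallowed traffic silently, which shows up as a timeout, so a refusal usually means the packet reached the instance and nothing was listening on port 22. A minimal Python sketch of a probe that tells the two cases apart (probe_tcp is a hypothetical helper, not part of image-builder):

```python
import socket


def probe_tcp(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error'.

    'refused': the host answered with a reset, i.e. it was reachable but no
    process was listening on the port (consistent with sshd not running).
    'timeout': nothing answered, which more often points at a security group,
    network ACL or routing problem.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        # DNS failures, "no route to host", etc.
        return "error"
```

For example, probe_tcp("ec2-1234.us-east-2.compute.amazonaws.com") returning "refused" would be consistent with sshd not running on the instance, while "timeout" would point back at the network configuration.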

Anything else you would like to add:

Environment:

Project (Image Builder for Cluster API):

Additional info for Image Builder for Cluster API related issues:

  • OS (e.g. from /etc/os-release, or cmd /c ver):
  • Packer Version: v1.8.6
  • Packer Provider:
  • Ansible Version: 2.14.3
  • Cluster-api version (if using):
  • Kubernetes version: (use kubectl version):

/kind bug

This seems to happen because cloud-init on these VMs does not run successfully; cloud-final.service fails with this error:

Apr 13 18:47:41 localhost cloud-init[1201]: Traceback (most recent call last):
Apr 13 18:47:41 localhost cloud-init[1201]:  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 570, in _build_master
Apr 13 18:47:41 localhost cloud-init[1201]:    ws.require(__requires__)
Apr 13 18:47:41 localhost cloud-init[1201]:  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 888, in require
Apr 13 18:47:41 localhost cloud-init[1201]:    needed = self.resolve(parse_requirements(requirements))
Apr 13 18:47:41 localhost cloud-init[1201]:  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 779, in resolve
Apr 13 18:47:41 localhost cloud-init[1201]:    raise VersionConflict(dist, req).with_context(dependent_req)
Apr 13 18:47:41 localhost cloud-init[1201]: pkg_resources.ContextualVersionConflict: (urllib3 1.26.15 (/usr/local/lib/python3.6/site-packages), Requirement.parse('urllib3<1.25,>=1.21.1'), {'requests'})
Apr 13 18:47:41 localhost cloud-init[1201]: During handling of the above exception, another exception occurred:
Apr 13 18:47:41 localhost cloud-init[1201]: Traceback (most recent call last):
Apr 13 18:47:41 localhost cloud-init[1201]:  File "/usr/bin/cloud-init", line 6, in <module>
Apr 13 18:47:41 localhost cloud-init[1201]:    from pkg_resources import load_entry_point
Apr 13 18:47:41 localhost cloud-init[1201]:  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3095, in <module>
Apr 13 18:47:41 localhost cloud-init[1201]:    @_call_aside
Apr 13 18:47:41 localhost cloud-init[1201]:  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3079, in _call_aside
Apr 13 18:47:41 localhost cloud-init[1201]:    f(*args, **kwargs)
Apr 13 18:47:41 localhost cloud-init[1201]:  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3108, in _initialize_master_working_set
Apr 13 18:47:41 localhost cloud-init[1201]:    working_set = WorkingSet._build_master()
Apr 13 18:47:41 localhost cloud-init[1201]:  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 572, in _build_master
Apr 13 18:47:41 localhost cloud-init[1201]:    return cls._build_from_requirements(__requires__)
Apr 13 18:47:41 localhost cloud-init[1201]:  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 585, in _build_from_requirements
Apr 13 18:47:41 localhost cloud-init[1201]:    dists = ws.resolve(reqs, Environment())
Apr 13 18:47:41 localhost cloud-init[1201]:  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 774, in resolve
Apr 13 18:47:41 localhost cloud-init[1201]:    raise DistributionNotFound(req, requirers)
Apr 13 18:47:41 localhost cloud-init[1201]: pkg_resources.DistributionNotFound: The 'urllib3<1.25,>=1.21.1' distribution was not found and is required by requests
Apr 13 18:47:41 localhost systemd[1]: cloud-final.service: Main process exited, code=exited, status=1/FAILURE
Apr 13 18:47:41 localhost systemd[1]: cloud-final.service: Failed with result 'exit-code'.
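The key line in the log above is the ContextualVersionConflict: pkg_resources found urllib3 1.26.15 in /usr/local/lib/python3.6/site-packages, but requests requires urllib3<1.25,>=1.21.1, so /usr/bin/cloud-init fails at "from pkg_resources import load_entry_point" and cloud-final.service exits with status 1. When checking other images for the same failure, a small parser can pull those fields out of the journal text. This is a sketch: the regex is written against the exact message format shown here, and other pkg_resources versions may format it differently.

```python
import re
from typing import List, NamedTuple, Optional


class VersionConflict(NamedTuple):
    name: str               # distribution pkg_resources actually found
    version: str            # its version
    location: str           # site-packages directory it was found in
    requirement: str        # requirement string that was not satisfied
    required_by: List[str]  # packages that declared the requirement


# Matches messages shaped like:
# ContextualVersionConflict: (urllib3 1.26.15 (/path), Requirement.parse('urllib3<1.25,>=1.21.1'), {'requests'})
_CONFLICT_RE = re.compile(
    r"ContextualVersionConflict: \("
    r"(?P<name>\S+) (?P<version>\S+) \((?P<location>[^)]+)\), "
    r"Requirement\.parse\('(?P<requirement>[^']+)'\), "
    r"\{(?P<required_by>[^}]*)\}\)"
)


def parse_conflict(line: str) -> Optional[VersionConflict]:
    """Extract the conflict details from one log line, or return None if absent."""
    match = _CONFLICT_RE.search(line)
    if match is None:
        return None
    # The last field is a printed Python set such as {'requests'}.
    required_by = [
        item.strip().strip("'\"")
        for item in match.group("required_by").split(",")
        if item.strip()
    ]
    return VersionConflict(
        match.group("name"),
        match.group("version"),
        match.group("location"),
        match.group("requirement"),
        required_by,
    )
```

Input lines can come from the journal of an affected instance, e.g. journalctl -u cloud-final.service.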

It's likely happening because image builder installs awscli via pip, which pulls in urllib3 1.26 under /usr/local/lib/python3.6/site-packages/, and that conflicts with the urllib3<1.25,>=1.21.1 requirement of the system requests package that cloud-init loads.
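To make the conflict concrete: the urllib3 that pkg_resources resolved (1.26.15, from /usr/local/lib/python3.6/site-packages) falls outside the <1.25,>=1.21.1 range that requests declares. The sketch below checks an installed version against such a requirement. It is a deliberately small, hypothetical helper that only understands plain numeric versions and the six comparison operators; real tooling would use packaging.specifiers.SpecifierSet, which implements full PEP 440.

```python
import operator
import re
from typing import Tuple

# Two-character operators come first so "<=" is not mistaken for "<".
_OPERATORS = (
    ("<=", operator.le),
    (">=", operator.ge),
    ("==", operator.eq),
    ("!=", operator.ne),
    ("<", operator.lt),
    (">", operator.gt),
)


def _version_tuple(text: str, width: int) -> Tuple[int, ...]:
    """'1.25' -> (1, 25, 0) when width is 3, so 1.25 and 1.25.0 compare equal."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def split_requirement(requirement: str) -> Tuple[str, str]:
    """'urllib3<1.25,>=1.21.1' -> ('urllib3', '<1.25,>=1.21.1')."""
    match = re.match(r"\s*([A-Za-z0-9._-]+)\s*(.*)", requirement)
    if match is None:
        raise ValueError("not a requirement: %r" % requirement)
    return match.group(1), match.group(2).strip()


def satisfies(version: str, specifier: str) -> bool:
    """Return True if `version` meets every comma-separated clause of `specifier`."""
    for clause in filter(None, (c.strip() for c in specifier.split(","))):
        for op_text, compare in _OPERATORS:
            if clause.startswith(op_text):
                bound_text = clause[len(op_text):]
                break
        else:
            raise ValueError("unsupported clause: %r" % clause)
        # Pad both sides to the same number of components before comparing.
        width = max(version.count("."), bound_text.count(".")) + 1
        if not compare(_version_tuple(version, width), _version_tuple(bound_text, width)):
            return False
    return True
```

With the values from the log, satisfies("1.26.15", "<1.25,>=1.21.1") is False, which is exactly the condition pkg_resources reports.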

Can we move to AWS CLI v2 (awscliv2)? It seems to be a self-contained set of binaries without Python dependencies. Or is that not compatible with the Cluster API provider for AWS?
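Whichever route is taken, a post-build check could confirm which AWS CLI major version actually ended up on the image: v1 installed via pip shares the system Python's site-packages, while the v2 installer bundles its own Python runtime. A minimal sketch, assuming the usual aws --version output that begins with aws-cli/<major>.<minor>.<patch>; both function names are hypothetical and not part of image-builder or the AWS CLI.

```python
import re
import subprocess
from typing import Optional, Tuple

# `aws --version` prints a line starting with "aws-cli/<major>.<minor>.<patch>".
_AWS_CLI_RE = re.compile(r"aws-cli/(\d+)\.(\d+)\.(\d+)")


def parse_aws_cli_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) from `aws --version` output, or None."""
    match = _AWS_CLI_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), int(match.group(3))


def installed_aws_cli_major() -> Optional[int]:
    """Run `aws --version` and return the major version, or None if aws is missing."""
    try:
        proc = subprocess.run(
            ["aws", "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError:  # e.g. FileNotFoundError when aws is not on PATH
        return None
    # Read both streams rather than assume which one a given release writes to.
    parsed = parse_aws_cli_version(proc.stdout + proc.stderr)
    return parsed[0] if parsed else None
```

A validation step could then fail the build when installed_aws_cli_major() returns 1, before the AMI is published.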

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale