kubernetes-sigs/image-builder

[ova] ubuntu22 won't build on vsphere 7.x

staerion opened this issue · 9 comments

What steps did you take and what happened:

  1. Clone the image-builder repository with tag 0.1.19
  2. Go to the folder image-builder/images/capi
  3. Configure the vsphere variables under image-builder/images/capi/packer/ova/vsphere.json
  4. Run the command "make build-node-ova-vsphere-base-ubuntu-2204-efi"

What did you expect to happen:
Successfully provision and configure an Ubuntu 22.04 base image on vSphere.

Anything else you would like to add:
vSphere version: 7.0.3.01400

The issue appears to be with the vsphere-iso plugin configured in image-builder/images/capi/packer/ova/packer-node.json. The "floppy_dirs" option is not supported when connecting to a vSphere 7.x instance. When provisioning the build-node-ova-vsphere-base-ubuntu-2004 build, I can also see an error in the logs saying the fd0 device cannot be found, but that build still continues. The ubuntu22 build just hangs. See the corresponding logs below.

Logs from Packer; the build hangs on "Waiting for SSH to become available...":

==> vsphere-iso.vsphere-iso-base: Creating VM...
==> vsphere-iso.vsphere-iso-base: Customizing hardware...
==> vsphere-iso.vsphere-iso-base: Mounting ISO images...
==> vsphere-iso.vsphere-iso-base: Adding configuration parameters...
==> vsphere-iso.vsphere-iso-base: Creating floppy disk...
    vsphere-iso.vsphere-iso-base: Copying files flatly from floppy_files
    vsphere-iso.vsphere-iso-base: Done copying files from floppy_files
    vsphere-iso.vsphere-iso-base: Collecting paths from floppy_dirs
    vsphere-iso.vsphere-iso-base: Resulting paths from floppy_dirs : [./packer/ova/linux/ubuntu/http/]
    vsphere-iso.vsphere-iso-base: Recursively copying : ./packer/ova/linux/ubuntu/http/
    vsphere-iso.vsphere-iso-base: Done copying paths from floppy_dirs
    vsphere-iso.vsphere-iso-base: Copying files from floppy_content
    vsphere-iso.vsphere-iso-base: Done copying files from floppy_content
==> vsphere-iso.vsphere-iso-base: Uploading created floppy image
==> vsphere-iso.vsphere-iso-base: Adding generated Floppy...
==> vsphere-iso.vsphere-iso-base: Starting HTTP server on port 8810
==> vsphere-iso.vsphere-iso-base: Set boot order temporary...
==> vsphere-iso.vsphere-iso-base: Power on VM...
==> vsphere-iso.vsphere-iso-base: Waiting 10s for boot...
==> vsphere-iso.vsphere-iso-base: HTTP server is working at http://172.30.67.45:8810/
==> vsphere-iso.vsphere-iso-base: Typing boot command...
==> vsphere-iso.vsphere-iso-base: Waiting for IP...
==> vsphere-iso.vsphere-iso-base: IP address: 10.x.x.x
==> vsphere-iso.vsphere-iso-base: Using SSH communicator to connect: 10.x.x.x
==> vsphere-iso.vsphere-iso-base: Waiting for SSH to become available...

Logs from the VM console in vSphere; it hangs with I/O errors on the fd0 device:

[OK ] Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch.
[ 14.959277] blk_update_request: I/O error, dev fd0, sector 0 op 0x0: (READ) flags 0x80700 phys_seg 1 prio class 0
[ 14.962594] blk_update_request: I/O error, dev fd0, sector 0 op 0x0: (READ) flags 0x0 phys_seg 1 prio class 0
[ 14.965620] Buffer I/O error on dev fd0, logical block 0, async page read
[ 14.984666] blk_update_request: I/O error, dev fd0, sector 0 op 0x0: (READ) flags 0x80700 phys_seg 1 prio class 0
[ 14.986728] blk_update_request: I/O error, dev fd0, sector 0 op 0x0: (READ) flags 0x0 phys_seg 1 prio class 0
[ 14.988502] Buffer I/O error on dev fd0, logical block 0, async page read
[ 15.000604] blk_update_request: I/O error, dev fd0, sector 0 op 0x0: (READ) flags 0x80700 phys_seg 1 prio class 0
[ 15.001887] blk_update_request: I/O error, dev fd0, sector 0 op 0x0: (READ) flags 0x0 phys_seg 1 prio class 0
[ 15.002900] Buffer I/O error on dev fd0, logical block 0, async page read
[ 15.016799] blk_update_request: I/O error, dev fd0, sector 0 op 0x0: (READ) flags 0x80700 phys_seg 1 prio class 0
[ 15.018184] blk_update_request: I/O error, dev fd0, sector 0 op 0x0: (READ) flags 0x0 phys_seg 1 prio class 0
[ 15.019401] Buffer I/O error on dev fd0, logical block 0, async page read
[ 15.027730] blk_update_request: I/O error, dev fd0, sector 0 op 0x0: (READ) flags 0x80700 phys_seg 1 prio class 0
[ 15.028765] blk_update_request: I/O error, dev fd0, sector 0 op 0x0: (READ) flags 0x0 phys_seg 1 prio class 0
[ 15.030948] Buffer I/O error on dev fd0, logical block 0, async page read
[ OK ] Finished Initial cloud-init job (pre-networking).
[ OK ] Reached target Preparation for Network. Starting Network Configuration...
Started Network Configuration.
[ OK ] Started Network Configuration.
Starting Wait for Network to be Configured... 
Starting Network Name Resolution...
[ OK ] Finished Wait for Network to be Configured.
Starting Initial cloud-init job (metadata service crawler)...
[ OK ] Started Network Name Resolution.
[ OK ] Reached target Network.
[OK ] Reached target Host and Network Name Lookups.
[ 15.974753] Buffer I/O error on dev fd0, logical block 0, async page read
[ 15.984148] Buffer I/O error on dev fd0, logical block 0, async page read
[ 15.993372] Buffer I/O error on dev fd0, logical block 0, async page read
[ 16.002417] Buffer I/O error on dev fd0, logical block 0, async page read
[ 16.014521] Buffer I/O error on dev fd0, logical block 0, async page read

A relatively easy solution for my use case was to use cd_files instead of floppy_dirs:

git diff
diff --git a/images/capi/packer/ova/packer-node.json b/images/capi/packer/ova/packer-node.json
index 64eb272cb..fa0117a67 100644
--- a/images/capi/packer/ova/packer-node.json
+++ b/images/capi/packer/ova/packer-node.json
@@ -150,7 +150,11 @@
       "destroy": "{{user `destroy`}}",
       "disk_controller_type": "{{user `disk_controller_type`}}",
       "firmware": "{{user `firmware`}}",
-      "floppy_dirs": "{{ user `floppy_dirs`}}",
+      "cd_label": "cidata",
+      "cd_files": [
+        "{{user `cd_file_user` }}",
+        "{{ user `cd_file_meta` }}"
+      ],
       "folder": "{{user `folder`}}",
       "guest_os_type": "{{user `vsphere_guest_os_type`}}",
       "host": "{{user `host`}}",

My workaround.json var file looked like:

{
    "boot_command_prefix": "c<wait>linux /casper/vmlinuz ipv6.disable={{ user `boot_disable_ipv6` }} --- autoinstall ds='nocloud'<enter><wait>initrd /casper/initrd<enter><wait>boot<enter>",
    "cd_file_user": "./packer/ova/linux/{{user `distro_name`}}/http/22.04.efi/user-data",
    "cd_file_meta": "./packer/ova/linux/{{user `distro_name`}}/http/22.04.efi/meta-data"
}
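For anyone copying this: the var file can be generated and sanity-checked before kicking off the build. A minimal sketch, assuming you run it from images/capi and have jq available (as elsewhere in this thread):

```shell
# Write the workaround var file from this comment (paths are relative
# to images/capi) and confirm the cd_file_* keys that the patched
# packer-node.json substitutes are actually present.
cat > workaround.json <<'EOF'
{
    "boot_command_prefix": "c<wait>linux /casper/vmlinuz ipv6.disable={{ user `boot_disable_ipv6` }} --- autoinstall ds='nocloud'<enter><wait>initrd /casper/initrd<enter><wait>boot<enter>",
    "cd_file_user": "./packer/ova/linux/{{user `distro_name`}}/http/22.04.efi/user-data",
    "cd_file_meta": "./packer/ova/linux/{{user `distro_name`}}/http/22.04.efi/meta-data"
}
EOF

# Both keys must resolve to non-empty strings.
jq -e '.cd_file_user and .cd_file_meta' workaround.json
```

The build then picks it up via PACKER_VAR_FILES=workaround.json make build-node-ova-vsphere-base-ubuntu-2204-efi.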

Environment:

Project: Image Builder for Cluster API

Additional info for Image Builder for Cluster API related issues:

  • OS (e.g. from /etc/os-release, or cmd /c ver): Ubuntu 20.04.6 LTS (Focal Fossa)
  • Packer Version: 1.9.2
  • Packer Provider: vsphere-iso
  • Ansible Version: core 2.15.3
  • Cluster-api version (if using): N/A
  • Kubernetes version: (use kubectl version): N/A

/kind bug

I'm hitting the same scenario, but this workaround isn't working for me.
The logs show that cloud-init starts, then the process gets stuck at the "choose your language" screen of the Ubuntu installer.

  • Running in container based on ubuntu:22.04
  • Packer version: 1.9.5
  • Ansible core: 2.11.5 (from ppa:ansible/ansible)
  • Python 3.8.10
  • vSphere Client version 7.0.3.01700

Just to be sure: I presume you do export workaround.json via the PACKER_VAR_FILES variable? Do you still get device errors in the logs? I was also stuck on the language screen of the Ubuntu installer when the cloud-init data could not be read from the CD drive.

On my side, I'm using Ubuntu cloud images and I've added user_data through vApp properties.
The patch I'm using (on image-builder v0.1.21):

287a288,292
>       "vapp": {
>         "properties": {
>           "user-data": "{{user `user_data`}}"
>         }
>       },
509a515
>     "user_data": null,

Works like a charm.

  1. (to @staerion) Yes, I've added workaround.json as a Packer var file and can see it referenced in the log.

  2. (to @fad3t) Could you please elaborate (i.e. share some config) on what you did to use the cloud image, and what changes you made?

@ninlil I applied the above patch to packer-node.json.

Then I redefined the ubuntu22 JSON as follows:

{
    "build_name": "ubuntu-2204",
    "build_version": "{{user `build_name`}}-kube-{{user `kubernetes_semver`}}",
    "distro_arch": "amd64",
    "distro_name": "ubuntu",
    "distro_version": "22.04",
    "guest_os_type": "ubuntu-64",
    "image_url": "https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.ova",
    "linked_clone": "false",
    "os_display_name": "Ubuntu 22.04",
    "shutdown_command": "shutdown -P now",
    "template": "jammy-server-cloudimg-amd64",
    "user_data": "I2Nsb3VkLWNvbmZpZwp1c2VyczoKICAtIG5hbWU6IGJ1aWxkZXIKICAgIHN1ZG86IFsnQUxMPShBTEwpIE5PUEFTU1dEOkFMTCddCmNocGFzc3dkOgogIGxpc3Q6IHwKICAgIGJ1aWxkZXI6YnVpbGRlcgogIGV4cGlyZTogRmFsc2UKc3NoX3B3YXV0aDogVHJ1ZQ==",
    "vsphere_guest_os_type": "ubuntu64Guest"
}
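The user_data value above is just base64-encoded cloud-config; decoding it shows what the vApp property injects (a builder user with passwordless sudo, the password builder:builder, and SSH password auth enabled, which is presumably the account Packer's SSH communicator logs in with):

```shell
# Decode the user_data string from the var file above to inspect the
# cloud-config it carries.
echo 'I2Nsb3VkLWNvbmZpZwp1c2VyczoKICAtIG5hbWU6IGJ1aWxkZXIKICAgIHN1ZG86IFsnQUxMPShBTEwpIE5PUEFTU1dEOkFMTCddCmNocGFzc3dkOgogIGxpc3Q6IHwKICAgIGJ1aWxkZXI6YnVpbGRlcgogIGV4cGlyZTogRmFsc2UKc3NoX3B3YXV0aDogVHJ1ZQ==' | base64 -d
```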

I also have a small JSON describing my cluster:

{
    "cluster": "abc123",
    "convert_to_template": "true",
    "create_snapshot": "true",
    "datacenter": "my-datacenter",
    "datastore": "my-datastore",
    "folder": "templates",
    "insecure_connection": "true",
    "network": "k8s-network",
    "password": "{{env `PASSWORD`}}",
    "resource_pool": "",
    "username": "{{env `USERNAME`}}",
    "vcenter_server": "1.2.3.4",
    "vmx_version": "19"
}

I'm using a simple bash script to import the OVA, using govc. Here's part of the script:

#!/bin/bash

set -e
set -o pipefail

IMAGE_URL="$(jq -r .image_url vsphere/ubuntu-2204.json)"
TEMPLATE="$(jq -r .template vsphere/ubuntu-2204.json)"
NETWORK="$(jq -r .network cluster/abc123.json)"
DATASTORE="$(jq -r .datastore cluster/abc123.json)"
VCENTER_SERVER="$(jq -r .vcenter_server cluster/abc123.json)"
OVERWRITE="${OVERWRITE:-true}"

GOVC_VERSION="${GOVC_VERSION:-v0.30.7}"
export GOVC_INSECURE="true"
export GOVC_URL="https://${VCENTER_SERVER}"
export GOVC_USERNAME="${USERNAME}"
export GOVC_PASSWORD="${PASSWORD}"

check_vars () {
    # make sure username and password are set
    [[ -n "${GOVC_USERNAME}" ]] || { echo "make sure username is set!"; exit 1; }
    [[ -n "${GOVC_PASSWORD}" ]] || { echo "make sure password is set!"; exit 1; }
}

get_govc () {
    # download the govc utility
    echo "downloading govc.."
    curl -sLO "https://github.com/vmware/govmomi/releases/download/${GOVC_VERSION}/govc_Linux_x86_64.tar.gz"
    tar -xzf govc_Linux_x86_64.tar.gz && rm govc_Linux_x86_64.tar.gz
    mv govc "${HOME}/.local/bin/"
}

import_ova () {
    # if template exists and we want to overwrite, delete it
    if govc vm.info -g=false "${TEMPLATE}"; then
        if [[ "${OVERWRITE}" == true ]]; then
            echo "deleting template.."
            govc vm.destroy "${TEMPLATE}"
        else
            echo "template already exists, skipping.."
            return 0
        fi
    fi

    echo "importing OVA.."
    govc import.spec "${IMAGE_URL}" | jq --arg NETWORK "${NETWORK}" '.NetworkMapping[0].Network=$NETWORK | .DiskProvisioning="thin"' > options.json
    govc import.ova -ds="${DATASTORE}" -name="${TEMPLATE}" -options=options.json "${IMAGE_URL}" && govc vm.upgrade -vm="${TEMPLATE}"
}

check_vars
get_govc
import_ova

echo "done."

And finally run make build-node-ova-vsphere-clone-ubuntu-2204 to start building the image.
I wanted to make a PR to support the vApp thingy, but with the current template in JSON it's a bit all or nothing. Hopefully once we switch to HCL we'll have more flexibility.
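As an aside, the import.spec | jq line in the script above is the piece that retargets the OVA. It can be exercised standalone; the sample spec below is made up, but has the same shape as govc import.spec output:

```shell
# Run the same jq filter the script uses against a minimal, hand-written
# spec to show the network remap and the thin-provisioning override.
NETWORK="k8s-network"
cat > spec.json <<'EOF'
{"DiskProvisioning":"flat","NetworkMapping":[{"Name":"VM Network","Network":""}]}
EOF
jq --arg NETWORK "${NETWORK}" \
   '.NetworkMapping[0].Network=$NETWORK | .DiskProvisioning="thin"' \
   spec.json > options.json
cat options.json
```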


Same problem for me as well. It gets me to the choose-language screen and gets stuck. Are there any new updates? Please let me know.

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Seeing this with vSphere 8.x as well (without the workaround specified above). Is there an update on this?