okd-project/okd

Baremetal on OCI:


When we create the images to install as bare metal on OCI (we embed the ignition file in the ISO and then use that as a custom image), the bootstrap node works fine, but the master nodes keep looping with this error:

Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com epic_khorana[1710]: W0711 06:26:55.745940 1712 firstboot_complete_machineconfig.go:65] error: failed to remove pending deployment: error running rpm-ostree cleanup -p: error: cleanup: Invoking cleanup: GDBus.Error:org.gtk.GDBus.UnmappedGError.Quark._g_2dio_2derror_2dquark.Code14: Remounting /sysroot read-write: Permission denied
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com epic_khorana[1710]: : exit status 1
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com epic_khorana[1710]: I0711 06:26:55.745945 1712 firstboot_complete_machineconfig.go:66] Sleeping 1 minute for retry

Version

4.15.0-0.okd-2024-03-10-010116
How reproducible

100%. It happens every time on the OCI platform, and the Agent-based Installer has a very similar issue.

Log bundle

Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com podman[1697]: I0711 06:26:55.745934 1712 update.go:1618] Deleting stale data
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com podman[1697]: I0711 06:26:55.745936 1712 update.go:2371] Removing SIGTERM protection
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com podman[1697]: W0711 06:26:55.745940 1712 firstboot_complete_machineconfig.go:65] error: failed to remove pending deployment: error running rpm-ostree cleanup -p: error: cleanup: Invoking cleanup: GDBus.Error:org.gtk.GDBus.UnmappedGError.Quark._g_2dio_2derror_2dquark.Code14: Remounting /sysroot read-write: Permission denied
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com podman[1697]: : exit status 1
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com podman[1697]: I0711 06:26:55.745945 1712 firstboot_complete_machineconfig.go:66] Sleeping 1 minute for retry
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com epic_khorana[1710]: I0711 06:26:55.745934 1712 update.go:1618] Deleting stale data
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com epic_khorana[1710]: I0711 06:26:55.745936 1712 update.go:2371] Removing SIGTERM protection
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com epic_khorana[1710]: W0711 06:26:55.745940 1712 firstboot_complete_machineconfig.go:65] error: failed to remove pending deployment: error running rpm-ostree cleanup -p: error: cleanup: Invoking cleanup: GDBus.Error:org.gtk.GDBus.UnmappedGError.Quark._g_2dio_2derror_2dquark.Code14: Remounting /sysroot read-write: Permission denied
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com epic_khorana[1710]: : exit status 1
Jul 11 06:26:55 ddmstr01.sndamdocsv360.vcndesarrollo.oraclevcn.com epic_khorana[1710]: I0711 06:26:55.745945 1712 firstboot_complete_machineconfig.go:66] Sleeping 1 minute for retry

I had a similar issue, although on OpenStack.

The main issue was that /sysroot was mounted on a loopback device that was read-only.

I realized that the FCOS live ISO had this setup baked into the image, and that I had used the wrong FCOS image.

With the OpenStack release of FCOS, I could see that the mounts were set up differently.

Just my 2 cents: maybe you're not using the metal release of FCOS?
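To confirm on an affected node whether /sysroot sits on a read-only loop device, as described above, one option is to inspect /proc/mounts. The sketch below is a hypothetical helper (check_sysroot is not part of any tool); it assumes the standard /proc/mounts layout, where the fourth field holds the mount options and the first option is always ro or rw.

```shell
# Sketch: summarize how /sysroot is mounted. Reads /proc/mounts-format lines
# on stdin and prints "<source> <fstype> <ro|rw> <loop|other>"; returns 1 if
# /sysroot is not listed. If several entries match, the last one wins.
check_sysroot() {
  awk '
    $2 == "/sysroot" {
      split($4, opts, ",")                         # first option is ro or rw
      kind = ($1 ~ /^\/dev\/loop/) ? "loop" : "other"
      line = $1 " " $3 " " opts[1] " " kind
      found = 1
    }
    END {
      if (!found) exit 1
      print line
    }'
}

# Usage on an affected node:
#   check_sysroot < /proc/mounts
```

A result ending in "ro loop" matches the live-image situation described in the comment; the helper only reports the mount, it does not change anything.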

Hi, I used the same approach as you and got a similar error. I'm not sure, but it may be related to the destination you install to. In my case, I installed an OpenShift cluster on OCP-Virtualization VMs. I fixed it by attaching the base ISO (without the injected ignition) to the VM. When the OS first boots, you can run the command to install with the ignition file; assuming I want to install on /dev/vda:
sudo coreos-installer install /dev/vda --ignition-url http://192.168.30.17/openshift4//ignitions/bootstrap.ign --insecure-ignition

Do this for each node and reboot; after that, the cluster installed successfully.
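The procedure above repeats the same coreos-installer command on every node, changing only the ignition file. A minimal sketch of that per-node step follows, assuming the bootstrap.ign, master.ign and worker.ign files produced by `openshift-install create ignition-configs` are served under the same URL prefix as in the comment; install_node, IGN_BASE and DRY_RUN are hypothetical names. `--insecure-ignition` allows fetching the config over plain HTTP without a hash check, so it only makes sense on a trusted network.

```shell
# Hypothetical helper (names are illustrative): install one node from the
# unmodified live ISO, fetching the ignition config for its role.
# IGN_BASE defaults to the URL prefix from the comment above; set DRY_RUN=1
# to print the command instead of running it.
install_node() {
  role=$1                  # bootstrap | master | worker
  disk=${2:-/dev/vda}      # target disk, /dev/vda as in the example
  base=${IGN_BASE:-http://192.168.30.17/openshift4//ignitions}
  case $role in
    bootstrap|master|worker) ;;
    *) echo "unknown role: $role" >&2; return 2 ;;
  esac
  # --insecure-ignition permits a plain-HTTP URL without a hash check
  set -- sudo coreos-installer install "$disk" \
    --ignition-url "$base/$role.ign" --insecure-ignition
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}
```

On each node, after booting the unmodified ISO, run something like `install_node master` and then reboot. Rejecting unknown roles up front avoids writing a disk with a config URL that does not exist.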

@LamNguy thanks, I'll try going bare metal.
@DoodlesOnMyFood you made me notice that I was also using the live image. I'm going to try the openshift-install images command to find the right one.
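One way to see which FCOS artifacts an installer build expects is its CoreOS stream metadata, which recent openshift-install builds print with `openshift-install coreos print-stream-json` (assuming your OKD 4.15 build includes that subcommand). The metadata nests artifacts by architecture, platform and format, each with a disk location. The sketch below only builds the jq path into that JSON; stream_filter is a hypothetical helper, and which artifact is right for OCI custom images is not settled in this thread.

```shell
# Hypothetical helper: build a jq filter that selects one artifact URL from
# CoreOS stream metadata (architectures -> artifacts -> formats -> disk).
# Names are quoted so dotted formats such as "raw.xz" work.
stream_filter() {
  arch=${1:-x86_64}      # e.g. x86_64, aarch64
  platform=${2:-metal}   # e.g. metal, openstack
  format=${3:-iso}       # e.g. iso (live ISO), raw.xz (disk image)
  printf '.architectures["%s"].artifacts["%s"].formats["%s"].disk.location\n' \
    "$arch" "$platform" "$format"
}

# Usage (requires jq):
#   openshift-install coreos print-stream-json | jq -r "$(stream_filter x86_64 metal raw.xz)"
```

Comparing the metal `iso` entry (the live image) with a disk-image format makes the live-versus-installed distinction from the earlier comments explicit.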

Hi,

We are not working on FCOS builds of OKD any more. Please see these documents...

https://okd.io/blog/2024/06/01/okd-future-statement
https://okd.io/blog/2024/07/30/okd-pre-release-testing

We will be providing documentation on upgrading clusters from 4.15 FCOS to 4.16 SCOS. In the meantime, you may be able to get help from community members. I'll convert this to a discussion to facilitate that.

Many thanks,

Jaime