Extensions enablement leaves the node in a state where `rpm-ostree install` doesn't work.
fidencio opened this issue · 6 comments
Description
After a package gets layered due to the "Extensions" mechanism, manually `rpm-ostree install`ing another package becomes impossible.
Steps to reproduce the issue:
- Deploy an operator that forces a package to be installed via the extension mechanism (I deployed the sandboxed-containers operator)
- Connect to a worker node via your preferred method
- Add a `.repo` file to `/etc/yum.repos.d/`
- Try to install a package from that repo (a concrete sketch follows this list)
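For concreteness, here's a minimal sketch of the last two steps. The repo id, baseurl, and package name are placeholders for whatever repo you actually add:

sh-4.4# cat > /etc/yum.repos.d/my-extra.repo <<'EOF'
[my-extra-repo]
name=Extra packages (hypothetical repo)
baseurl=https://example.com/repo/
enabled=1
gpgcheck=0
EOF
sh-4.4# rpm-ostree install $package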
Describe the results you received:
You'll get a message like:
sh-4.4# rpm-ostree install $package
Checking out tree 6bc3ddd... done
Enabled rpm-md repositories: $package-repo
rpm-md repo '$package-repo' (cached); generated: 2021-11-23T05:13:43Z
Importing rpm-md... done
error: Packages not found: kata-containers
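Note that the error names kata-containers (the package layered by the extension) rather than the package you asked for: rpm-ostree re-resolves every previously requested package when creating a new deployment, and the extension's package can no longer be found once the repo it came from is gone (see "Additional environment details" below). The layered package is visible in the deployment, e.g. (illustrative output, checksums elided):

sh-4.4# rpm-ostree status
...
  LayeredPackages: kata-containers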
Describe the results you expected:
Be able to `rpm-ostree install $package` without issues (assuming all its dependencies are provided ;-))
Additional information you deem important (e.g. issue happens only occasionally):
Happens 100% of the time.
Output of `oc adm release info --commits | grep machine-config-operator`:
$ oc adm release info --commits | grep machine-config-operator
machine-config-operator https://github.com/openshift/machine-config-operator a1e3bbfc59d48997e888727fac9ac227bb32732
Additional environment details (platform, options, etc.):
The issue seems to happen because the repo used for installing the packages provided by the extension is simply removed after the extension's packages are installed.
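This is easy to confirm on the node: as noted in the chat transcript below, after the extension is applied and the node reboots, no repo that could satisfy the extension's packages remains under `/etc/yum.repos.d/` (illustrative):

sh-4.4# ls /etc/yum.repos.d/
sh-4.4#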
I had a quick chat with @cgwalters on the #fedora-coreos channel, which led me to open the issue here; here's the transcript of the chat:
19:59 < fidencio> so, first of all, I know this is a OCP + Extensions specific question, and I know this is FCOS channel, sorry about that. Still, I'll give it a try and ask here in case someone is around and willing to chat (if not, please, don't kick me out and I'll just open an issue for OCP)
20:00 < fidencio> seems that enabling an extension will lead RHCOS inside the node to be in a state where it can't install any other packages via `rpm-ostree install`, as the repo from where the extension package was installed gets removed after the installation happens
20:00 < fidencio> I know, one should *not* be calling `rpm-ostree install` in a RHCOS node, I really know
20:01 < fidencio> but breaking `rpm-ostree install`ability due to extensions enablement also seems .... unexpected.
20:02 < fidencio> as walters is everywhere possible, I found https://github.com/coreos/rpm-ostree/issues/1602 which is slightly related, and his explanation makes sense in that context
20:02 <+ walters> I think you're hitting https://github.com/openshift/machine-config-operator/blob/c415ce6aed25604bc1d2478951db16759dac31f6/templates/common/_base/units/machine-config-daemon-firstboot.service.yaml#L17 ?
20:03 < fidencio> walters: I wish, with that i'd at least have a workaround, but there's no repo under /etc/yum.repos.d
20:03 <+ walters> fidencio: this is only tangentially related but as of lately we've been working on https://github.com/coreos/enhancements/blob/main/os/coreos-layering.md and https://fedoraproject.org/wiki/Changes/OstreeNativeContainer FYI
20:03 < fidencio> walters: it seems that whatever repo gets enabled as extension, gets also removed after the reboot
20:04 <+ walters> oh right sorry yes...I understand, that is indeed right, the repo is removed
20:04 < fidencio> walters: so, this is ... expected?
20:05 <+ walters> it is the status quo yeah, a bit messy to fix but possible
20:06 < fidencio> walters: hmmm. I'm not sure if I am that comfortable with that status quo
20:07 < fidencio> walters: maybe my use case is wrong from the beginning, to be fair, but I'm actually testing a lightweight VMM (cloud-hypervisor) with the sandboxed-containers stuff, which installs kata-containers on the worker nodes
20:10 < fidencio> walters: do you think it deserves at least an issue open?
20:10 <+ walters> definitely
20:11 < fidencio> walters: okay, i will open the issue against the MCO then, and CC you there
20:11 < fidencio> walters: thanks a lot for the help, and for the pointers
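Given walters' confirmation above, a possible (untested, unsupported) workaround sketch is to re-create a repo that can satisfy the extension's packages before running `rpm-ostree install` again, using the same `.repo` format as in the reproduction sketch earlier. The repo id and baseurl here are hypothetical, and the repo would need to carry a matching kata-containers build:

sh-4.4# cat > /etc/yum.repos.d/extensions-workaround.repo <<'EOF'
[extensions-workaround]
name=Extension packages (hypothetical mirror)
baseurl=https://example.com/extensions/
enabled=1
gpgcheck=0
EOF
sh-4.4# rpm-ostree install $package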
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.