Greenboot should notify users of abrupt power failure
say-paul opened this issue · 11 comments
Greenboot should notify the user (via MOTD) on the next boot when an abrupt boot cycle is detected, e.g. power loss on the device or force-killing of a VM.
Services that depend on the shutdown targets may not have executed correctly, which can cause issues on the next boot.
Example: with rpm-ostree update, the staged update is lost after a sudden power failure.
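A minimal sketch of the MOTD side of this request, assuming some detection step has already concluded that the last shutdown was unclean. The directory and file name are illustrative (pam_motd on modern systems reads fragments from `/run/motd.d`), not greenboot's actual implementation:

```shell
#!/bin/sh
# Sketch: drop a note where pam_motd will pick it up on login.
# MOTD_DIR defaults to /run/motd.d; it is a variable so the sketch
# can also be exercised against a temporary directory.
MOTD_DIR="${MOTD_DIR:-/run/motd.d}"

warn_unclean_boot() {
    mkdir -p "$MOTD_DIR"
    cat > "$MOTD_DIR/greenboot-unclean-shutdown" <<'EOF'
WARNING: the previous shutdown was not graceful (power loss or forced stop?).
Services ordered against the shutdown targets may not have run; a staged
rpm-ostree update, if any, was likely discarded.
EOF
}
```

The message file would be removed again once the user has been notified, or on the next clean shutdown.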
How would you suggest it detects this?
I would think of parsing through the journal and checking for shutdown.target/reboot.target (or some related target) to validate whether the system was shut down gracefully.
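That heuristic could be sketched like this: scan the previous boot's journal (`journalctl -b -1`) for evidence that a shutdown/reboot target was reached. The exact message text varies across systemd versions, so the grep pattern below is an assumption, not a stable interface:

```shell
#!/bin/sh
# Sketch: decide whether a boot ended gracefully by looking for a
# shutdown-ish target in its journal messages. Pattern is a guess at
# typical systemd wording, not a guaranteed format.
was_graceful() {
    # Reads journal lines on stdin; exits 0 if a shutdown/reboot
    # target shows up, non-zero otherwise.
    grep -Eq 'Reached target .*(Shutdown|Reboot)|(shutdown|reboot)\.target'
}

# Intended usage (previous boot, message text only):
#   journalctl -b -1 -o cat | was_graceful || echo "abrupt shutdown suspected"
```

This is exactly the kind of log-format coupling the next comment warns about.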
We've had issues with trying to use the journal in our "control loops"; a very pertinent one related to this is that rhel8's journalctl will fail to parse journals generated from rhel9 (and historically the MCO did exactly this with journalctl). See e.g. openshift/os#1271
Today rpm-ostree actually also does exactly this to detect if ostree-finalize-staged failed, and there's a whole rpm-ostree history mechanism... see https://github.com/coreos/rpm-ostree/blob/main/rust/src/journal.rs
That said, recently I did ostreedev/ostree#2589, which is a bit related here. Arguably indeed we could extend things with a similar model where we persist "attempted to reboot with pending changes to apply" in a persistent, non-journal place.
We also need to support systems that don't use a persistent journal. So in general, if it's critical, it can't be in the journal and needs to be external to it. You're arguing for something informative, which could live in the journal, but it still gets tricky for the above reasons.
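The "persist outside the journal" model could be sketched as a stamp file: a unit ordered into the shutdown path records that changes were pending, and clears the stamp only if finalization completes; finding a leftover stamp at boot means the shutdown path never finished. The path and file name here are illustrative, not what ostree/rpm-ostree actually use:

```shell
#!/bin/sh
# Sketch of a journal-free "pending reboot" marker. STATE_DIR must be on
# persistent storage (e.g. under /var); names are placeholders.
STATE_DIR="${STATE_DIR:-/var/lib/greenboot}"
STAMP="pending-reboot-stamp"

# Written when a reboot with pending changes is requested.
mark_pending_reboot() {
    mkdir -p "$STATE_DIR"
    printf 'staged-deployment-pending %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        > "$STATE_DIR/$STAMP"
}

# A clean finalization removes the stamp on the way down...
clear_pending_reboot() {
    rm -f "$STATE_DIR/$STAMP"
}

# ...so finding it at boot implies an abrupt shutdown.
check_pending_reboot() {
    [ -e "$STATE_DIR/$STAMP" ]
}
```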
Here's a strawman proposal: what if we just merged the greenboot code as-is into github.com/coreos/rpm-ostree? We'd make it a new subpackage; the RPM-level transition could either be that we start generating subpackages literally named the same things (possible AFAIK), or we make a new rpm-ostree-greenboot that Obsoletes: greenboot. But all the binaries, config files and services would remain the same.
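In spec-file terms, the second option could look roughly like this; the subpackage name, version bound, and file list are placeholders, not the actual rpm-ostree spec:

```spec
# Hypothetical subpackage in rpm-ostree.spec; names/versions are placeholders.
%package greenboot
Summary: Generic health check framework (merged from the greenboot project)
Obsoletes: greenboot < 0.16
Provides: greenboot = %{version}-%{release}

%files greenboot
# the same binaries, config files and unit files greenboot ships today
```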
Regarding:
We also need to support systems that don't use a persistent journal
I know we did something for rhel9/8 to enable this: osbuild/osbuild-composer#3118; I guess we can do that for Fedora too.
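For context, flipping a system to a persistent journal is a small systemd drop-in like the following (standard journald configuration; whether this matches what the osbuild PR actually shipped is not verified here):

```ini
# /etc/systemd/journald.conf.d/10-persistent.conf
[Journal]
Storage=persistent
```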
very pertinent one related to this is that rhel8's journalctl will fail to parse journals generated from rhel9
I want to understand how integrating greenboot into rpm-ostree will solve the above problem.
I know we did something for rhel9/8 to enable this: osbuild/osbuild-composer#3118, I guess we can do that for fedora too.
@say-paul I think Colin refers to operating systems without a journal altogether, not enabling persistence there.
Here's a strawman proposal; what if we just merged the greenboot code as is into github.com/coreos/rpm-ostree ?
Having worked on the MCO and the journald thing, I agree it's not ideal and we can't use it; we'd definitely need something more robust. @cgwalters, maybe I've missed it, but after merging it into rpm-ostree, would the plan be to better integrate it with rpm-ostree?
@cgwalters not sure maybe I've missed it, after merging it in rpm-ostree, would the plan be to better integrate it with rpm-ostree?
My thoughts on this, primarily to start, are at the very practical level:
- Do either of you watch activity (PRs and issues) for rpm-ostree? I suspect not. I haven't been watching greenboot activity (but I am now); by literally having them in the same codebase, that happens automatically
- CI: It looks to me like there is no CI on this repository that does integration testing; that's something we've invested heavily in, in the coreos/ org
- Rust infrastructure too: As https://github.com/fedora-iot/greenboot/commits/greenboot-rs progresses, there's a ton of "overhead and maintenance" stuff for Rust projects that we've invested in (to start, things like dependabot handling, consistent CI checks, MSRV handling, and integration with https://github.com/coreos/cargo-vendor-filterer)
- Aligning releases
Beyond the "infrastructure" level, I really want to integrate greenboot state into rpm-ostree status
. I think I mentioned this elsewhere but for https://github.com/coreos/zincati/ we did a ton of work to add this "driver registration" interface basically just so that when you type rpm-ostree upgrade
it tells you "no, updates are driven by zincati".
For greenboot, I think the basic integration here would be showing when the current boot was the target of an automated rollback - and surfacing that in a consistent way via the same rpm-ostree status --json
and/or DBus API.
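A consumer of that integration might look like the sketch below. Note the `greenboot-rollback` key is hypothetical: it illustrates the kind of field the proposal would add, not something `rpm-ostree status --json` emits today.

```shell
#!/bin/sh
# Sketch: detect a proposed rollback marker in `rpm-ostree status --json`
# output. The "greenboot-rollback" key is hypothetical.
rollback_detected() {
    # Reads status JSON on stdin; a crude pattern match keeps the
    # sketch free of a jq dependency.
    grep -Eq '"greenboot-rollback"[[:space:]]*:[[:space:]]*true'
}

# Intended usage:
#   rpm-ostree status --json | rollback_detected && echo "booted via rollback"
```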
Beyond the "infrastructure" level, I really want to integrate greenboot state into
rpm-ostree status
that would indeed be ideal, and I think we had this discussion elsewhere too, maybe in the future we could integrate other greenboot's functionality into rpm-ostree too (boot stuff mainly I think to remember).
Infrastructure-wise, yeah, our integration tests aren't wired up here; osbuild-composer "drives" them, and our QE team does too, which isn't ideal... We indeed do not watch rpm-ostree closely. I guess I'm not against this at all; I think it would be beneficial to keep advancing greenboot. There are still things to get right (e.g. somebody changes something in /etc, breaking a greenboot check, and no reboot happens; then a working upgrade comes but the greenboot check fails because of an issue unrelated to the upgrade). Let's see what others think too.
Beyond the "infrastructure" level, I really want to integrate greenboot state into
rpm-ostree status
that would indeed be ideal, and I think we had this discussion elsewhere too, maybe in the future we could integrate other greenboot's functionality into rpm-ostree too (boot stuff mainly I think to remember).
We started the integration conversation in the context of ostree - ostreedev/ostree#2725
Ultimately I think having this in ostree makes the most sense, but at a practical level today the code is invoking rpm-ostree, and I was thinking of this as the "no code changes" move. We can still lower it into ostree later.