RFE: Allow manual update checks and reboots
kelvinfan001 opened this issue · 22 comments
Feature Request
Desired Feature
Allow for manual, safe, Zincati-driven update checks and reboots.
Currently, in rpm-ostree (v2021.2 and later), if a user tries to `rpm-ostree upgrade` when an update driver (e.g. Zincati) "owns" updates on that machine, rpm-ostree will correctly refuse by default and instruct the user to refer to the update driver's (Zincati's) documentation, implying that the user should perform the upgrade via the update driver instead. However, there is actually no convenient, user-friendly way to perform an upgrade immediately through Zincati, either.
Stemming from the conversation in coreos/rpm-ostree#2566 (comment), it would be nice if an admin could manually get Zincati to immediately check for an update and possibly reboot into it.
A possible use case for this feature would be when admins know that there is a new update available that contains a bug fix/feature and they want it immediately, but Zincati has not automatically updated into that release due to not checking for updates frequently enough, reboot strategy constraints, or phased rollouts (and wariness). Current ways admins could get around this:

1. Reconfigure Zincati to e.g. have a lower wariness and use the `immediate` reboot strategy, then restart Zincati (see the sketch after this list).
2. Use the more direct `rpm-ostree upgrade --bypass-driver`.

Option 1 does not seem very user-friendly, and option 2 is potentially unsafe as rpm-ostree has no knowledge of update graphs, barrier releases, reboot scheduling windows, etc.
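For concreteness, a minimal sketch of what workaround 1 could look like as a Zincati config drop-in, assuming the documented `identity.rollout_wariness` and `updates.strategy` settings (the file name is arbitrary):

```toml
# /etc/zincati/config.d/51-immediate-updates.toml (hypothetical file name)

# Ask Cincinnati for the newest release as early as possible in a rollout.
[identity]
rollout_wariness = 0.0

# Reboot as soon as an update is staged, ignoring any maintenance windows.
[updates]
strategy = "immediate"
```

followed by a `systemctl restart zincati.service`.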
Example Usage
check-update
Include a command for telling Zincati to check for updates immediately. This should probably temporarily set Zincati's rollout wariness to `0.0` in order to hint Cincinnati to respond with the latest possible release.
$ zincatictl check-update
No new updates.
$ zincatictl check-update
Release ... found and deployed. Use `zincatictl finalize` to unlock staged deployment and reboot into it.
finalize-update
Also include a `finalize-update` command to override the reboot strategy, unlock the staged deployment, and reboot immediately.
If the strategy allows a reboot now, the machine will reboot:
$ zincatictl finalize-update
If the strategy does not allow a reboot:
$ zincatictl finalize-update
Update strategy does not allow for reboot. Use `--force` to force an update finalization.
Force a reboot, overriding the reboot strategy:
$ zincatictl finalize-update --force
Note: the `--force` flag (as opposed to forcing by default) is useful because Zincati has a `DEFAULT_REFRESH_PERIOD_SECS` interval at which it periodically checks for permission to reboot once an update is staged; a plain `finalize-update` should simply get Zincati to check for permission immediately.
Other Information
Relevant rpm-ostree PRs and issues:
/cc @jlebon @cgwalters to check if this makes sense and is compatible with rpm-ostree's proper usage (not sure if this functionality should be exposed to the user through rpm-ostree instead).
The RFE itself makes a lot of sense to me! Before adding a CLI, I'd definitely lean towards keeping the focus on integration with rpm-ostree instead to keep the UX simple. E.g. rpm-ostree already knows how to present available updates, diffs, etc...
So implementation-wise, we should probably brainstorm on what the API between rpm-ostree and update drivers should be. Maybe it's UNIX sockets, or D-Bus, etc. E.g. for D-Bus, we could standardize on a well-known bus name that we expect update drivers to acquire, and then you'd have a `GetUpdate` method which `rpm-ostree upgrade` would call out to in order to check for updates and get the OSTree commit to upgrade to, if any.
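For illustration only, such a driver call might look like the following `busctl` invocation; the well-known name, object path, and `(bs)` reply signature here are placeholders invented for this sketch, not an agreed-upon design:

```console
# Hypothetical: a driver acquires a standardized well-known name and exposes
# GetUpdate, replying with (update available?, OSTree commit checksum).
$ busctl --system call org.example.UpdateDriver /org/example/UpdateDriver \
    org.example.UpdateDriver GetUpdate
bs true "f3a1…"
```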
I agree that it's more user-friendly to just have rpm-ostree communicate with the update driver, instead of referring the user to the update driver and making them learn another CLI.
> for D-Bus, we could standardize on a well-known bus name that we expect update drivers to acquire and then you'd have a GetUpdate method which rpm-ostree upgrade would call out to to check for updates and get the OSTree commit to upgrade to if so.
#514 added a POC D-Bus interface to Zincati (the bus name, for now, is `org.coreos.zincati`). There's also a WIP PR here that has two methods, `CheckUpdate` and `FinalizeUpdate`, but I think first we should come up with a set of APIs that update drivers for rpm-ostree should have.
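With that POC running, the interface should be discoverable on the system bus, e.g. via introspection; note that the object path below is my assumption, only the bus name is confirmed above:

```console
$ busctl --system introspect org.coreos.zincati /org/coreos/zincati
```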
Like @jlebon mentioned, we should at least have a `GetUpdate` method, where the update driver returns whether there is a possible version to "legally" update to.
One thing to consider: for drivers like Zincati that may have update "strategies", can we safely assume that a user who calls `rpm-ostree upgrade` always wishes to ignore any restrictions imposed by the update driver's strategy? If not, then perhaps we should have a method that tells the update driver to "check for an update and try to reboot into it" in a single method call, so that if the driver's strategy disallows an immediate reboot, the call would simply fail. Or maybe have another method along the lines of "check if a reboot is allowed by the update driver", but that would require all rpm-ostree update drivers to have an update strategy (or something else that prevents spontaneous reboots); I'm not sure this is always the case.
Random comment on this: I think a common use case for the "manual update" path will be to have Zincati actually be "almost disabled" (systemd unit enabled, but config files neutering automatic updates) so that all updates are done manually by a sysadmin; see the sketch below. (See discussions in https://discussion.fedoraproject.org/t/28946.)
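A minimal sketch of that "almost disabled" setup, assuming Zincati's `updates.enabled` knob (the drop-in file name is arbitrary):

```toml
# /etc/zincati/config.d/90-manual-updates.toml (hypothetical file name)

# Keep zincati.service running, but never stage/finalize updates on its own.
[updates]
enabled = false
```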
@jlebon thanks for mentioning this. We have a couple of FCOS instances but not enough to warrant a zincati infrastructure. On the other hand, we must control the reboot windows and would rather use a generic automation tool via ssh (e.g. Ansible) for upgrading the instances. For that, such a CLI is needed.
@kai-uwe-rommel correct me if I misread your use case, but I think you'd be properly served by doing #245 and letting Ansible own such a file.
@lucab yes, may be. But that feature does not yet exist, right?
At the moment I'm looking for a solution to upgrade the outdated instances through the barrier release.
But of course, a long-term solution for further updates is needed as well.
@kai-uwe-rommel correct. The specific context was, going forward, how to better serve use cases/flows like yours, which are really a mix of automation plus human control.
Another use case for this feature could be for situations such as #554 (comment) where a user wants an immediate reboot into a staged update.
Some concerns raised regarding this feature: #554 (comment)
I like the idea that Zincati offer some level of CLI interaction/introspection.
Initiating actions like `check-update` to check for an update now (was an update found? is it already being downloaded? progress?) would be useful. `finalize` to reboot now would be useful when testing FleetLock protocol implementations (maybe `--force` overrides the strategy, as the OP suggests). And status commands to show where in the finite state machine Zincati thinks it is. Zero-touch is somewhat opaque at the moment.
> status commands to show where in the finite state machine Zincati thinks it is
This one specifically is now exposed to systemd:
$ systemctl show -p StatusText zincati.service
StatusText=periodically polling for updates (last checked Thu 2021-08-05 10:48:44 UTC)
Although it is meant as a human/debugging helper, not as a machine API.
I think we bumped into this overall topic again in https://discussion.fedoraproject.org/t/unable-to-upgrade-to-35-20211029-3-0/34925/4?u=dustymabe.
Basically, if we're going to break the user expectation that a reboot will land them in the new deployment, we need to feed them something else. "Don't use $that, use $this" works better than "Don't use $that, just wait".
Uhm, for that specific post, the user did configure a `periodic` window and at the same time is expecting an update/reboot to happen (immediately) outside of that timeframe; then getting a "reboot pending due to update strategy" status and wondering why the deployment is still kept staged; and then rebooting manually.
So yes, a `finalize-update --force` could have somehow worked here, but it's also an extreme case of conflicting/confused expectations from the user (which can't really be fixed by code).
Is there any update on this to allow bypassing the set update window for the staged upgrade?
@mitchellmaler no, otherwise it would have been noted here or in an associated PR.
I agree we need more introspection, but I also find it hard to understand my status as a client of an upstream Cincinnati server. I just started using Zincati around 6 days ago and I haven't seen a successful update. I haven't seen any errors yet either, and from the journal I'm told that Zincati is polling. It would be awesome to get a little more detail about the Cincinnati server I'm connected to. Are there new updates for me? Are my nodes just waiting in line? What's the ETA for my node to get an update? The first two are especially important when you're just starting out and aren't sure how Zincati should log these various states. I may be misunderstanding, but as a new user I have no idea if my config is right or wrong.
@dwarf-king-hreidmar please don't hijack existing tickets for different topics.
Yes, there have been no new FCOS releases in the last 6 days. Maybe we can slightly reword the status message, but here `periodically polling for updates` means that the agent is correctly performing update checks (the timestamp of the last one is noted) and there is nothing new for the moment.
If you are unsure about something, you can consult the realtime metrics, or (temporarily) crank up logging verbosity and peek into the internal flow of the agent.
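For instance, verbosity can be raised through a systemd drop-in; this sketch assumes the stock unit honors a `ZINCATI_VERBOSITY` environment variable:

```ini
# /etc/systemd/system/zincati.service.d/10-verbosity.conf
[Service]
Environment=ZINCATI_VERBOSITY="-vv"
```

then `systemctl daemon-reload && systemctl restart zincati.service` to apply it.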
Whoops, maybe I pinged the wrong person: @kelvinfan001 @jlebon @dustymabe @dghubble is there anyone working on this?
Not currently, but I'm hoping that we'll be able to address this UX gap in a larger rework of Zincati.
Related: containers/bootc#337 (comment)