[RFE] Rollback to a specific version of the Flatcar environment
Current situation
At the moment, Flatcar manages updates through a blue/green (A/B) approach, which is fast and efficient. From what I have read in the documentation (not tested yet: I still use the 3510 LTS, which does not include the modifications introduced in 3535), work has been done to manage user configuration over time. This makes it possible to control what is kept and what is discarded as part of a stateful environment.
I have a specific use case, and three main questions about it.
Let’s consider that I have been running Flatcar version A (on the USR-A partition) for xxx months and I upgrade it to version B:
- Does Flatcar ensure that the mutable partition (ROOT) is not affected by the prefetch of version B into USR-B? Can you confirm that no changes are applied to the ROOT partition during the prefetch? If changes do occur, which ones?
- Does Flatcar allow re-applying the Ignition file right after a switch from USR-A (version A) to USR-B (version B), for example by calling flatcar_reset? Or does it have to be done manually?
- After the migration, I may notice critical anomalies (for whatever reason) and need to restore, as quickly as possible, USR-A (version A) together with ROOT (with the version A configuration and data) in its last state, so that my system keeps working. By “last state” I mean the “snapshot” taken just before the switch. This snapshot is probably not the state from when I first installed version A and configured ROOT, because the configuration may have evolved since my initial setup.
With the current solution, I may be able to restore Flatcar to version A, perhaps reapply the Ignition file used at installation, and partially recover the state I had at install time. However, data that has evolved since then cannot be recovered from the initial Ignition file. This leaves me with a version A platform that is not the same as the one I had before the switch. From my understanding, the current mechanism does not cover this kind of rollback.
Is there any plan to work on something that would provide this "snapshot" feature?
I am looking for a solution on the current LTS version (3510) that consists of provisioning several ROOT partitions (ROOT-A and ROOT-B) and switching the / partition when the update switch happens. As with the rollback provided by the USR-A/USR-B partition mechanism, I would like a "snapshot" of the mutable world associated with USR-A to be taken when switching to USR-B. Whenever I want, I would then be able to restore the USR-A/ROOT-A pair to its state ("snapshot") from before the update.
Do you know of any limitations in the current Flatcar versions that would prevent setting up such a mechanism with an Ignition file? Could I break Flatcar's behavior by switching the mutable partition?
Impact
None
Ideal future situation
As with the rollback provided by the USR-A/USR-B partition mechanism, I have a "snapshot" of the mutable world associated with USR-A when switching to USR-B. Whenever I want, I am able to restore USR-A to its state from before the update.
Implementation options
- Using the post-install hook, it is possible to trigger a specific operation when switching from partition A to B. However, my problem may only be detected later, so the hook may not help at that point. It could perhaps be used during the rollback to restore the "snapshot".
- Version 3535.0.0 seems to introduce several options that may help, such as flatcar_reset, though only partially (it relies on an Ignition file, which may not cover the whole configuration).
- Looking at systemd-confext, I get the feeling that it may help by creating an archive under a version number, which could be restored later.
- Historically at my job, such a mechanism was managed using two partitions, each holding the whole system (with its configuration, data, etc.). Changing version was a matter of switching from one GRUB entry to another. It may be possible to do something similar by extending the update mechanism and saving the previous state to a dedicated partition that would be passed to the kernel during the reboot. It seems a bit overkill to me, but I need to guarantee that we can restore version A as it was at a given point in time, as fast as possible.
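To make the archive-based idea more concrete, here is a minimal sketch of the behaviour I have in mind. It is not an existing Flatcar feature: the function names, the snapshot directory and the version label are all hypothetical, and Python is used only to illustrate the logic (a real implementation would have to be part of the OS, like update_engine). It archives a set of mutable paths under a version label before the switch and extracts that archive again on rollback.

```python
import tarfile
from pathlib import Path


def snapshot(paths, snapshot_dir, version):
    """Archive the given paths into <snapshot_dir>/<version>.tar.gz.

    Hypothetical helper: meant to run just before switching from USR-A
    to USR-B. The snapshot directory must not be inside the saved paths.
    """
    snapshot_dir = Path(snapshot_dir)
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    archive = snapshot_dir / f"{version}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for path in paths:
            path = Path(path).absolute()
            # Store entries relative to the filesystem root so the archive
            # can later be extracted under any target directory.
            tar.add(path, arcname=str(path.relative_to(path.anchor)))
    return archive


def restore(snapshot_dir, version, target_root="/"):
    """Extract the snapshot saved for `version` under `target_root`.

    Hypothetical helper: meant to run when rolling back to version A.
    Files created after the snapshot are left in place, not deleted.
    """
    archive = Path(snapshot_dir) / f"{version}.tar.gz"
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target_root)
    return archive
```

For example, `snapshot(["/etc", "/var/lib/myapp"], "/var/snapshots", "A")` before the update and `restore("/var/snapshots", "A")` after rolling back to USR-A (the paths here are only placeholders). A plain archive like this does not give a consistent point-in-time view of data that is being written while the snapshot is taken, which is one reason why I am also considering the ROOT-A/ROOT-B partition approach.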
Additional information
If possible, the solution should have a small footprint and be part of the OS to keep it as simple as possible, like update_engine. What I expect from such a feature is something reliable for long-lived systems (running 24 hours a day, 7 days a week).
I have deliberately simplified the problem to keep it concise, but it touches on the broader topic of platform/infrastructure backup and restoration over time. I am only focusing on what Flatcar may offer for the parts it manages.
Despite what I have looked up on the internet on the subject, I may have missed the golden egg. I would be very happy if you could share some pointers, and I deeply apologize for the request if there is already an obvious answer!