coreos/zincati

Use new --skip-branch-check and verify fedora-coreos.stream commit metadata before unlocking

jlebon opened this issue · 4 comments

This is the Zincati part of coreos/fedora-coreos-tracker#749. We need to have it start using the new --skip-branch-check when calling rpm-ostree deploy and also check in the commit metadata that the stream matches the expected output.

See: coreos/fedora-coreos-tracker#749 (comment)
See: coreos/fedora-coreos-tracker#749 (comment)

Re: coreos/fedora-coreos-tracker#749 (comment), wouldn't it make more sense to check the commit metadata before trying to stage it (i.e. check it at the check updates stage, that way we avoid fully deploying an invalid release)?
I think checking metadata makes more sense at the check updates stage, since a release with bad metadata isn't a transient error, i.e. an update with bad metadata won't fix itself over time; there's no point retrying again later, or ever.
So I think coreos/fedora-coreos-tracker#749 (comment) makes more sense to me. Am I overlooking anything here?
/cc @lucab

I think the rationale there was that the likelihood this happens was so low that it wasn't worth the added complexity of doing a separate pull --commit-metadata-only to check the stream. It's more like a final sanity-check before lifting the lock using information we already have on-hand. Though definitely cool with me if we go the pull --commit-metadata-only path too!

Ah okay, that also makes sense. I guess I don't fully know how dangerous it is to deploy a commit potentially from some unknown stream, so it seemed a bit weird to me to check after already creating a deployment and writing everything on disk.

Got it, the sanity check is not for any security purposes (since the the super old node should still have the right keys to verify the nearest barrier release that it's trying to update to, and so we can rely on rpm-ostree to make sure we only deploy if the update is trusted). The sanity check is in place to address

17:01:14 <jlebon> imagine if we somehow screw up the graph, and the node goes from testing to stable. now, from that point on it's extremely difficult to rectify that situation because the node will now be checking the wrong update graph
17:01:29 <jlebon> it's super unlikely, but super bad

which is extremely unlikely. Essentially it's there to detect errors in the update graph provided by Cincinnati.