theupdateframework/tuf-on-ci

Handle signing-event failures more gracefully

Opened this issue · 5 comments

jku commented

EDIT: This issue was originally filed for all tuf-on-ci repository tools but it really only applies to signing event right now: see comments.

I'm not sure what the correct way to implement this is but: failures in automation should be really visible. We can't allow e.g. signers not being notified of expiry because of a bug -- all workflow failures should lead to an issue being filed or something like it

one example of this is when calls like this are made

if tuf-on-ci-status;  then
  ...
else
  ..
fi

if playground-snapshot fails here with a python stack trace, it just leads to the else-branch running: we can't use tool return values like this

  • at the very least we need to check for specific return values and make sure our "successful" return values are clearly special (values 3-125 should be fair game for app specific ones)
  • alternatively we could stop using return values to pass information: it is a bit ugly
jku commented

Assuming we keep using exit codes to communicate the "result":

The more professional way to fix this might be to change the printed output to something structured: Instead of e.g. tuf-on-ci-status printing the raw status message, we'd print json with fields "success" and "status message". This way the exit value could always be 0. I don't think I will go that far yet though

Using script specific exit codes is a good way to report error. If we need more details, we can always print to stderr and have the parent process capturing that.

jku commented

I'll have a go at this

jku commented

It turns out that this is not nearly as critical as I thought:

  • online-sign does not use the pattern in the original comment anymore
  • the only action that does care about exit values is signing-event: failures in signing event are unfortunate and should lead to a comment saying so... but they won't go unnoticed like the same pattern in online-sign could
jku commented

I have a branch https://github.com/jku/tuf-on-ci/tree/exit-value-refactor but as mentioned

  • this only modifies the signing event, where the only advantage is that there is a comment saying "oops something failed" instead of just a failing check
  • As long as we're going to do #78 in near future, I think that is not very valuable (because in PRs failing checks are visible on their own)

So I think I'm going to drop this for now.