theupdateframework/tuf-on-ci

File issues when workflows/actions fail

Closed this issue · 2 comments

jku commented

Summary

If any of the following workflows fail for any reason, we should file an issue:

  • online-sign
  • publish
  • create-signing-events

Some details:

  • There should only ever be one open issue per workflow
  • Subsequent failures should likely add a comment to the existing issue
  • If the issue resolves itself (e.g. online-sign succeeds), the issue should be closed with a comment
  • The issue comment should explain
    • what failed
    • what will be done automatically to resolve it
    • what can be done manually to resolve it, and by whom
    • the timeline to repository failure (e.g. "Unless the online-signing issue is resolved, breakage will be visible to clients in 7 days (Sun 28 Jan 2024 16:22 UTC)")
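Here is a minimal Python sketch of the decision logic above. It assumes the single open issue per workflow is found by a fixed title. The title format, the function names and the IssueAction type are all hypothetical; this is not the sigstore-probers code. The sketch only decides what should happen, and the actual GitHub API calls are left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class IssueAction:
    """What a (hypothetical) update-issue helper should do on GitHub."""
    kind: str               # "create", "comment", "close" or "none"
    issue: Optional[int]    # number of the existing open issue, if any
    title: str = ""
    body: str = ""


def issue_title(workflow: str) -> str:
    # Hypothetical fixed title: lets us find the one open issue per workflow.
    return f"Failure in {workflow}"


def find_open_issue(open_issues: dict[int, str], workflow: str) -> Optional[int]:
    """Return the number of the open issue for this workflow, if there is one."""
    for number, title in open_issues.items():
        if title == issue_title(workflow):
            return number
    return None


def failure_comment(workflow: str, now: datetime, expiry: datetime,
                    automatic: str, manual: str) -> str:
    # Whole days left until clients would see the breakage.
    days = (expiry - now).days
    return (
        f"{workflow} failed.\n\n"
        f"What will happen automatically: {automatic}\n"
        f"What can be done manually: {manual}\n\n"
        f"Unless the {workflow} issue is resolved, breakage will be visible to "
        f"clients in {days} days ({expiry:%a %d %b %Y %H:%M} UTC)."
    )


def decide(workflow: str, succeeded: bool, open_issues: dict[int, str],
           now: datetime, expiry: datetime,
           automatic: str = "", manual: str = "") -> IssueAction:
    issue = find_open_issue(open_issues, workflow)
    if succeeded:
        # The problem resolved itself: close the open issue with a comment.
        if issue is not None:
            return IssueAction("close", issue, body=f"{workflow} succeeded, closing.")
        return IssueAction("none", None)
    body = failure_comment(workflow, now, expiry, automatic, manual)
    if issue is not None:
        # Never open a second issue: comment on the existing one instead.
        return IssueAction("comment", issue, body=body)
    return IssueAction("create", None, title=issue_title(workflow), body=body)
```

As an example, set expiry to Sun 28 Jan 2024 16:22 UTC and now to seven days earlier. A failure with no open issue then gives a "create" action, and its body ends with the same timeline sentence as the example above.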

I recently wrote most of the above functionality for sigstore/sigstore-probers (see open-workflow-issue and close-workflow-issue); I think we can mostly copy that.

Open question

It would be quite neat if we could hide all of this in the actions: that way the workflows would not become more complicated.

  • I believe this is possible with the composite actions we use, but I'm not 100% sure:
    • Add open-workflow-issue as an internal action in tuf-on-ci
    • Call that action from the relevant actions listed at the beginning, with some combination of if: always() && failure() (GitHub conditions are a mystery, so I'm not 100% sure what actually works)
  • This does make the actions even more magical than they already are, and I'm not sure everyone is happy about that... but I think I'd rather try embedding the issue filing in the actions
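The if: conditions are easier to reason about with a deliberately simplified Python model. This is not GitHub's actual expression evaluator. In the model, each condition only looks at whether an earlier step in the composite action has failed, and cancellation is ignored. The step names used with it are hypothetical.

```python
from typing import Callable

# Simplified model of GitHub's status check functions: each one only looks at
# whether an earlier step has failed. Cancellation is not modelled.
Condition = Callable[[bool], bool]


def success(failed: bool) -> bool:
    return not failed  # the implicit default for steps without a status function


def failure(failed: bool) -> bool:
    return failed


def always(failed: bool) -> bool:
    return True


def run_composite(steps: list[tuple[str, Condition, bool]]) -> list[str]:
    """Run (name, condition, succeeds) steps in order; return the names that ran."""
    failed = False
    ran = []
    for name, condition, succeeds in steps:
        if condition(failed):
            ran.append(name)
            failed = failed or not succeeds
    return ran
```

In this model, always() combined with failure() behaves exactly like failure() alone. The issue step runs after a failed step and is skipped otherwise. Whether the same holds for cancelled runs is not covered here.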

jku commented

From @kommendorkapten:
There should be a way to customize the issue comment:

  • We could have a well-known file in the repository (.tuf-on-ci/failure-message.md) or a repository variable whose content gets inserted into every issue comment
  • This makes it possible to mention maintainers or teams in the issue, which in turn triggers any alerting mechanisms that integrate with GitHub Issues (Slack etc.)
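The sketch below shows one way the custom text could be looked up. It rests on two assumptions: the file takes precedence over the repository variable, and the variable reaches the action as an environment variable. The name TUF_ON_CI_FAILURE_MESSAGE and both function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Path from the suggestion above; the variable name is a hypothetical stand-in
# for a repository variable exposed to the action as an environment variable.
MESSAGE_FILE = Path(".tuf-on-ci/failure-message.md")
MESSAGE_VAR = "TUF_ON_CI_FAILURE_MESSAGE"


def custom_message(repo_root: Path, env: dict[str, str]) -> Optional[str]:
    """Return the repository-specific text; a non-empty file wins over the variable."""
    path = repo_root / MESSAGE_FILE
    if path.is_file():
        text = path.read_text(encoding="utf-8").strip()
        if text:
            return text
    text = env.get(MESSAGE_VAR, "").strip()
    return text or None


def with_custom_message(body: str, repo_root: Path, env: dict[str, str]) -> str:
    """Append the custom text (e.g. @-mentions of a team) to an issue comment."""
    extra = custom_message(repo_root, env)
    return f"{body}\n\n{extra}" if extra else body
```

Inside the action this might be called as with_custom_message(body, Path.cwd(), dict(os.environ)).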

jku commented

Taking this, but it might take a few days:

  • Test composite actions and if: failure():
    ☑️ seems to work just fine: if any step in the composite action fails, the step with if: failure() (or if: always()) will be executed.
  • modify the sigstore-probers implementation for our case (actions/internal/update-issue/action.yml or something)
  • integrate into the existing actions (create-signing-events, online-sign, upload-repository -- but maybe not signing-event?)
    • I think the best option might be to have an if: always() step that passes github.action_status as an argument to the internal update-issue action
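This is a sketch of the receiving end of that design: a small, hypothetical update-issue entry point that turns the github.action_status value into open or close behaviour. It assumes two things, neither decided in this thread. First, the status is one of success, failure or cancelled. Second, a cancelled run should leave the issue untouched. The real issue API calls are replaced by prints.

```python
import sys
from typing import Optional


def parse_action_status(status: str) -> Optional[bool]:
    """Map github.action_status to succeeded (True), failed (False) or None.

    Assumes the values "success", "failure" and "cancelled"; a cancelled run
    is treated as "no new information" and leaves issues untouched.
    """
    status = status.strip().lower()
    if status == "success":
        return True
    if status == "failure":
        return False
    if status == "cancelled":
        return None
    raise ValueError(f"unexpected action status: {status!r}")


def main(argv: list[str]) -> int:
    # Hypothetical CLI: update-issue WORKFLOW STATUS
    if len(argv) != 3:
        print(f"usage: {argv[0]} WORKFLOW STATUS", file=sys.stderr)
        return 2
    workflow, status = argv[1], argv[2]
    succeeded = parse_action_status(status)
    if succeeded is None:
        print(f"{workflow}: cancelled, leaving issues unchanged")
    elif succeeded:
        print(f"{workflow}: succeeded, closing any open failure issue")
    else:
        print(f"{workflow}: failed, opening or updating the failure issue")
    return 0
```

Wired into the action, the if: always() step would pass the workflow name and ${{ github.action_status }} as the two arguments.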