File issues when workflows/actions fail
jku commented
Summary
If any of the following fail, we should file an issue:
- online-sign
- publish
- create-signing-events
Some details:
- There should only ever be one open issue per workflow
- Subsequent failures should likely add a comment to the existing issue
- If the issue resolves itself (e.g. online-sign succeeds), the issue should be closed with a comment
- The issue comment should explain
  - what failed
  - what will be done automatically to resolve it
  - what can be done, and by whom, to resolve it manually
  - what the timeline to repository failure is (like "Unless the online-signing issue is resolved, breakage will be visible to clients in 7 days (Sun 28 Jan 2024 16:22 UTC)")
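The lifecycle rules and comment contents above could be expressed roughly as follows. This is a minimal Python sketch, not tuf-on-ci code: `IssueUpdate`, `plan_issue_update` and `failure_comment` are hypothetical names, and it assumes the caller already knows whether the run failed, the number of the workflow's open issue (if any), and the expiry time that determines when clients will see breakage. The resolution texts in the usage example are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class IssueUpdate:
    kind: str  # one of "create", "comment", "close", "none"
    issue_number: Optional[int] = None


def plan_issue_update(run_failed: bool, open_issue: Optional[int]) -> IssueUpdate:
    """Keep at most one open issue per workflow (hypothetical helper)."""
    if run_failed:
        if open_issue is None:
            return IssueUpdate("create")           # first failure: open an issue
        return IssueUpdate("comment", open_issue)  # repeated failure: add a comment
    if open_issue is not None:
        return IssueUpdate("close", open_issue)    # recovered: close with a comment
    return IssueUpdate("none")                     # healthy and nothing open


def failure_comment(workflow: str, automatic: str, manual: str,
                    expires: datetime, now: datetime) -> str:
    """Explain what failed, what happens automatically, what can be done
    manually, and when the failure becomes visible to clients."""
    days_left = (expires - now).days
    deadline = expires.astimezone(timezone.utc).strftime("%a %d %b %Y %H:%M UTC")
    return "\n\n".join([
        f"{workflow} failed.",
        f"Automatic resolution: {automatic}",
        f"Manual resolution: {manual}",
        f"Unless the {workflow} issue is resolved, breakage will be visible "
        f"to clients in {days_left} days ({deadline}).",
    ])


if __name__ == "__main__":
    now = datetime(2024, 1, 21, 16, 22, tzinfo=timezone.utc)
    print(plan_issue_update(run_failed=True, open_issue=None))
    print(failure_comment(
        "online-sign",
        "(placeholder) what will be retried automatically",
        "(placeholder) who can do what to fix it manually",
        expires=now + timedelta(days=7),
        now=now,
    ))
```

Keeping the decision separate from the text makes the "one open issue per workflow" rule easy to test without touching the GitHub API.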
I have just recently written most of the above functionality for sigstore/sigstore-probers: see open-workflow-issue (and close-workflow-issue). I think we can mostly copy that.
Open question
It would be quite neat if we could hide all of this in the actions: that way workflows would not become more complicated.
- I believe this is possible with the composite actions we use, but I'm not 100% sure:
  - add open-workflow-issue as an internal action in tuf-on-ci
  - call that action in the relevant actions listed at the beginning with some combination of `if: always() && failure()` (GitHub conditions are a mystery, so I'm not 100% sure what actually works)
- This does make the actions even more magical than they are: I'm not sure everyone is happy about that... but I think I'd rather try embedding the issue filing in the actions
jku commented
From @kommendorkapten:
There should be a way to customize the issue comment:
- we could have a well-known file in the repository (`.tuf-on-ci/failure-message.md`) or a repository variable that gets inserted into every issue comment. This enables mentioning maintainers or teams in the issue, which in turn triggers any alert mechanisms that integrate with GitHub Issues (Slack etc.)
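The customization could be picked up roughly like this. A minimal Python sketch, assuming the file takes precedence over the variable and that the variable's value is passed to the script as an environment variable; `TUF_ON_CI_FAILURE_MESSAGE`, `custom_failure_message` and `with_custom_message` are hypothetical names, only the `.tuf-on-ci/failure-message.md` path comes from the suggestion above.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional

# Well-known file suggested in the discussion.
FAILURE_MESSAGE_FILE = Path(".tuf-on-ci/failure-message.md")
# Hypothetical env var carrying the repository variable's value.
FAILURE_MESSAGE_VAR = "TUF_ON_CI_FAILURE_MESSAGE"


def custom_failure_message(repo_root: Path, env: Mapping[str, str]) -> Optional[str]:
    """Return repository-specific text for every issue comment, if any.

    Assumption: a non-empty file wins over the variable."""
    path = repo_root / FAILURE_MESSAGE_FILE
    if path.is_file():
        text = path.read_text(encoding="utf-8").strip()
        if text:
            return text
    value = env.get(FAILURE_MESSAGE_VAR, "").strip()
    return value or None


def with_custom_message(body: str, repo_root: Path,
                        env: Mapping[str, str] = os.environ) -> str:
    """Append the custom text (e.g. team mentions) to a comment body."""
    extra = custom_failure_message(repo_root, env)
    return f"{body}\n\n{extra}" if extra else body


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)  # stands in for a repository checkout
        print(with_custom_message("online-sign failed.", root,
                                  {FAILURE_MESSAGE_VAR: "cc @example-org/maintainers"}))
```

Appending the text verbatim means any `@user` or `@org/team` mention in it triggers GitHub's normal notifications, which is what downstream alerting hooks into.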
jku commented
Taking this, but it might take a few days:
- Test composite actions and `if: failure()`:
  - ☑️ seems to work just fine: if any step in the composite action fails, the step with `if: failure()` (or `if: always()`) will be executed.
- Modify the sigstore-probers implementation for our case (`actions/internal/update-issue/action.yml` or something)
- Integrate into the existing actions (create-signing-events, online-sign, upload-repository, but maybe not signing-event?)
  - I think the best option might be to have an `if: always()` step that then passes `github.action_status` as an argument to the internal `update-issue` action
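With an `if: always()` step handing `github.action_status` to the internal `update-issue` action, that action still has to turn the status into issue operations. A minimal Python sketch using the `gh` CLI, assuming status values `success`, `failure` and `cancelled`, a hypothetical "<workflow> failed" title convention for finding the single open issue, and that cancelled runs leave issues untouched; all function names are hypothetical.

```python
import json
import subprocess
from typing import List, Optional


def issue_title(workflow: str) -> str:
    return f"{workflow} failed"  # hypothetical title convention


def find_open_issue_cmd(workflow: str) -> List[str]:
    query = f'"{issue_title(workflow)}" in:title'
    return ["gh", "issue", "list", "--state", "open",
            "--search", query, "--json", "number,title"]


def parse_open_issue(gh_json: str, workflow: str) -> Optional[int]:
    # Search is fuzzy, so require an exact title match.
    for issue in json.loads(gh_json):
        if issue["title"] == issue_title(workflow):
            return issue["number"]
    return None


def issue_commands(action_status: str, workflow: str,
                   open_issue: Optional[int], body: str) -> List[List[str]]:
    """Map github.action_status to gh invocations (one open issue per workflow)."""
    if action_status == "failure":
        if open_issue is None:
            return [["gh", "issue", "create", "--title", issue_title(workflow), "--body", body]]
        return [["gh", "issue", "comment", str(open_issue), "--body", body]]
    if action_status == "success" and open_issue is not None:
        return [["gh", "issue", "close", str(open_issue),
                 "--comment", f"{workflow} succeeded again, closing."]]
    return []  # "cancelled", or nothing to close: leave issues alone (assumption)


def update_issue(action_status: str, workflow: str, body: str) -> None:
    """Entry point the hypothetical update-issue action could run."""
    out = subprocess.run(find_open_issue_cmd(workflow), check=True,
                         capture_output=True, text=True).stdout
    open_issue = parse_open_issue(out, workflow)
    for cmd in issue_commands(action_status, workflow, open_issue, body):
        subprocess.run(cmd, check=True)
```

In a workflow, `gh` authenticates through the `GH_TOKEN` environment variable, and the job needs the `issues: write` permission for these commands to succeed.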