Automatically rerun failed GitHub workflows
This is an example repo with a demonstration workflow. It shows how to trigger a rerun of failed jobs in a workflow using only the gh cli, as a drop-in solution that requires just one customization.
The example workflow has a matrix job with two parts.

    strategy:
      fail-fast: false
      matrix:
        name: [workflow-example]
        conclusion: ["job-success", "job-failure"]

The second matrix entry, job-failure, fails until the workflow has been rerun twice, based on the github.run_attempt context.

    if: matrix.conclusion == 'job-failure' && github.run_attempt < 3
    run: exit 1

Otherwise ci-auto-rerun-failed-jobs.yml would attempt to rerun the failed jobs up to 5 times (the default) before it stops. This setting can be configured.

    if: inputs.attempts <= inputs.retries

It does this with a job at the end of the workflow that triggers the workflow file ci-auto-rerun-failed-jobs.yml if there were any failures, using if: failure().
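The retry limit described above can be sketched in plain shell. This is only an illustration of the comparison, not the workflow's own code: should_rerun is a hypothetical helper, and it assumes both values are whole numbers.

```shell
# Hypothetical helper mirroring the check `inputs.attempts <= inputs.retries`.
# attempts = the run attempt that just failed, retries = the configured limit.
should_rerun() {
  attempts="$1"
  retries="$2"
  if [ "$attempts" -le "$retries" ]; then
    echo "rerun"
  else
    echo "stop"
  fi
}

should_rerun 1 5   # prints "rerun"
should_rerun 5 5   # prints "rerun"
should_rerun 6 5   # prints "stop"
```

With the default of 5 retries, a run that has already reached attempt 6 is left alone.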
There are 3 main components to apply. Each is detailed and explained here.
The first goes at the start of your workflow. It provides the core requirements for using the ci-auto-rerun-failed-jobs.yml workflow, namely workflow_dispatch and its inputs.

    on:
      workflow_dispatch:
        inputs:
          skip_rerun:
            description: "Skip rerun?"
            required: true
            default: false
            type: boolean
          retries:
            description: "Number of rerun retries"
            required: true
            default: "5"
            type: choice
            options: ["1", "2", "3", "4", "5", "6", "7", "8", "9"]

Note
When manually running the job via workflow_dispatch you can set two options.
- skip_rerun - a true or false (default false) option to bypass the rerun.
- retries - the number of retry attempts, chosen from a list of options, 1 to 9.
This goes at the end of the workflow that you want to make sure completes, as a separate job. It calls ci-auto-rerun-failed-jobs.yml and passes some critical inputs.
Warning
The one thing you must customize is needs: [build, release], which has to match the names of the jobs it is tracking. No other customizations are required.
This job will work for workflow_dispatch and schedule jobs.

    ci-auto-rerun-failed-jobs:
      if: failure() && (github.event.inputs.skip_rerun || 'false') == 'false'
      needs: [build, release]
      concurrency:
        group: ci-auto-rerun-failed-jobs
        cancel-in-progress: true
      permissions:
        actions: write
      runs-on: ubuntu-24.04-arm
      env:
        GH_TOKEN: "${{ secrets.AUTO_RERUN || github.token }}"
        github_repo: "" # To use ci-auto-rerun-failed-jobs.yml hosted in a remote repository, else default to the current repository. Requires PAT token AUTO_RERUN
        retries: ${{ needs.scheduled_defaults.outputs.retries || '3' }}
        distinct_id: ${{ github.event.inputs.distinct_id }}
      steps:
        - uses: actions/checkout@v4
          with:
            persist-credentials: false
        - name: ci-auto-rerun-failed-jobs via ${{ env.github_repo || github.repository }}
          run: >
            gh workflow run ci-auto-rerun-failed-jobs-action.yml
            --repo "${github_repo:-$GITHUB_REPOSITORY}"
            -f github_repo=${GITHUB_REPOSITORY}
            -f run_id=${GITHUB_RUN_ID}
            -f attempts=${GITHUB_RUN_ATTEMPT}
            -f retries=${retries}
            -f distinct_id=${distinct_id}

Tip
Since we are using gh cli, you can define additional inputs to be passed by adding them like this

    -f name=value

Then just add it as an input in ci-auto-rerun-failed-jobs.yml and process it there as ${{ inputs.name }}.
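As a rough sketch of the tip above, the helper below turns a list of name=value pairs into repeated -f flags that could be appended to a gh workflow run command. The function build_field_flags and the input names reason and notify are hypothetical and only serve the illustration. The sketch assumes the values contain no spaces.

```shell
# Turn name=value pairs into repeated -f flags for `gh workflow run`.
# build_field_flags and the example input names are hypothetical.
build_field_flags() {
  flags=""
  for pair in "$@"; do
    case "$pair" in
      # Only accept arguments in name=value form; silently skip the rest.
      *=*) flags="${flags:+$flags }-f $pair" ;;
    esac
  done
  printf '%s\n' "$flags"
}

build_field_flags "reason=flaky-test" "notify=true"
# prints: -f reason=flaky-test -f notify=true
```

Each name passed this way still has to be declared under workflow_dispatch inputs in the receiving workflow, otherwise the dispatch is rejected.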
This workflow then takes the defined inputs, reruns the failed jobs, watches the run to its conclusion and provides a small job summary.
Note
It works best locally but is set up to also be triggered from a remote repo.
You will need a PAT configured as secrets.AUTO_RERUN with actions/content/workflow permissions in the local and remote repo.

    name: ci auto rerun failed jobs
    on:
      workflow_dispatch:
        inputs:
          run_id:
            description: "The run id of the workflow to rerun"
            required: true
          attempts:
            description: "The number of attempts to rerun the workflow"
            required: true
          retries:
            description: "The number of retries to rerun the workflow"
            required: true
          github_repo:
            description: "The repository to rerun the workflow"
            required: false
          distinct_id:
            description: "The distinct id of the workflow to rerun"
            required: false
    run-name: ci auto rerun failed jobs - attempt ${{ inputs.attempts }}
    jobs:
      gh-cli-rerun:
        name: rerun - attempt ${{ inputs.attempts }}
        permissions:
          actions: write
        runs-on: ubuntu-latest
        env:
          GH_TOKEN: "${{ secrets.AUTO_RERUN || github.token }}"
        steps:
          - name: Host - Checkout action ${{ inputs.distinct_id }}
            uses: actions/checkout@v4
          - name: gh cli rerun and summaries ${{ inputs.distinct_id }}
            if: inputs.attempts <= inputs.retries
            run: |
              github_repo="${{ inputs.github_repo || github.repository }}"
              failures="$(gh run view ${{ inputs.run_id }} --log-failed --repo "${github_repo}" | sed "s,\x1B\[[0-9;]*[a-zA-Z],,g")"
              if [[ -z "${failures}" ]]; then
                failures="$(gh run view ${{ inputs.run_id }} --repo "${github_repo}" | sed "s,\x1B\[[0-9;]*[a-zA-Z],,g")"
              fi
              if [[ "${{ inputs.retries }}" -ge "2" ]]; then
                gh run rerun "${{ inputs.run_id }}" --failed --debug --repo "${github_repo}"
              else
                gh run rerun "${{ inputs.run_id }}" --failed --repo "${github_repo}"
              fi
              printf '%b\n' "# gh cli workflow reruns" >> $GITHUB_STEP_SUMMARY
              printf '\n%b\n' ":octocat: Here is a summary of inputs from the failed workflow" >> $GITHUB_STEP_SUMMARY
              printf '\n%b\n' "🟥 Failures at:\n\n\`\`\`log\n${failures}\n\`\`\`" >> $GITHUB_STEP_SUMMARY
              printf '\n%b\n' "🟦 Attempt: ${{ inputs.attempts }} - Rerun failed jobs in ${{ inputs.run_id }} :hammer:" >> $GITHUB_STEP_SUMMARY
              if gh run watch ${{ inputs.run_id }} --exit-status --repo "${github_repo}"; then
                printf '\n%b\n' "✅ Attempt: ${{ inputs.attempts }} succeeded 😺" >> $GITHUB_STEP_SUMMARY
              else
                printf '\n%b\n' "❌ Attempt: ${{ inputs.attempts }} failed 😾" >> $GITHUB_STEP_SUMMARY
              fi

Some images visually demonstrating the progress of the example.
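The rerun workflow pipes the gh run view output through sed before writing it to the job summary. A minimal sketch of what that filter does, assuming the goal is to drop ANSI colour and cursor sequences from the log text: strip_ansi is a hypothetical name, and the escape byte is generated with printf rather than written as \x1B so the sketch does not depend on a particular sed.

```shell
# Strip ANSI escape sequences (colours, cursor codes) from text on stdin.
# Same pattern as the workflow's sed call; strip_ansi is a hypothetical name.
strip_ansi() {
  esc="$(printf '\033')"
  sed "s,${esc}\[[0-9;]*[a-zA-Z],,g"
}

printf '\033[31mError:\033[0m build failed\n' | strip_ansi
# prints: Error: build failed
```

Without this step the raw escape codes would show up as noise inside the log block of the summary.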
There are a few things you need to understand about how this works and what you can and cannot do with it.
- workflow_call - This is used when calling the workflow as a reusable workflow. The problem is that the called workflow runs as a child of the parent. To rerun a workflow using gh cli the run must have ended, or you will get an error. As a child, the rerun job would prevent the parent from ending, and so prevent us from rerunning it: a child process attempting to restart its parent process. It won't work, and you will get gh cli errors and a failed rerun.

      parent-workflow (ID)
      ├─ child-failed-job
      └─ child-rerun-job # a child trying to restart parent ID whilst running will result in an error

- schedule - currently the schedule key cannot take inputs. So if your workflow was started via a schedule, the workflow_dispatch inputs are all ignored and null. There are two ways to handle this.
  We can set defaults like this, providing a default value wherever an input is null. These are what we call the schedule default values.

      if: failure() && (github.event.inputs.skip_rerun || 'true') == 'false'
      run: gh workflow run ci-auto-rerun-failed-jobs.yml -f run_id=${{ github.run_id }} -f attempts=${{ github.run_attempt }} -f retries=${{ github.event.inputs.retries || '5' }}

  Another option is to use a small job to define the default values as outputs that can be used throughout the workflow:

      scheduled_defaults:
        runs-on: ubuntu-latest
        outputs:
          skip_rerun: ${{ github.event.inputs.skip_rerun || 'false' }}
          retries: ${{ github.event.inputs.retries || '5' }}
        steps:
          - name: Setting Outputs from inputs
            run: printf '%b\n\n' "Setting Outputs from Inputs"

  So the checks would become like this instead

      if: failure() && needs.scheduled_defaults.outputs.skip_rerun == 'false'
      run: gh workflow run ci-auto-rerun-failed-jobs.yml -f run_id=${{ github.run_id }} -f attempts=${{ github.run_attempt }} -f retries=${{ needs.scheduled_defaults.outputs.retries }}
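The schedule defaults follow the same fallback idea as the ${github_repo:-$GITHUB_REPOSITORY} expansion used in the calling job. A small shell sketch of that behaviour, as a loose analogy to the || fallback in workflow expressions rather than an exact equivalent: INPUT_RETRIES, INPUT_SKIP_RERUN and resolve_defaults are hypothetical names chosen for the illustration.

```shell
# ${var:-default} falls back when var is unset or empty, roughly like
# `github.event.inputs.retries || '5'` in a workflow expression.
# INPUT_RETRIES, INPUT_SKIP_RERUN and resolve_defaults are hypothetical names.
resolve_defaults() {
  retries="${INPUT_RETRIES:-5}"
  skip_rerun="${INPUT_SKIP_RERUN:-false}"
  printf 'retries=%s skip_rerun=%s\n' "$retries" "$skip_rerun"
}

# Scheduled run: no inputs are set, so the defaults apply.
INPUT_RETRIES="" INPUT_SKIP_RERUN="" resolve_defaults
# prints: retries=5 skip_rerun=false

# Manual run: the chosen values win.
INPUT_RETRIES="3" INPUT_SKIP_RERUN="true" resolve_defaults
# prints: retries=3 skip_rerun=true
```

Resolving the defaults once, as the scheduled_defaults job does, keeps every later check reading from a single place.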
You can use this via a composite action hosted in this repo using this example:





