GitHub action to rerun a step upon failure with specific exit-codes is doing nothing

Question

mscheltienne opened this issue 5 months ago · 2 comments

To mitigate the intermittent CI failures, I wrote this small GitHub composite action, which should retry a step when the command exits with one of the retry_error_codes. For pytest runs, the action is used here:

      - name: Run pytest
        uses: ./.github/actions/retry-step
        with:
          command: pytest mne_lsl --cov=mne_lsl --cov-report=xml --cov-config=pyproject.toml -s
        env:
          MNE_LSL_LOG_LEVEL: DEBUG

Thus, the exit codes on which a retry should be triggered are the default ones, 134 and 139 (Python fatal error, segmentation fault).
Yet it's clearly not working, cf. this CI run. The action is actually doing nothing, and the runner is terminated prematurely.

Answer 1 · 2024-08-05T16:06:02.000Z

@larsoner When able, could you have a look at the composite action? Maybe you spot why this short bash script is not rerunning the command 😭

Answer 2 · 2024-08-05T17:33:42.000Z

You could, for example, check whether errexit is enabled in the shell that runs your script:

$ set -o | grep errexit
errexit        	off

If it's on for you, then the script will exit as soon as the eval'd command hits an error (as if set -e had been set), before any retry logic can run. So it's possible you need to do something like:

          set +e  # do not abort the script if the command fails
          (eval "$command")
          set -e  # restore errexit for the rest of the script
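
The effect of errexit on a failing command can be checked with a small experiment. This is only an illustrative sketch: the variable names out_on and out_off are made up here, and exit code 139 is simulated with a subshell rather than produced by a real segmentation fault.

```shell
# With errexit on, the inner shell stops at the failing command,
# so the echo after it never runs and nothing is printed.
out_on=$(sh -c 'set -e; (exit 139); echo "reached"' || true)

# With errexit off, execution continues and the status can be inspected.
out_off=$(sh -c 'set +e; (exit 139); echo "status=$?"')

echo "errexit on:  '$out_on'"
echo "errexit off: '$out_off'"
```

If the first line comes back empty, any code meant to inspect the exit status after the command would never have been reached.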

Either way, to be explicit, I would put something like set -eo pipefail at the top of your script, so that it does exit if any command other than the one you want to retry fails.
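
Putting the two suggestions together, a retry loop could look roughly like the sketch below. It is not the composite action from the question: the function name retry_on_codes, its arguments, and the attempt limit are hypothetical, and only the exit codes 134 and 139 come from the discussion above.

```shell
# errexit for everything except the retried command; under bash,
# `set -eo pipefail` as suggested above would also cover pipelines.
set -e

# Hypothetical helper: retry_on_codes MAX_ATTEMPTS "CODE CODE ..." COMMAND
retry_on_codes() {
    local max_attempts="$1" codes="$2" command="$3"
    local attempt=1 status
    while true; do
        # Turn errexit off so a failure does not abort the script
        # before the exit status has been captured.
        set +e
        (eval "$command")
        status=$?
        set -e

        if [ "$status" -eq 0 ]; then
            return 0
        fi
        # Retry only when the status is in the list and attempts remain.
        case " $codes " in
            *" $status "*)
                if [ "$attempt" -lt "$max_attempts" ]; then
                    echo "Exit code $status, retrying ($attempt of $max_attempts attempts used)" >&2
                    attempt=$((attempt + 1))
                    continue
                fi
                ;;
        esac
        # Any other failure, or retries exhausted: propagate the status.
        return "$status"
    done
}
```

In a step, this might be called as retry_on_codes 3 "134 139" "$command". The key point is that the exit status is read immediately after the command, while errexit is still off, and that any status outside the list is returned unchanged so that genuine test failures still fail the job.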