cli/cli

gh run rerun --job is not working as documented

sochotnicky opened this issue · 6 comments

Describe the bug

This is what gh run rerun -h help looks like in the most recent version:

$ gh run rerun  -h
Rerun an entire run, only failed jobs, or a specific job from a run.

Note that due to historical reasons, the `--job` flag may not take what you expect.
Specifically, when navigating to a job in the browser, the URL looks like this:
`https://github.com/<owner>/<repo>/actions/runs/<run-id>/jobs/<number>`.

However, this `<number>` should not be used with the `--job` flag and will result in the
API returning `404 NOT FOUND`. Instead, you can get the correct job IDs using the following command:

	gh run view <run-id> --json jobs --jq '.jobs[] | {name, databaseId}'


USAGE
  gh run rerun [<run-id>] [flags]

FLAGS
  -d, --debug        Rerun with debug logging
      --failed       Rerun only failed jobs, including dependencies
  -j, --job string   Rerun a specific job from a run, including dependencies

INHERITED FLAGS
      --help                     Show help for command
  -R, --repo [HOST/]OWNER/REPO   Select another repository using the [HOST/]OWNER/REPO format

LEARN MORE
  Use `gh <command> <subcommand> --help` for more information about a command.
  Read the manual at https://cli.github.com/manual

Specifically, one of the flags is -j (or --job) to rerun a job with specific name. However:

$ gh run rerun 1234 --job anyname
specify only one of `<run-id>` or `--job`

Usage:  gh run rerun [<run-id>] [flags]

Flags:
  -d, --debug        Rerun with debug logging
      --failed       Rerun only failed jobs, including dependencies
  -j, --job string   Rerun a specific job from a run, including dependencies

I haven't yet looked at the code but seems to me that arg parsing is wrong? Rerunning a job name without providing a run number does not make sense I believe.

Steps to reproduce the behavior

  1. Run 'gh run rerun 1234 --job anyname'
  2. See error during parsing of arguments

Expected vs actual behavior

I'd expect to be able to re-run a job with specific from a given run with --job switch

Logs

Paste the activity from your command line. Redact if needed.

Reading the code for the rerun & the beginning of the doc string, I think the intention really is that you either provide a job ID or the run ID. Maybe the issue here is more about the documentation for that subcommand which:

  • makes it seem like you can provide --job and runid at the same time
  • parameter is string type, not a number so that makes the confusion worse

Yes I can see why this would be confusing. I agree with what you said about the docs, and I think there's a couple of other things that could be improved. In summary:

  1. Because the command was originally created with a positional arg for the run ID it seems like it can be provided with the job ID to narrow it down.
  2. Because the field is --job and not --job-id it looks like you might be expected to provide the name
  3. Because the field is a string type and not a number, it looks like you might be expected to provide the name
  4. Because the docs list the { name, databaseId } it looks like you might be expected to provide the name

So what can we do?

1. Positional vs Flag Args

In terms of the command, we can't really change much about this for backwards compatibility reasons. We need to continue supporting both the positional arg and the flag. The endpoints that are hit in providing these IDs are completely different:

path := fmt.Sprintf("repos/%s/actions/runs/%d/%s", ghrepo.FullName(repo), run.ID, runVerb)

and

path := fmt.Sprintf("repos/%s/actions/jobs/%d/rerun", ghrepo.FullName(repo), job.ID)

We could allow the runID to be provided with the jobID and then at this point

if jobID != "" {
opts.IO.StartProgressIndicator()
selectedJob, err = shared.GetJob(client, repo, jobID)
opts.IO.StopProgressIndicator()
if err != nil {
return fmt.Errorf("failed to get job: %w", err)
}
runID = fmt.Sprintf("%d", selectedJob.RunID)
check that the provided runID matches the one in the --job flag.' This error would say something like:

The job with ID <JOB_ID> does not belong to a run with ID <RUN_ID> but <ACTUAL_RUN_ID>. Note that if the `--job` flag is provided, it is not necessary to provide a run ID.

Alternatively, it looks like gh run view prints a warning that it is ignoring the run ID

if opts.RunID != "" && opts.JobID != "" {
opts.RunID = ""
if opts.IO.CanPrompt() {
cs := opts.IO.ColorScheme()
fmt.Fprintf(opts.IO.ErrOut, "%s both run and job IDs specified; ignoring run ID\n", cs.WarningIcon())
}
}

2. Flag naming

Again due to backwards compatibility we can't just change the name of this flag to --job-id. We could deprecate it, but I think perhaps just being clearer in the job description that this is expected to be an ID would be enough.

3. String type

I think we could change this to an int64. The databaseID that is used should always parse into an int64 and I don't believe it should ever be a string.

4. { name, databaseId }

I wrote this doc comment after using the wrong ID so many times. I included the name so that people running the command would be able to match the ID to the correct job. I think the comment is good, we probably just need to call out that it's the databaseID that should be selected.


What do you think about these? Any further thoughts?

@williammartin Generally agreed with your points. More specifically

  1. I value consistency so doing what gh run view does (print a warning) seems like a good option
  2. I agree if the flag description is updated to include "ID" in some form it will be clear enough
  3. That will help strengthen point 2.
  4. Yup, making it clearer about databaseID might help too

When combined the proposed changes sound good enough. Certainly no need to deprecate stuff or break compat.

Cool cool, are you interested in opening a PR to get these changes made?

Cool cool, are you interested in opening a PR to get these changes made?

Sure, the changes seem small enough. I'll have a look in the next few days