palantir/bulldozer

PRs left open when GitHub is too busy

justinwyer opened this issue · 4 comments

We have been experiencing an intermittent issue where merge requests are rejected by the GitHub API:

{
	"level": "info",
	"rid": "cf3n6dbn37kclfh117a0",
	"github_event_type": "status",
	"github_delivery_id": "3cb85cd2-96e7-11ed-9516-9f680ab3bc27",
	"github_installation_id": 1,
	"github_repository_owner": "acme",
	"github_repository_name": "anvils",
	"github_pr_num": 26725,
	"time": "2023-01-18T04:19:02.713727629Z",
	"message": "Merge rejected due to unsatisfied condition: \"Base branch was modified. Review and try the merge again.\""
}

A small percentage of PRs are left unmerged by bulldozer due to the above issue. At this point there are no more triggering events affecting the PR so it never retries. This seems to happen when we have a lot of PRs getting merged in quick succession and is sporadic in nature.

I am happy to contribute a fix for this, but I am not sure what the best approach might be:

  1. Retry after a small delay? In the event bulldozer is restarted the PR would once again end up left open, but this would be exceedingly rare and so is acceptable.
  2. Periodically scan PRs in a repo? This feels like a can of worms best not opened.

Or if you have any other ideas I would be grateful.

Hey @justinwyer, thanks for submitting this. I think we already try 5 retries when merging, so something else might be going sideways here. How many pull requests are merging when you see this type of issue? We could probably increase the number of retries and backoff.

Thanks @asvoboda it seems like there is no retry in this case as the error returns non as retry-able

case http.StatusMethodNotAllowed:
logger.Info().Msgf("Merge rejected due to unsatisfied condition: %q", gerr.Message)
return false, false
happy to open a PR to retry when the error from the API matches Base branch was modified. Review and try the merge again. or we could more generally make http.StatusMethodNotAllowed retry-able. What would you prefer?

Yep that'll do it.

I think it probably makes sense to check for that specific error. In general the 405 should indicate the request is actually not meant to be retried, but Github internally treats this particular error as retry-able, which is a little bit wonky.

Okay great, will open a PR a bit later.