smarkets/marge-bot

Marge fails too hard on network errors

raphael-proust opened this issue · 1 comment

When margebot encounters a network error during the merging process (e.g., a timeout when checking the CI status), it fails hard and adds an "I'm broken inside" comment. Because network errors can be transient, margebot should retry failed network requests instead.

I can try to make a PR for this, but I have the following questions:

  • Do you agree with the diagnosis and with the main proposal?

  • I'm not sure where in the code to make the necessary modifications. Specifically, I'm not sure at what granularity the retry should happen:

    • At the high level of single_merge_job's execute? That way we won't miss any errors (at least not during the merging process).
    • At a lower level (fetch_approvals and update_merge_request_and_accept, or even lower)? That way we have more specific context about which stage failed.
    • At a higher level (outside of single_merge_job)?
    • At a different level altogether, such as by patching the self._api object or changing some configuration of the underlying HTTP request library?
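
To make the question more concrete, here is a minimal sketch of what a retry wrapper could look like, whichever level it ends up at. Everything in it is hypothetical and not taken from the marge-bot code: `TransientNetworkError` stands in for whatever timeout or connection exceptions the HTTP layer actually raises, and `retry_on` and `fetch_ci_status` are illustrative names only.

```python
import functools
import time


class TransientNetworkError(Exception):
    """Hypothetical stand-in for the timeout/connection errors the HTTP layer raises."""


def retry_on(exceptions, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Retry the wrapped callable on `exceptions`, doubling the delay between tries."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        # Out of retries: re-raise so the existing
                        # failure handling still runs.
                        raise
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


# Illustration only: a call that times out twice, then succeeds.
calls = {"count": 0}


@retry_on(TransientNetworkError, attempts=3, base_delay=0.0)
def fetch_ci_status():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientNetworkError("timeout")
    return "success"
```

The same decorator could wrap a whole job or a single request helper, so the granularity question above is mostly about which exceptions to catch and how much work is safe to repeat. Non-idempotent steps (such as accepting the merge request) would probably need more care than a blind retry.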

Yes, please add this feature! Would save us a lot of work!