palantir/policy-bot

Improve error behavior when loading policies

bluekeyes opened this issue · 1 comment

As reported in #405, Policy Bot will sometimes fail to discover a policy file due to a GitHub timeout (or other network error). When this happens, we post a failed status check on the PR. This is undesirable for repositories that otherwise don't use Policy Bot, especially if the timeout or error happened while checking whether there was an organization-level policy.

To make matters worse, the failed status is never removed: once everything works correctly again, the repository is ignored, so nothing updates the status.

I think there are a few possible improvements:

  1. Detect and retry timeout errors while loading policies, if we do not already
  2. Make the timeout or retries configurable, if they are not already
  3. Log errors from loading policies and only post failed status checks for errors that happen after we've read a policy file from GitHub. This requires that repositories expecting a Policy Bot status mark it as required. It could also create the opposite problem, where the Policy Bot status is missing from a PR where it is required and it is hard to find out why.

NargiT commented

Retrying could fix the problem by either removing the status check or updating it. We could trigger it by simply writing /retry in a comment.
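
A minimal sketch of how such a comment could be recognized, assuming the command must appear alone on a line. The helper name `isRetryCommand` is hypothetical, and wiring it into a comment event handler is left out.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// isRetryCommand reports whether any line of a comment body consists solely
// of "/retry", ignoring surrounding whitespace. Hypothetical helper, not
// part of Policy Bot.
func isRetryCommand(body string) bool {
	scanner := bufio.NewScanner(strings.NewReader(body))
	for scanner.Scan() {
		if strings.TrimSpace(scanner.Text()) == "/retry" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isRetryCommand("The check looks stuck.\n/retry"))
	fmt.Println(isRetryCommand("please /retry later"))
}
```

Requiring the command on its own line avoids triggering a re-evaluation when someone merely mentions /retry in the middle of a sentence.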