FR: retries in the agent
dhalperi opened this issue · 3 comments
Many of our builds save an artifact in step 1 and then download it in steps 2..N. This week we've seen several flaky builds caused by a failure to download the step 1 artifact in just one of the later steps. They are all 502 errors.
Copying from an email discussion with support@: it would be great if this were retried more proactively (with sensible backoff and limits, of course; we don't want to make an outage worse).
It looks like the agent call that the plugin uses (https://github.com/buildkite/agent/blob/30f8cc6526b88833d0411bcbba1527862c3a5207/agent/artifact_searcher.go#L34) does not use the retry functionality we already have in the agent, so we are going to look at adding it there.
I figured it wouldn't hurt to record this in an issue :).
Hey @dhalperi! Sorry for the long delay!
Reading through the agent's code, it would seem that artifact uploading does have retries. What's more, there is an issue reported against the agent asking for this to be implemented, but reading through it in full I found that it has already been implemented as well.
That said, the error in the pipeline is not in the download itself but in an intermediate step, where the agent searches for the artifact's URL and that request fails. That was a genuine agent bug, fixed in agent 3.21.0 (May 2020); the PR that fixed it reports exactly the same error.