buildkite/buildkite-agent-metrics

Confusing error with invalid token

jeffora opened this issue · 2 comments

I recently ran into an issue after deploying the Elastic CI Stack via Terraform. Because the stack declares the token parameter as NoEcho, Terraform reports it as changed on every plan / apply. I set that parameter to be ignored, which resulted in Terraform updating the stack with the literal token value "****".

When the metrics lambda queried the Buildkite API, the request was rejected because of the invalid token. However, instead of a log message saying so, the lambda logs showed the following:

2019-03-07 17:47:46
START RequestId: d5e7b957-08a4-4ab2-8549-b3a0b480c4e2 Version: $LATEST
2019/03/07 17:47:46 Collecting agent metrics for queue 'default'
No organization slug was found in the metrics response: errorString null
END RequestId: d5e7b957-08a4-4ab2-8549-b3a0b480c4e2
REPORT RequestId: d5e7b957-08a4-4ab2-8549-b3a0b480c4e2	Duration: 237.51 ms	Billed Duration: 300 ms Memory Size: 128 MB	Max Memory Used: 58 MB	

This originates from this line of the collector:

return nil, fmt.Errorf("No organization slug was found in the metrics response")

It seems like the API call did not trigger the error case in this scenario, so the body is assumed to be valid, but is then missing information the collector expects.

Here is a sample cURL:

curl -H 'Authorization: Token ****' https://agent.buildkite.com/v3/metrics
< HTTP/2 401
< date: Fri, 08 Mar 2019 00:00:15 GMT
< content-type: application/json; charset=utf-8
< server: nginx
< x-frame-options: SAMEORIGIN
< x-xss-protection: 1; mode=block
< x-content-type-options: nosniff
< x-download-options: noopen
< x-permitted-cross-domain-policies: none
< referrer-policy: strict-origin-when-cross-origin
< www-authenticate: Basic realm="agent.buildkite.com"
< cache-control: no-cache
< x-request-id: 8a87a234-5c14-44da-b003-33df020f0a52
< x-runtime: 0.007021
<
{
  "message": "The token you provided \"****\" is not a valid organization agent registration token"
}

It seems a 401 status does not result in err being populated by Go's http.Client, but this is just a guess!

lox commented

You're right! Sorry about that. We will get that fixed.