Azure/ARO-RP

must gather race condition


Saw this in an e2e run:

[Admin API] Must gather action 
  should return information collected from a cluster cluster
  /data/vsts-agent/_work/3/s/gopath/src/github.com/Azure/ARO-RP/test/e2e/adminapi_mustgather.go:17
STEP: triggering the mustgather action
time="2020-07-31T18:26:06Z" level=info msg="read request" func="middleware.Log.func1.1()" file="pkg/frontend/middleware/log.go:110" client_principal_name= client_request_id= component=access correlation_id= request_id=c75b3db9-4ddd-4c75-bf44-b11fba357a3e request_method=POST request_path=/admin/subscriptions/46626fc5-476d-41ad-8c76-2ec49c6994eb/resourcegroups/v4-e2e-rg-v33411082-eastus/providers/microsoft.redhatopenshift/openshiftclusters/v4-e2e-v33411082/mustgather request_proto=HTTP/1.1 request_remote_addr="127.0.0.1:57258" request_user_agent=Go-http-client/1.1 resource_group=v4-e2e-rg-v33411082-eastus resource_id=/subscriptions/46626fc5-476d-41ad-8c76-2ec49c6994eb/resourcegroups/v4-e2e-rg-v33411082-eastus/providers/microsoft.redhatopenshift/openshiftclusters/v4-e2e-v33411082 resource_name=v4-e2e-v33411082 subscription_id=46626fc5-476d-41ad-8c76-2ec49c6994eb
time="2020-07-31T18:26:08Z" level=info msg="403: Forbidden: pods/must-gather: pods \"must-gather\" is forbidden: error looking up service account openshift-must-gather-v6tf4/default: serviceaccount \"default\" not found" func="frontend.reply()" file="pkg/frontend/frontend.go:375" client_principal_name= client_request_id= component=access correlation_id= request_id=c75b3db9-4ddd-4c75-bf44-b11fba357a3e resource_group=v4-e2e-rg-v33411082-eastus resource_id=/subscriptions/46626fc5-476d-41ad-8c76-2ec49c6994eb/resourcegroups/v4-e2e-rg-v33411082-eastus/providers/microsoft.redhatopenshift/openshiftclusters/v4-e2e-v33411082 resource_name=v4-e2e-v33411082 subscription_id=46626fc5-476d-41ad-8c76-2ec49c6994eb
time="2020-07-31T18:26:08Z" level=info msg="sent response" func="middleware.Log.func1.1.1()" file="pkg/frontend/middleware/log.go:101" body_read_bytes=0 body_written_bytes=255 client_principal_name= client_request_id= component=access correlation_id= duration=2.2390684419999998 request_id=c75b3db9-4ddd-4c75-bf44-b11fba357a3e request_method=POST request_path=/admin/subscriptions/46626fc5-476d-41ad-8c76-2ec49c6994eb/resourcegroups/v4-e2e-rg-v33411082-eastus/providers/microsoft.redhatopenshift/openshiftclusters/v4-e2e-v33411082/mustgather request_proto=HTTP/1.1 request_remote_addr="127.0.0.1:57258" request_user_agent=Go-http-client/1.1 resource_group=v4-e2e-rg-v33411082-eastus resource_id=/subscriptions/46626fc5-476d-41ad-8c76-2ec49c6994eb/resourcegroups/v4-e2e-rg-v33411082-eastus/providers/microsoft.redhatopenshift/openshiftclusters/v4-e2e-v33411082 resource_name=v4-e2e-v33411082 response_status_code=403 subscription_id=46626fc5-476d-41ad-8c76-2ec49c6994eb

• Failure [2.244 seconds]
[Admin API] Must gather action
/data/vsts-agent/_work/3/s/gopath/src/github.com/Azure/ARO-RP/test/e2e/adminapi_mustgather.go:14
  should return information collected from a cluster cluster [It]
  /data/vsts-agent/_work/3/s/gopath/src/github.com/Azure/ARO-RP/test/e2e/adminapi_mustgather.go:17

  Expected
      <int>: 403
  to equal
      <int>: 200

  /data/vsts-agent/_work/3/s/gopath/src/github.com/Azure/ARO-RP/test/e2e/adminapi_mustgather.go:24

The problem is that after creating the openshift-must-gather-XXX namespace, we don't wait for the existence of the default service account before creating the must-gather pod.

It's not a serious bug, but it makes the e2es flaky, so it would be good to fix as a priority.