ocp-power-automation/ocp4-upi-kvm

docker.io ratelimit causes OCS CI to break

gitsridhar opened this issue · 5 comments

The OCS code in ocs-upi-kvm uses ocp4-upi-kvm to create an OCP cluster on libvirt. The master and worker nodes are named master-X and worker-X.

[svenkat@nx123-ahv ~]$ oc get nodes
NAME       STATUS   ROLES    AGE     VERSION
master-0   Ready    master   4h      v1.19.0+d59ce34
master-1   Ready    master   3h59m   v1.19.0+d59ce34
master-2   Ready    master   3h59m   v1.19.0+d59ce34
worker-0   Ready    worker   3h52m   v1.19.0+d59ce34
worker-1   Ready    worker   3h52m   v1.19.0+d59ce34
worker-2   Ready    worker   3h51m   v1.19.0+d59ce34
[svenkat@nx123-ahv ~]$

With this setup, OCS is deployed using OCS-CI. During post-deploy validation, OCS-CI creates a pod based on the docker.io nginx image, and it repeats this many hundreds of times.

docker.io recently introduced a limit on the number of image pulls. Because OCS-CI uses anonymous (unauthenticated) access to docker.io, the limit is hit and OCS-CI fails.

The failure when creating the pod:

Warning Failed kubelet Failed to pull image "nginx": rpc error: code = Unknown desc = Error reading manifest latest in docker.io/library/nginx: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
Warning Failed kubelet Error: ErrImagePull

One possible remedy is to give the master and worker nodes unique names, with a randomly generated string appended as a suffix, instead of the names shown above. With this change, the limit may not be hit across OCS-CI implementations (we have about 4 of them at this point).
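As a rough illustration of the naming scheme proposed here, the sketch below builds node names that share one random hex suffix per cluster. The function name `cluster_node_names` and the `role-index-suffix` layout are assumptions for illustration only; they are not taken from ocp4-upi-kvm, and this is not confirmed to affect the pull limit.

```python
import secrets


def cluster_node_names(masters=3, workers=3, suffix=None):
    """Return node names such as 'master-0-3fa9' and 'worker-2-3fa9'.

    Hypothetical helper: one random hex suffix is generated per cluster
    and shared by all of its nodes, so two clusters are unlikely to
    produce the same node names.
    """
    if suffix is None:
        # token_hex(2) returns 4 hexadecimal characters
        suffix = secrets.token_hex(2)
    names = [f"master-{i}-{suffix}" for i in range(masters)]
    names += [f"worker-{i}-{suffix}" for i in range(workers)]
    return names


if __name__ == "__main__":
    for name in cluster_node_names():
        print(name)
```

Passing an explicit `suffix` makes the result reproducible, which is convenient when the same string also has to appear in other generated configuration.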

If there is any other solution for this docker.io rate limit problem, please let us know.

I am working on giving the VM names a random string suffix. I will create a PR.

Raised pull request #70.
I did not test this change because I do not have an environment ready. In my ocs-upi-kvm environment the git submodules are getting wiped out, so I have no idea how to test it there.

Closing this issue as we already support adding random hex in cluster_id.

If you think we still need to change the node names, note that this will require proper testing as well as code changes for the hostnames, DHCP, and DNS configs.

To avoid Docker rate limiting, it's advisable to start using quay.io or other registries.
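One way a CI harness could act on this advice is to rewrite Docker Hub image references before creating pods. The sketch below is a minimal illustration, not OCS-CI code: `split_image` and `redirect_image` are hypothetical helpers, the namespace `example-mirror` is a placeholder, and the image is assumed to already exist in the target registry. Digest references (`@sha256:...`) are not handled.

```python
DOCKER_HUB = "docker.io"


def split_image(ref):
    """Split an image reference into (registry, repository, tag).

    Short names are expanded the way the kubelet error above reports
    them: "nginx" becomes docker.io/library/nginx with tag "latest".
    """
    first, _, rest = ref.partition("/")
    # The first component is a registry host only if it looks like one.
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = DOCKER_HUB, ref
    # A tag is a colon in the last path component only.
    if ":" in remainder.rsplit("/", 1)[-1]:
        remainder, tag = remainder.rsplit(":", 1)
    else:
        tag = "latest"
    if registry == DOCKER_HUB and "/" not in remainder:
        remainder = "library/" + remainder
    return registry, remainder, tag


def redirect_image(ref, mirror_registry, mirror_namespace):
    """Point a Docker Hub reference at another registry; leave others alone."""
    registry, repository, tag = split_image(ref)
    if registry != DOCKER_HUB:
        return ref
    image_name = repository.rsplit("/", 1)[-1]
    return f"{mirror_registry}/{mirror_namespace}/{image_name}:{tag}"


if __name__ == "__main__":
    # "example-mirror" is a placeholder namespace, not a real mirror.
    print(redirect_image("nginx", "quay.io", "example-mirror"))
```

References that already name a non-Docker Hub registry are returned unchanged, so the helper can be applied to every image a test suite uses.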