Azure/acr

Intermittent 50x HTTP errors when pulling Helm charts


Describe the bug
When pulling Helm charts (OCI) from ACR, we intermittently get the following HTTP status codes:

  • 502 Bad Gateway
  • 503 The server is busy
  • 504 Gateway Time-out

The problem occurs several times per day.
We have tried:

  • Multiple Helm client versions
  • Multiple client networks (AKS, VM both within vnet and outside)
  • Multiple different charts & chart versions
  • Implemented retries wherever possible
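For anyone hitting the same errors, the kind of retry we mean can be sketched roughly as below. This is a minimal illustration, not our exact setup: the function name `pull_with_retry`, the attempt count, and the delays are made up for the example, and detecting the status code by searching Helm's stderr is a heuristic that assumes the code appears in the error text.

```python
import random
import re
import subprocess
import time

# Heuristic (assumption): the 502/503/504 status code shows up somewhere in
# the error text that the Helm client prints to stderr.
TRANSIENT = re.compile(r"\b50[234]\b")


def pull_with_retry(cmd, attempts=5, base_delay=1.0,
                    run=subprocess.run, sleep=time.sleep):
    """Run `cmd`, retrying on 502/503/504 with exponential backoff and jitter.

    Returns the number of the attempt that succeeded. Raises RuntimeError on
    a non-transient failure or once all attempts are used up.
    `run` and `sleep` are injectable so the logic can be tested without Helm.
    """
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        transient = TRANSIENT.search(result.stderr) is not None
        if not transient or attempt == attempts:
            raise RuntimeError(
                f"pull failed on attempt {attempt}: {result.stderr.strip()}"
            )
        # Wait base_delay * 1, 2, 4, ... plus random jitter before retrying.
        sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))
```

It would be called along the lines of `pull_with_retry(["helm", "pull", "oci://xyz.azurecr.io/helm/mychart", "--version", "1.0.0"])`, where the chart path and version are placeholders. Retries like this reduce the impact but do not remove the underlying 50x responses.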

Together with Azure Support, we captured client-side network packet traces. They indicate that the HTTP status codes are returned by the ACR endpoint (xyz.azurecr.io) itself and not by an intermediate service or proxy.

To Reproduce
Steps to reproduce the behavior:

  1. Pull Helm charts (OCI) from ACR
  2. Repeat until it fails with a 50x HTTP status code
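The two steps above can be scripted roughly as follows. This is only a sketch under assumptions: `pull_until_failure` and `max_pulls` are names invented for the example, and the status code is extracted from Helm's stderr with a simple pattern match, which assumes the code appears in the error text.

```python
import re
import subprocess

# Assumption: the failing status code (502, 503 or 504) appears in stderr.
STATUS = re.compile(r"\b(50[234])\b")


def pull_until_failure(cmd, max_pulls=1000, run=subprocess.run):
    """Run `cmd` repeatedly until it fails or `max_pulls` is reached.

    Returns (pull_number, status_code_or_None, stderr_text) for the first
    failing pull, or None if every pull succeeded.
    `run` is injectable so the loop can be tested without Helm installed.
    """
    for n in range(1, max_pulls + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            match = STATUS.search(result.stderr)
            code = match.group(1) if match else None
            return n, code, result.stderr.strip()
    return None
```

For example, `pull_until_failure(["helm", "pull", "oci://xyz.azurecr.io/helm/mychart", "--version", "1.0.0"])`, with the chart path and version as placeholders. Since the failures are intermittent, the loop may need to run for a long time before one shows up.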

Any relevant environment information

  • Helm 3.9, 3.10.2, ...
  • Region: West Europe

Additional context
Support advised us to create a ticket in this GitHub repository so that progress can be tracked.
According to support:

  • The product group (PG) confirmed that the cause of this issue lies with the storage account in specific regions.
  • The PG team is working on a long-term fix: they plan to improve the code to stabilize the connection between ACR and the Storage service.

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

Any update on this?

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

This issue was closed because it has been stalled for 30 days with no activity.

Can this please be re-opened? Azure Support asked us to create this ticket. It would be nice to get at least some feedback on this. Thanks in advance.