distribution/distribution

Registry failed on storage health check - s3aws.Stat("/")

maka86 opened this issue · 4 comments

Description

We had an outage on our existing Docker registry server (2.8.3). We have now built a new docker registry:2.8.3 with the same settings on EC2; however, we hit the following issues.

When the container starts, it initially passes the ALB health check but then fails on s3aws.Stat("/").

Log from container startup, while it still passes the ELB health check (check path registry:5000/v2/):

```
10.21.11.226 - - [21/May/2024:06:11:23 +0000] "GET /v2/ HTTP/1.1" 200 2 "" "ELB-HealthChecker/2.0"
time="2024-05-21T06:11:33.612124577Z" level=info msg="response completed" go.version=go1.20.8 http.request.host="localhost:5000" http.request.id=32086d61-5859-4c89-bdfb-436722cffc39 http.request.method=GET http.request.remoteaddr="127.0.0.1:47768" http.request.uri="/v2/" http.request.useragent=Wget http.response.contenttype="application/json; charset=utf-8" http.response.duration=2.09839ms http.response.status=200 http.response.written=2
time="2024-05-21T06:11:33.612046526Z" level=debug msg="authorizing request" go.version=go1.20.8 http.request.host="localhost:5000" http.request.id=32086d61-5859-4c89-bdfb-436722cffc39 http.request.method=GET http.request.remoteaddr="127.0.0.1:47768" http.request.uri="/v2/" http.request.useragent=Wget
```

The registry then fails the storage health check, and the ELB health check returns 503:

time="2024-05-21T06:11:33.831893432Z" level=debug msg="s3aws.Stat("/")" go.version=go1.20.8 instance.id=e8c2d6e3-d6bf-4e88-a9cc-c1ec8d88c53a service=registry trace.duration=350.466167ms trace.file="github.com/docker/distribution/registry/storage/driver/base/base.go" trace.func="github.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=e10513a0-4a84-4296-b7c5-9144c28cf351 trace.line=155 version=2.8.3
10.21.11.226 - - [21/May/2024:06:11:38 +0000] "GET /v2/ HTTP/1.1" 503 125 "" "ELB-HealthChecker/2.0"
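For context, the failing call is the registry's periodic storage health check, which stats the storage root. If I read the s3aws driver correctly, Stat("/") boils down to a ListObjectsV2 request capped at one key under the configured root prefix, so a rough boto3 stand-in (a sketch only; bucket and region are taken from the compose file below, and the default empty rootdirectory is assumed) is:

```python
import boto3

# Rough stand-in for the driver's Stat("/") health check: list at most one
# key under the root prefix. A client error here should mirror the
# health-check failure with a more descriptive message.
s3 = boto3.client("s3", region_name="ap-southeast-2")
resp = s3.list_objects_v2(Bucket="docker-image-repo", Prefix="", MaxKeys=1)
print("KeyCount:", resp.get("KeyCount"))
```

This is not the driver's actual code path, but if it raises, the exception should say what Stat is tripping over; as far as I can tell an empty result alone is fine, since the health check treats a missing path as healthy.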

If we set the environment variable `REGISTRY_HEALTH_STORAGEDRIVER_ENABLED: "false"`, the ELB health check passes, but the API returns this error:

```
$ curl -X GET https://registry.sanbox.internal.com:5000/v2/images/node/tags/list
{"errors":[{"code":"UNKNOWN","message":"unknown error","detail":{"DriverName":"s3aws","Enclosed":{}}}]}
```
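To narrow down that UNKNOWN error, it may help to look at the exact keys the tags/list endpoint reads. Assuming the default storage layout (the `docker/registry/v2/` prefix matches the bucket listing shown further down) and taking the repository name `images/node` from the curl above, a rough probe is:

```python
import boto3

# List the tag directory that /v2/images/node/tags/list reads from.
# If this prefix is empty or the call errors, that narrows down the failure.
s3 = boto3.client("s3", region_name="ap-southeast-2")
prefix = "docker/registry/v2/repositories/images/node/_manifests/tags/"
resp = s3.list_objects_v2(Bucket="docker-image-repo", Prefix=prefix, MaxKeys=10)
for obj in resp.get("Contents", []):
    print(obj["Key"])
```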

Reproduce

1. Start the container without `REGISTRY_HEALTH_STORAGEDRIVER_ENABLED: "false"`; the registry service fails on s3aws.Stat("/").
2. Start the container with `REGISTRY_HEALTH_STORAGEDRIVER_ENABLED: "false"`; the registry service returns:

   ```
   {"errors":[{"code":"UNKNOWN","message":"unknown error","detail":{"DriverName":"s3aws","Enclosed":{}}}]}
   ```

Expected behavior

registry:2.8.3 should run as normal.

registry version

registry:2.8.3

Additional Info

The docker compose file:

```yaml
registry-service:
  image: registry:2
  container_name: docker-registry
  restart: always
  ports:
    - '5000:5000'
  environment:
    REGISTRY_STORAGE: s3
    REGISTRY_STORAGE_S3_BUCKET: docker-image-repo
    REGISTRY_STORAGE_S3_REGION: ap-southeast-2
    REGISTRY_STORAGE_S3_ENCRYPT: 'true'
    SEARCH_BACKEND: sqlalchemy
    REGISTRY_HTTP_SECRET: <secret>
```

From some searching, this might be caused by S3 permissions, but the execution role has full S3 access.
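Besides permissions, one thing that might be worth ruling out is a region mismatch between `REGISTRY_STORAGE_S3_REGION` and where the bucket actually lives, which can also surface as an opaque S3 client error. A quick check from inside the container (a sketch only; the bucket name comes from the compose file above):

```python
import boto3

# Compare the bucket's real location against REGISTRY_STORAGE_S3_REGION
# (ap-southeast-2 in the compose file). LocationConstraint is None for
# buckets in us-east-1.
s3 = boto3.client("s3")
loc = s3.get_bucket_location(Bucket="docker-image-repo")
print("bucket region:", loc.get("LocationConstraint") or "us-east-1")
```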

We have to stay on 2.8.3 for now to support many old images.

I would really appreciate it if you formatted your logs using markdown code formatting. It's really hard to parse the context from your message.

Hi @milosgajdos, sorry about that, I have updated the log block. Thanks.

@milosgajdos, I just did a small test to prove the container can access the S3 bucket.

I ran the following SDK code to access S3 inside the container:

```python
import boto3

# List every key in the registry's bucket to confirm the container's
# credentials can reach S3 and read the stored data.
s3 = boto3.resource('s3')
bucket = s3.Bucket('docker-image-repo')

for obj in bucket.objects.all():
    print(obj.key)
```

Result:

```
docker/registry/v2/blobs/sha256/08/084c1da10d39c6b7dc2bb41ba84771e6c0a60611ca17e493b5ad258cae7b7eb5/data
docker/registry/v2/blobs/sha256/08/084c81f08dd1abd2e5530390610303de2165ee5e4b49c4718d5168921bca76b9/data
docker/registry/v2/blobs/sha256/08/084cae691009e79856b8948aa70d437d649a39ecd1376b3ffa521b123f168aca/data
docker/registry/v2/blobs/sha256/08/084cb812b096e26793403a078ca871e6f3eec2f6f6bfe8e62d54b3f35fc84e4e/data
docker/registry/v2/blobs/sha256/08/084d0db3995e0a48bff875f01d4322a1d215bd31c84102faaec6f85bf20a9311/data
docker/registry/v2/blobs/sha256/08/084d379991bd2eaa55e2dd1143dcc9e5a1e8b5ca2a7837e5b906b402f04bc8d5/data
......
......
```

The container can access the S3 bucket. Does this mean the issue is in the s3aws storage driver?
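One caveat about that test: `bucket.objects.all()` only exercises `s3:ListBucket`, while the driver also needs object-level permissions (at least GetObject, PutObject, and DeleteObject). So a successful listing doesn't prove the whole permission set. A rough probe of those operations, using a throwaway key made up just for this test:

```python
import boto3

s3 = boto3.client("s3", region_name="ap-southeast-2")
bucket = "docker-image-repo"
key = "registry-permission-probe"  # scratch key for this test only

s3.put_object(Bucket=bucket, Key=key, Body=b"ping")  # s3:PutObject
s3.head_object(Bucket=bucket, Key=key)               # HEAD (s3:GetObject)
s3.get_object(Bucket=bucket, Key=key)                # s3:GetObject
s3.delete_object(Bucket=bucket, Key=key)             # s3:DeleteObject
print("put/head/get/delete all succeeded")
```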

I have a feeling this is related to #3275.

Note that v2.8.3 is essentially in maintenance mode and won't be receiving any updates besides high-severity security patches. When the stable v3 release is out, v2.x will be marked as deprecated completely.