Registry failed on storage health check - s3aws.Stat("/")
maka86 opened this issue · 4 comments
Description
We had an outage on our existing Docker registry server (2.8.3), so we built a new `registry:2.8.3` with the same settings on EC2; however, we hit the following issues.
When the container starts, it passes the ALB health check but then fails on `s3aws.Stat("/")`.
Logs from container start, while the ELB health check on `registry:5000/v2/` still passes:
```
10.21.11.226 - - [21/May/2024:06:11:23 +0000] "GET /v2/ HTTP/1.1" 200 2 "" "ELB-HealthChecker/2.0"
time="2024-05-21T06:11:33.612124577Z" level=info msg="response completed" go.version=go1.20.8 http.request.host="localhost:5000" http.request.id=32086d61-5859-4c89-bdfb-436722cffc39 http.request.method=GET http.request.remoteaddr="127.0.0.1:47768" http.request.uri="/v2/" http.request.useragent=Wget http.response.contenttype="application/json; charset=utf-8" http.response.duration=2.09839ms http.response.status=200 http.response.written=2
time="2024-05-21T06:11:33.612046526Z" level=debug msg="authorizing request" go.version=go1.20.8 http.request.host="localhost:5000" http.request.id=32086d61-5859-4c89-bdfb-436722cffc39 http.request.method=GET http.request.remoteaddr="127.0.0.1:47768" http.request.uri="/v2/" http.request.useragent=Wget
```
Then registry startup fails and the ELB health check returns 503:
```
time="2024-05-21T06:11:33.831893432Z" level=debug msg="s3aws.Stat("/")" go.version=go1.20.8 instance.id=e8c2d6e3-d6bf-4e88-a9cc-c1ec8d88c53a service=registry trace.duration=350.466167ms trace.file="github.com/docker/distribution/registry/storage/driver/base/base.go" trace.func="github.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=e10513a0-4a84-4296-b7c5-9144c28cf351 trace.line=155 version=2.8.3
10.21.11.226 - - [21/May/2024:06:11:38 +0000] "GET /v2/ HTTP/1.1" 503 125 "" "ELB-HealthChecker/2.0"
```
If we set the environment variable

```
REGISTRY_HEALTH_STORAGEDRIVER_ENABLED: "false"
```

the ELB health check passes, but requests then fail with this error:

```console
$ curl -X GET https://registry.sanbox.internal.com:5000/v2/images/node/tags/list
{"errors":[{"code":"UNKNOWN","message":"unknown error","detail":{"DriverName":"s3aws","Enclosed":{}}}]}
```
Reproduce
1: start the container without `REGISTRY_HEALTH_STORAGEDRIVER_ENABLED: "false"`; the registry service fails on `s3aws.Stat("/")`
2: start the container with `REGISTRY_HEALTH_STORAGEDRIVER_ENABLED: "false"`; the registry service returns

```json
{"errors":[{"code":"UNKNOWN","message":"unknown error","detail":{"DriverName":"s3aws","Enclosed":{}}}]}
```
Expected behavior
registry:2.8.3 should run as normal.
registry version
registry:2.8.3
Additional Info
The docker compose file:

```yaml
registry-service:
  image: registry:2
  container_name: docker-registry
  restart: always
  ports:
    - '5000:5000'
  environment:
    REGISTRY_STORAGE: s3
    REGISTRY_STORAGE_S3_BUCKET: docker-image-repo
    REGISTRY_STORAGE_S3_REGION: ap-southeast-2
    REGISTRY_STORAGE_S3_ENCRYPT: 'true'
    SEARCH_BACKEND: sqlalchemy
    REGISTRY_HTTP_SECRET: <secret>
```
From some searching, this might be caused by S3 permissions, but the execution role has full S3 access.
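For comparison, the permissions the distribution S3 driver docs call for look roughly like the policy below (bucket name is ours; the exact action list should be checked against the docs for 2.8.3, so treat this as a sketch, not an authoritative policy):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads"
      ],
      "Resource": "arn:aws:s3:::docker-image-repo"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload"
      ],
      "Resource": "arn:aws:s3:::docker-image-repo/*"
    }
  ]
}
```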
We have to use 2.8.3 for now to support many old images.
I would really appreciate it if you formatted your logs using markdown code formatting. It's really hard to parse the context from your message.
Hi @milosgajdos sorry about that, I have updated the log block. Thanks
@milosgajdos, I just did a small test to prove the container can access the S3 bucket.
Running the following SDK code inside the container:
```python
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('docker-image-repo')
for obj in bucket.objects.all():
    print(obj.key)
```
result:

```
docker/registry/v2/blobs/sha256/08/084c1da10d39c6b7dc2bb41ba84771e6c0a60611ca17e493b5ad258cae7b7eb5/data
docker/registry/v2/blobs/sha256/08/084c81f08dd1abd2e5530390610303de2165ee5e4b49c4718d5168921bca76b9/data
docker/registry/v2/blobs/sha256/08/084cae691009e79856b8948aa70d437d649a39ecd1376b3ffa521b123f168aca/data
docker/registry/v2/blobs/sha256/08/084cb812b096e26793403a078ca871e6f3eec2f6f6bfe8e62d54b3f35fc84e4e/data
docker/registry/v2/blobs/sha256/08/084d0db3995e0a48bff875f01d4322a1d215bd31c84102faaec6f85bf20a9311/data
docker/registry/v2/blobs/sha256/08/084d379991bd2eaa55e2dd1143dcc9e5a1e8b5ca2a7837e5b906b402f04bc8d5/data
......
......
```
The container can access the S3 bucket. Does this mean the issue is from the s3aws storage driver?
I have a feeling this is related to #3275.
Note that v2.8.3 is essentially in maintenance mode and won't be receiving any updates besides high-severity security patches. When the stable v3 release is out, v2.x will be marked as deprecated completely.