Pods are running but registry is unresponsive at some point after installation
All the pods are running, but the registry server becomes unresponsive at some point after installation (no response to curl https://localhost:8443).
I have to restart the pods, or even reboot the host, to get it working again.
All the pods are running:
[root@bastion ~]# podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
db266da38b9c registry.access.redhat.com/ubi8/pause:8.7-6 infinity 13 hours ago Up 13 hours 0.0.0.0:8443->8443/tcp 5e70ee01733b-infra
767d8f665354 registry.redhat.io/rhel8/redis-6:1-92.1669834635 run-redis 13 hours ago Up 13 hours 0.0.0.0:8443->8443/tcp quay-redis
73b03983db2f registry.redhat.io/rhel8/postgresql-10:1-203.1669834630 run-postgresql 13 hours ago Up 13 hours 0.0.0.0:8443->8443/tcp quay-postgres
41c21e84bb3e registry.redhat.io/quay/quay-rhel8:v3.8.14 registry 13 hours ago Up 13 hours 0.0.0.0:8443->8443/tcp quay-app
New logs keep coming in, so the containers are running fine... I guess?
[root@bastion ~]# podman logs --tail=10 -f quay-app
exportactionlogsworker stdout | 2024-03-26 00:28:00,067 [52] [INFO] [apscheduler.executors.default] Running job "QueueWorker.poll_queue (trigger: interval[0:01:00], next run at: 2024-03-26 00:29:00 UTC)" (scheduled at 2024-03-26 00:28:00.067443+00:00)
exportactionlogsworker stdout | 2024-03-26 00:28:00,071 [52] [INFO] [apscheduler.executors.default] Job "QueueWorker.poll_queue (trigger: interval[0:01:00], next run at: 2024-03-26 00:29:00 UTC)" executed successfully
notificationworker stdout | 2024-03-26 00:28:04,724 [63] [INFO] [apscheduler.executors.default] Running job "QueueWorker.poll_queue (trigger: interval[0:00:10], next run at: 2024-03-26 00:28:14 UTC)" (scheduled at 2024-03-26 00:28:04.724010+00:00)
notificationworker stdout | 2024-03-26 00:28:04,727 [63] [INFO] [apscheduler.executors.default] Job "QueueWorker.poll_queue (trigger: interval[0:00:10], next run at: 2024-03-26 00:28:14 UTC)" executed successfully
repositorygcworker stdout | 2024-03-26 00:28:11,768 [75] [INFO] [apscheduler.executors.default] Running job "QueueWorker.run_watchdog (trigger: interval[0:01:00], next run at: 2024-03-26 00:29:11 UTC)" (scheduled at 2024-03-26 00:28:11.767795+00:00)
repositorygcworker stdout | 2024-03-26 00:28:11,769 [75] [INFO] [apscheduler.executors.default] Job "QueueWorker.run_watchdog (trigger: interval[0:01:00], next run at: 2024-03-26 00:29:11 UTC)" executed successfully
gcworker stdout | 2024-03-26 00:28:12,861 [53] [INFO] [apscheduler.executors.default] Running job "GarbageCollectionWorker._garbage_collection_repos (trigger: interval[0:00:30], next run at: 2024-03-26 00:28:42 UTC)" (scheduled at 2024-03-26 00:28:12.860612+00:00)
gcworker stdout | 2024-03-26 00:28:12,868 [53] [INFO] [apscheduler.executors.default] Job "GarbageCollectionWorker._garbage_collection_repos (trigger: interval[0:00:30], next run at: 2024-03-26 00:28:42 UTC)" executed successfully
notificationworker stdout | 2024-03-26 00:28:14,724 [63] [INFO] [apscheduler.executors.default] Running job "QueueWorker.poll_queue (trigger: interval[0:00:10], next run at: 2024-03-26 00:28:24 UTC)" (scheduled at 2024-03-26 00:28:14.724010+00:00)
notificationworker stdout | 2024-03-26 00:28:14,731 [63] [INFO] [apscheduler.executors.default] Job "QueueWorker.poll_queue (trigger: interval[0:00:10], next run at: 2024-03-26 00:28:24 UTC)" executed successfully
Nothing strange in the quay-app container details.
[root@bastion ~]# podman inspect quay-app
[
{
"Id": "41c21e84bb3e90a2ae46b480d9ca00e1a924a27e2c20157f09d21d29c9b4a389",
"Created": "2024-03-25T07:50:17.451450987-04:00",
"Path": "dumb-init",
"Args": [
"--",
"/quay-registry/quay-entrypoint.sh",
"registry"
],
"State": {
"OciVersion": "1.1.0-rc.3",
"Status": "running",
"Running": true,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 7577,
"ConmonPid": 7575,
"ExitCode": 0,
"Error": "",
"StartedAt": "2024-03-25T07:50:17.61683645-04:00",
"FinishedAt": "0001-01-01T00:00:00Z",
"Health": {
"Status": "",
"FailingStreak": 0,
"Log": null
},
"CgroupPath": "/machine.slice/machine-libpod_pod_5e70ee01733b02f854d79d85dd78dc5c8ecdb2c50de7472a314441897f9296dc.slice/libpod-41c21e84bb3e90a2ae46b480d9ca00e1a924a27e2c20157f09d21d29c9b4a389.scope",
"CheckpointedAt": "0001-01-01T00:00:00Z",
"RestoredAt": "0001-01-01T00:00:00Z"
},
"Image": "93b30dda302e3554fcfea484da1fc7b981dc4ac173b195def4ab79b86dfaf616",
"ImageDigest": "sha256:19e0709632a860dc93e54e9d79b8da9b02334122775932eaefaccf4783524ef4",
"ImageName": "registry.redhat.io/quay/quay-rhel8:v3.8.14",
"Rootfs": "",
"Pod": "5e70ee01733b02f854d79d85dd78dc5c8ecdb2c50de7472a314441897f9296dc",
"ResolvConfPath": "/run/containers/storage/overlay-containers/db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec/userdata/resolv.conf",
"HostnamePath": "/run/containers/storage/overlay-containers/41c21e84bb3e90a2ae46b480d9ca00e1a924a27e2c20157f09d21d29c9b4a389/userdata/hostname",
"HostsPath": "/run/containers/storage/overlay-containers/db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec/userdata/hosts",
"StaticDir": "/var/lib/containers/storage/overlay-containers/41c21e84bb3e90a2ae46b480d9ca00e1a924a27e2c20157f09d21d29c9b4a389/userdata",
"OCIConfigPath": "/var/lib/containers/storage/overlay-containers/41c21e84bb3e90a2ae46b480d9ca00e1a924a27e2c20157f09d21d29c9b4a389/userdata/config.json",
"OCIRuntime": "crun",
"ConmonPidFile": "/run/quay-app.service-pid",
"PidFile": "/run/containers/storage/overlay-containers/41c21e84bb3e90a2ae46b480d9ca00e1a924a27e2c20157f09d21d29c9b4a389/userdata/pidfile",
"Name": "quay-app",
"RestartCount": 0,
"Driver": "overlay",
"MountLabel": "system_u:object_r:container_file_t:s0:c273,c984",
"ProcessLabel": "system_u:system_r:container_t:s0:c273,c984",
"AppArmorProfile": "",
"EffectiveCaps": null,
"BoundingCaps": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FOWNER",
"CAP_FSETID",
"CAP_KILL",
"CAP_NET_BIND_SERVICE",
"CAP_SETFCAP",
"CAP_SETGID",
"CAP_SETPCAP",
"CAP_SETUID",
"CAP_SYS_CHROOT"
],
"ExecIDs": [],
"GraphDriver": {
"Name": "overlay",
"Data": {
"LowerDir": "/var/lib/containers/storage/overlay/19dbf084110759a3d249cd4ec487e83f55eca64deafc5d51d04787a3716fadb8/diff",
"MergedDir": "/var/lib/containers/storage/overlay/fc1f2d2a88e454e8c41e3aa22e5d91e18001506f13821dd60eee47a918b1bc50/merged",
"UpperDir": "/var/lib/containers/storage/overlay/fc1f2d2a88e454e8c41e3aa22e5d91e18001506f13821dd60eee47a918b1bc50/diff",
"WorkDir": "/var/lib/containers/storage/overlay/fc1f2d2a88e454e8c41e3aa22e5d91e18001506f13821dd60eee47a918b1bc50/work"
}
},
"Mounts": [
{
"Type": "volume",
"Name": "f19507ef7f837c63cb92f116e042f12daa4c00a0c37c444cb1c7988687e66a0d",
"Source": "/var/lib/containers/storage/volumes/f19507ef7f837c63cb92f116e042f12daa4c00a0c37c444cb1c7988687e66a0d/_data",
"Destination": "/tmp",
"Driver": "local",
"Mode": "",
"Options": [
"nodev",
"exec",
"nosuid",
"rbind"
],
"RW": true,
"Propagation": "rprivate"
},
{
"Type": "volume",
"Name": "63e0413f366aa2f74f9370d04014e48038006bb4cf1b2ff5435fc9cb724de3ce",
"Source": "/var/lib/containers/storage/volumes/63e0413f366aa2f74f9370d04014e48038006bb4cf1b2ff5435fc9cb724de3ce/_data",
"Destination": "/var/log",
"Driver": "local",
"Mode": "",
"Options": [
"nodev",
"exec",
"nosuid",
"rbind"
],
"RW": true,
"Propagation": "rprivate"
},
{
"Type": "volume",
"Name": "097a7e8bf2e6d0a80a575d14bd6bdfa58d16919ff83a9b403d6dc06915ae20bc",
"Source": "/var/lib/containers/storage/volumes/097a7e8bf2e6d0a80a575d14bd6bdfa58d16919ff83a9b403d6dc06915ae20bc/_data",
"Destination": "/conf/stack",
"Driver": "local",
"Mode": "",
"Options": [
"nodev",
"exec",
"nosuid",
"rbind"
],
"RW": true,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/opt/quay/config/quay-config",
"Destination": "/quay-registry/conf/stack",
"Driver": "",
"Mode": "",
"Options": [
"rbind"
],
"RW": true,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/opt/quay/data",
"Destination": "/datastorage",
"Driver": "",
"Mode": "",
"Options": [
"rbind"
],
"RW": true,
"Propagation": "rprivate"
}
],
"Dependencies": [
"db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec"
],
"NetworkSettings": {
"EndpointID": "",
"Gateway": "10.88.0.1",
"IPAddress": "10.88.0.2",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "a6:9c:af:e1:1b:a7",
"Bridge": "",
"SandboxID": "",
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Ports": {
"8443/tcp": [
{
"HostIp": "",
"HostPort": "8443"
}
]
},
"SandboxKey": "/run/netns/netns-67bc251f-bac0-1817-c280-f49b54fda5bc",
"Networks": {
"podman": {
"EndpointID": "",
"Gateway": "10.88.0.1",
"IPAddress": "10.88.0.2",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "a6:9c:af:e1:1b:a7",
"NetworkID": "podman",
"DriverOpts": null,
"IPAMConfig": null,
"Links": null,
"Aliases": [
"db266da38b9c",
"quay-pod"
]
}
}
},
"Namespace": "",
"IsInfra": false,
"IsService": false,
"KubeExitCodePropagation": "invalid",
"lockNumber": 37,
"Config": {
"Hostname": "quay-pod",
"Domainname": "",
"User": "1001",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"LANG=C.UTF-8",
"QUAYDIR=/quay-registry",
"PYTHONUNBUFFERED=1",
"RED_HAT_QUAY=true",
"TERM=xterm",
"container=oci",
"PYTHONIOENCODING=UTF-8",
"LC_ALL=C.UTF-8",
"TZ=UTC",
"PYTHONUSERBASE=/app",
"QUAYPATH=/quay-registry",
"QUAYCONF=/quay-registry/conf",
"PATH=/app/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"QUAYRUN=/quay-registry/conf",
"PYTHONPATH=/quay-registry",
"HOME=/quay-registry",
"HOSTNAME=quay-pod"
],
"Cmd": [
"registry"
],
"Image": "registry.redhat.io/quay/quay-rhel8:v3.8.14",
"Volumes": null,
"WorkingDir": "/quay-registry",
"Entrypoint": "dumb-init -- /quay-registry/quay-entrypoint.sh",
"OnBuild": null,
"Labels": null,
"Annotations": {
"io.container.manager": "libpod",
"io.kubernetes.cri-o.SandboxID": "db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec",
"io.podman.annotations.cid-file": "/run/quay-app.service-cid",
"org.opencontainers.image.stopSignal": "15"
},
"StopSignal": 15,
"HealthcheckOnFailureAction": "none",
"CreateCommand": [
"/usr/bin/podman",
"run",
"--name",
"quay-app",
"-v",
"/opt/quay/config/quay-config:/quay-registry/conf/stack:Z",
"-v",
"/opt/quay/data:/datastorage:Z",
"--pod=quay-pod",
"--conmon-pidfile",
"/run/quay-app.service-pid",
"--cidfile",
"/run/quay-app.service-cid",
"--cgroups=no-conmon",
"--replace",
"registry.redhat.io/quay/quay-rhel8:v3.8.14"
],
"Umask": "0022",
"Timeout": 0,
"StopTimeout": 10,
"Passwd": true,
"sdNotifyMode": "container"
},
"HostConfig": {
"Binds": [
"f19507ef7f837c63cb92f116e042f12daa4c00a0c37c444cb1c7988687e66a0d:/tmp:rprivate,rw,nodev,exec,nosuid,rbind",
"63e0413f366aa2f74f9370d04014e48038006bb4cf1b2ff5435fc9cb724de3ce:/var/log:rprivate,rw,nodev,exec,nosuid,rbind",
"097a7e8bf2e6d0a80a575d14bd6bdfa58d16919ff83a9b403d6dc06915ae20bc:/conf/stack:rprivate,rw,nodev,exec,nosuid,rbind",
"/opt/quay/config/quay-config:/quay-registry/conf/stack:rw,rprivate,rbind",
"/opt/quay/data:/datastorage:rw,rprivate,rbind"
],
"CgroupManager": "systemd",
"CgroupMode": "private",
"ContainerIDFile": "/run/quay-app.service-cid",
"LogConfig": {
"Type": "journald",
"Config": null,
"Path": "",
"Tag": "",
"Size": "0B"
},
"NetworkMode": "container:db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec",
"PortBindings": {},
"RestartPolicy": {
"Name": "",
"MaximumRetryCount": 0
},
"AutoRemove": false,
"VolumeDriver": "",
"VolumesFrom": null,
"CapAdd": [],
"CapDrop": [],
"Dns": [],
"DnsOptions": [],
"DnsSearch": [],
"ExtraHosts": [],
"GroupAdd": [],
"IpcMode": "container:db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec",
"Cgroup": "",
"Cgroups": "default",
"Links": null,
"OomScoreAdj": 0,
"PidMode": "private",
"Privileged": false,
"PublishAllPorts": false,
"ReadonlyRootfs": false,
"SecurityOpt": [],
"Tmpfs": {},
"UTSMode": "container:db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec",
"UsernsMode": "",
"ShmSize": 65536000,
"Runtime": "oci",
"ConsoleSize": [
0,
0
],
"Isolation": "",
"CpuShares": 0,
"Memory": 0,
"NanoCpus": 0,
"CgroupParent": "machine.slice/machine-libpod_pod_5e70ee01733b02f854d79d85dd78dc5c8ecdb2c50de7472a314441897f9296dc.slice",
"BlkioWeight": 0,
"BlkioWeightDevice": null,
"BlkioDeviceReadBps": null,
"BlkioDeviceWriteBps": null,
"BlkioDeviceReadIOps": null,
"BlkioDeviceWriteIOps": null,
"CpuPeriod": 0,
"CpuQuota": 0,
"CpuRealtimePeriod": 0,
"CpuRealtimeRuntime": 0,
"CpusetCpus": "",
"CpusetMems": "",
"Devices": [],
"DiskQuota": 0,
"KernelMemory": 0,
"MemoryReservation": 0,
"MemorySwap": 0,
"MemorySwappiness": 0,
"OomKillDisable": false,
"PidsLimit": 2048,
"Ulimits": [
{
"Name": "RLIMIT_NPROC",
"Soft": 4194304,
"Hard": 4194304
}
],
"CpuCount": 0,
"CpuPercent": 0,
"IOMaximumIOps": 0,
"IOMaximumBandwidth": 0,
"CgroupConf": null
}
}
]
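One thing the inspect output above does suggest: State.Health.Status is empty and HealthcheckOnFailureAction is "none", so podman does not appear to be health-checking this container, and "Running": true would then only mean the process is alive, not that it is serving requests. A minimal sketch for pulling those fields out of the JSON, assuming the output of podman inspect quay-app has been loaded into Python; the function name summarize and the trimmed SAMPLE are illustrative only:

```python
import json


def summarize(inspect_doc):
    """Extract the fields that hint at whether podman is health-checking the container."""
    container = inspect_doc[0]  # podman inspect emits a JSON list, one entry per container
    state = container["State"]
    health = state.get("Health") or {}
    return {
        "name": container["Name"],
        "running": state["Running"],
        "oom_killed": state["OOMKilled"],
        # An empty status is read here as "no health check result reported";
        # that interpretation is an assumption, not something podman states.
        "health_status_reported": bool(health.get("Status")),
        "restart_policy": container["HostConfig"]["RestartPolicy"]["Name"],
    }


# Trimmed copy of the relevant fields from the inspect output in this issue.
SAMPLE = [
    {
        "Name": "quay-app",
        "State": {
            "Running": True,
            "OOMKilled": False,
            "Health": {"Status": "", "FailingStreak": 0, "Log": None},
        },
        "HostConfig": {"RestartPolicy": {"Name": "", "MaximumRetryCount": 0}},
    }
]

print(json.dumps(summarize(SAMPLE), indent=2))
```

With real data, json.load() on the saved inspect output would replace SAMPLE.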
Hey team, we just ran into this exact same issue, with the same symptoms. I thought perhaps ours was a one-off, but then I noticed this issue, so I thought I'd add a comment. I'll get some troubleshooting logs posted here. I can connect via netcat to port 8443 and have ruled out SELinux, fapolicyd, etc. as potential contributors.
It just.... stops responding to http traffic.
I should have captured the output but didn't. I did notice that curl produces something similar to the following:
curl -vvv https://<quay-server>:8443 | head
* Rebuilt URL to: https://<quay-server>:8443/
* TCP_NODELAY set
* Connected to <quay-server> port 8443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
< hangs right here where we should get a Server hello>
We never get the Server Hello back, nor anything beyond that. As noted above, the port is open and responds via nc, and the logs keep rolling by for journalctl -fu quay-app.service or podman logs -f <pod_id>.
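To make the "port is open but TLS never answers" state easier to capture next time, here is a rough probe that separates the two stages. It is only a sketch: the function name probe and the result labels are made up for illustration, and certificate verification is turned off because only the handshake is being tested.

```python
import socket
import ssl


def probe(host, port, timeout=5.0):
    """Return 'tcp-failed', 'tls-timeout', 'tls-error', or 'tls-ok'."""
    try:
        # Stage 1: plain TCP connect, equivalent to the nc check.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp-failed"

    # Stage 2: TLS handshake. Verification is disabled on purpose
    # (assumption: the registry may use a self-signed certificate).
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with sock:
            # The socket keeps the timeout set by create_connection, so a
            # server that never sends its Server Hello ends up here as a timeout.
            with ctx.wrap_socket(sock, server_hostname=host):
                return "tls-ok"
    except (socket.timeout, TimeoutError):
        return "tls-timeout"
    except (ssl.SSLError, OSError):
        return "tls-error"


# Example call (hostname is a placeholder):
#   probe("<quay-server>", 8443)
```

A result of "tls-timeout" would match the hang shown in the curl trace above, while "tcp-failed" would point at the port mapping or firewall instead.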