Agent stops taking jobs after server throws 5XX errors
Opened this issue · 4 comments
Component
agent
Describe the bug
When the server (running in Kubernetes) restarts, my Docker agent refuses to take new jobs until it is restarted as well. In the agent logs I can see several 5XX errors while the server reboots. After that, the agent shows as online in the UI but does not take jobs.
Agent logs: See below
Steps to reproduce
- Install the Woodpecker server in Kubernetes
- Install the agent on a separate server using Docker
- Kill the server so that it is recreated
- Trigger a pipeline that would use the Docker agent
- See that it stays pending
Expected behavior
The agent should properly reconnect to the Server via gRPC after the server restarts.
System Info
Server:
{"source":"https://github.com/woodpecker-ci/woodpecker","version":"2.7.3"}
Helm values:
---
server:
  ingress:
    # -- Enable the ingress for the server component
    enabled: true
    # -- Add annotations to the ingress
    annotations:
      # kubernetes.io/ingress.class: nginx
      kubernetes.io/tls-acme: "true"
    hosts:
      - host: woodpecker.example.com
        paths:
          - path: /
            backend:
              serviceName: woodpecker-svc
              servicePort: 80
    tls:
      - hosts:
          - woodpecker.example.com
        secretName: woodpecker-tls-key
  statefulSet:
    replicaCount: 1
  env:
    WOODPECKER_ADMIN: 'aaron'
    WOODPECKER_HOST: 'https://woodpecker.example.com'
    WOODPECKER_OPEN: true
    WOODPECKER_FORGEJO: true
    WOODPECKER_FORGEJO_URL: 'https://git.example.com'
    WOODPECKER_LOG_LEVEL: "error"
  extraSecretNamesForEnvFrom:
    - woodpecker-forgejo
gRPC Ingress:
---
apiVersion: v1
kind: Service
metadata:
  name: woodpecker-grpc
  namespace: woodpecker
  annotations:
    traefik.ingress.kubernetes.io/service.serversscheme: h2c
spec:
  selector:
    app.kubernetes.io/instance: woodpecker
    app.kubernetes.io/name: server
  ports:
    - name: grpc
      protocol: TCP
      port: 9000
      targetPort: grpc
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/tls-acme: "true"
    traefik.ingress.kubernetes.io/loadbalancer.server.scheme: h2c
    traefik.ingress.kubernetes.io/service.serversscheme: h2c
  name: woodpecker-grpc
  namespace: woodpecker
spec:
  rules:
    - host: "woodpecker-grpc.apps.example.com"
      http:
        paths:
          - pathType: Prefix
            path: "/"
            backend:
              service:
                name: woodpecker-grpc
                port:
                  name: grpc
  tls:
    - hosts:
        - woodpecker-grpc.apps.example.com
      secretName: woodpecker-grpc-tls-key
docker-compose config for agent:
services:
  woodpecker-agent-1:
    container_name: woodpecker-agent-1
    image: woodpeckerci/woodpecker-agent:latest
    command: agent
    restart: unless-stopped
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - WOODPECKER_SERVER=woodpecker-grpc.apps.example.com:443
      - WOODPECKER_AGENT_SECRET=${WOODPECKER_AGENT_SECRET}
      - WOODPECKER_MAX_WORKFLOWS=4
      - WOODPECKER_FILTER_LABELS="backend=docker"
      - WOODPECKER_BACKEND_DOCKER_ENABLE_IPV6=true
      - WOODPECKER_GRPC_SECURE=true
      - WOODPECKER_GRPC_VERIFY=true
    labels:
      - "com.centurylinklabs.watchtower.enable=true"
Additional context
Agent logs:
{"level":"info","time":"2024-11-23T08:44:52Z","message":"starting Woodpecker agent with version '2.7.3' and backend 'docker' using platform 'linux/amd64' running up to 4 pipelines in parallel"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:26:59Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:27:00Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:27:01Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:27:02Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:27:04Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:27:06Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:27:12Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:27:19Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"error","error":"rpc error: code = Unknown desc = unexpected HTTP status code received from server: 500 (Internal Server Error); malformed header: missing HTTP content-type","time":"2024-11-23T14:27:21Z","message":"grpc error: next(): code: Unknown"}
{"level":"error","error":"rpc error: code = Unknown desc = unexpected HTTP status code received from server: 500 (Internal Server Error); malformed header: missing HTTP content-type","time":"2024-11-23T14:27:21Z","message":"runner done with error"}
{"level":"error","error":"rpc error: code = Unknown desc = unexpected HTTP status code received from server: 500 (Internal Server Error); malformed header: missing HTTP content-type","time":"2024-11-23T14:27:21Z","message":"grpc error: next(): code: Unknown"}
{"level":"error","error":"rpc error: code = Unknown desc = unexpected HTTP status code received from server: 500 (Internal Server Error); malformed header: missing HTTP content-type","time":"2024-11-23T14:27:21Z","message":"runner done with error"}
{"level":"error","error":"rpc error: code = Unknown desc = unexpected HTTP status code received from server: 500 (Internal Server Error); malformed header: missing HTTP content-type","time":"2024-11-23T14:27:21Z","message":"grpc error: next(): code: Unknown"}
{"level":"error","error":"rpc error: code = Unknown desc = unexpected HTTP status code received from server: 500 (Internal Server Error); malformed header: missing HTTP content-type","time":"2024-11-23T14:27:21Z","message":"runner done with error"}
{"level":"error","error":"rpc error: code = Unknown desc = unexpected HTTP status code received from server: 500 (Internal Server Error); malformed header: missing HTTP content-type","time":"2024-11-23T14:27:21Z","message":"grpc error: next(): code: Unknown"}
{"level":"error","error":"rpc error: code = Unknown desc = unexpected HTTP status code received from server: 500 (Internal Server Error); malformed header: missing HTTP content-type","time":"2024-11-23T14:27:21Z","message":"runner done with error"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:27:24Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:27:34Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:27:39Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:27:53Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:28:00Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:28:15Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:28:29Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:28:40Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:28:54Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:29:02Z","message":"grpc error: report_health(): code: Unavailable"}
Validations
- Read the docs.
- Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
- Checked that the bug isn't fixed in the next version already [https://woodpecker-ci.org/faq#which-version-of-woodpecker-should-i-use]
Does it work if you deploy an agent in Kubernetes (direct Agent-Server connection, not via Traefik)?
JFYI, this is my IngressRoute, which worked a couple of months ago:
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: woodpecker-server
spec:
  entryPoints:
    - websecure
  routes:
    - kind: Rule
      match: Host(`wp.domain.tld`)
      services:
        - name: woodpecker-server
          port: http
    - kind: Rule
      match: Host(`wp.domain.tld`) && Headers(`Content-Type`, `application/grpc`)
      services:
        - name: woodpecker-server
          port: grpc
          scheme: h2c
However, I didn't restart the server back then, if I remember correctly.
The Kubernetes agents work fine and are not affected by the problem. It is very likely that the 5XX errors come mainly from Traefik. However, I would also expect the agent not to give up when there are errors for a few seconds.
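For illustration, the behaviour expected here (keep retrying through a short outage instead of giving up) is commonly implemented as a retry loop with capped exponential backoff. The sketch below is a generic example and not Woodpecker's actual agent code; `call_with_retry`, `Unavailable`, and `flaky_next` are hypothetical names introduced only for this example.

```python
import random
import time


def call_with_retry(fn, is_transient, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn() until it succeeds, backing off on errors deemed transient."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception as exc:
            if not is_transient(exc):
                raise  # permanent errors still reach the caller
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Random jitter so that several agents do not retry in lockstep.
            sleep(random.uniform(0, delay))
            attempt = min(attempt + 1, 16)  # keep the exponent bounded


class Unavailable(Exception):
    """Hypothetical stand-in for a transient 5XX answer from the proxy."""


attempts = []


def flaky_next():
    """Fail three times, then hand out work, like a server coming back up."""
    attempts.append(1)
    if len(attempts) < 4:
        raise Unavailable("503 (Service Unavailable)")
    return "workflow"


# sleep is stubbed out so the example finishes immediately.
result = call_with_retry(
    flaky_next,
    is_transient=lambda exc: isinstance(exc, Unavailable),
    sleep=lambda seconds: None,
)
print(result, len(attempts))
```

Which errors count as transient is the key design choice. In the log above, the proxy answers with both 503 (mapped to Unavailable) and 500 (mapped to Unknown), so a check that only treats Unavailable as retryable would still stop on the 500 responses.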
Matching on the content type (application/grpc) is a good hint; I might implement this. I currently don't use IngressRoute objects and instead configure normal Ingresses with annotations.
received unexpected content-type "text/plain; charset=utf-8"
errors come from Traefik
I think so, and I have seen this myself.
The agent should properly reconnect
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:27:24Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:27:34Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:27:39Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:27:53Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:28:00Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:28:15Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:28:29Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:28:40Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:28:54Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:29:02Z","message":"grpc error: report_health(): code: Unavailable"}
It seems to be trying.
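One way to check this is to look at the spacing between the quoted warnings. The sketch below (the helper name `gaps_in_seconds` is hypothetical) takes the "time" values of the ten report_health() records quoted above and computes the gap between consecutive attempts.

```python
from datetime import datetime

# "time" values of the ten report_health() warnings quoted above.
times = [
    "2024-11-23T14:27:24Z",
    "2024-11-23T14:27:34Z",
    "2024-11-23T14:27:39Z",
    "2024-11-23T14:27:53Z",
    "2024-11-23T14:28:00Z",
    "2024-11-23T14:28:15Z",
    "2024-11-23T14:28:29Z",
    "2024-11-23T14:28:40Z",
    "2024-11-23T14:28:54Z",
    "2024-11-23T14:29:02Z",
]


def gaps_in_seconds(stamps):
    """Return the number of seconds between consecutive timestamps."""
    parsed = [datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]


print(gaps_in_seconds(times))
# → [10, 5, 14, 7, 15, 14, 11, 14, 8]
```

The health report is retried every 5 to 15 seconds, so that part of the agent keeps trying. The quoted lines contain no further next() calls after 14:27:21, so they do not show whether the job polling is retried as well.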
Do you have two ingresses: one for HTTP and another for gRPC? Could you show the HTTP one?
Accidentally added the label. Can't remove it anymore :/