Logs only showing the first few lines of each step
Opened this issue · 3 comments
Component
agent, web-ui
Describe the bug
On all my pipelines only the first few log lines appear in the webui. Then everything else is cut out. I can certainly see the log output when I kubectl logs wp-1234.... the step/pipeline, bit it doesn't seem to get sent to the server from the agent.
I also see a lot of
{"level":"error","repo":"renovatebot/renovate","pipeline":"2064","workflow_id":"6848","image":"docker.io/woodpeckerci/plugin-git:2.5.1","workflow_id":"6848","error":"io: read/write on closed pipe","time":"2024-11-17T13:04:52Z","message":"copy limited logStream part"}
on the agent and
{"level":"error","repo_id":"29","pipeline_id":"4660","workflow_id":"6871","error":"stream: not found","time":"2024-11-18T12:55:30Z","message":"done: cannot close log stream for step 21827"}
and {"level":"error","repo_id":"29","pipeline_id":"4660","workflow_id":"6871","error":"sql: no rows in result set","time":"2024-11-18T12:55:30Z","message":"queue.Done: cannot ack workflow"}
on the server.
It has to be mentioned, that before a month ago (I upgrade woodpecker almost immediately after release) the logs worked almost flawlessly for more than a year, since I switched from drone to woodpecker.
Currently I don't know where to start debugging, to get to the bottom of this. Do you have any pointers?
Steps to reproduce
install woodpecker using the helm chart version 1.6.2 on a kubernetes cluster v1.31 only setting WOODPECKER_BACKEND_K8S_STORAGE_RWX: false beside the setup with a forge using https://woodpecker-ci.org/docs/next/administration/forges/forgejo
Expected behavior
I expect to always get all the logs, when I click on a step/workflow.
System Info
{"source":"https://github.com/woodpecker-ci/woodpecker","version":"2.7.3"}
Additional context
No response
Validations
- Read the docs.
- Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
- Checked that the bug isn't fixed in the
next
version already [https://woodpecker-ci.org/faq#which-version-of-woodpecker-should-i-use]
I have this from time to time. Restarting server/agent helps. Perhaps, Kubernetes-only issue.
@zc-devs As you can imagine I've done restarts quite a few times already, but wasn't as lucky as you. As the first lines are always getting from the agent to the server, it can't be a fundamental connection problem, but can be unreliable connections. Under k8 networking is more complex (even though I use flannel) than simple docker, so I can't rule out networking problems in k8 completely. But since I also run loki as a logging solution (where all the pipeline logs get also shipped to) it can't be a throughput problem. It might also be a robustness problem in the logging component in woodpecker. But I'm not confident enough in Go, to debug it on my own.
Wanted to add more context, but you got ahead :)
unreliable connections
This!
I thought about it yesterday and noted, that I have not had this problem since a while. At the time when there was, I had network issues and restarting not always helped, probably. At some point Agent even could not connect to Kube API.
We have two network connections there: Agent -> Kube API, Agent -> Server.
In regard the first - compared to simple/manual watcher, shared informer should handle errors and reconnect.
However, I do not see retry logic in between input stream from Kube informer and output stream to Server:
Lines 46 to 52 in 5c22047
Also wanted to reproduce/test both connections through toxyproxy
with packet drops (#3945), but didn't have time. Agent -> Server part could be tricky as there is gRPC (though HTTP, but 2).