woodpecker-ci/woodpecker

Logs only showing the first few lines of each step

Opened this issue · 3 comments

Component

agent, web-ui

Describe the bug

On all my pipelines only the first few log lines appear in the web UI; everything else is cut off. I can certainly see the log output when I kubectl logs wp-1234.... the step/pipeline, but it doesn't seem to get sent from the agent to the server.

I also see a lot of
{"level":"error","repo":"renovatebot/renovate","pipeline":"2064","workflow_id":"6848","image":"docker.io/woodpeckerci/plugin-git:2.5.1","workflow_id":"6848","error":"io: read/write on closed pipe","time":"2024-11-17T13:04:52Z","message":"copy limited logStream part"}

on the agent and

{"level":"error","repo_id":"29","pipeline_id":"4660","workflow_id":"6871","error":"stream: not found","time":"2024-11-18T12:55:30Z","message":"done: cannot close log stream for step 21827"} and {"level":"error","repo_id":"29","pipeline_id":"4660","workflow_id":"6871","error":"sql: no rows in result set","time":"2024-11-18T12:55:30Z","message":"queue.Done: cannot ack workflow"}

on the server.

It has to be mentioned that until about a month ago (I upgrade Woodpecker almost immediately after each release) the logs worked almost flawlessly for more than a year, ever since I switched from Drone to Woodpecker.

Currently I don't know where to start debugging to get to the bottom of this. Do you have any pointers?

Steps to reproduce

Install Woodpecker using Helm chart version 1.6.2 on a Kubernetes v1.31 cluster, setting only WOODPECKER_BACKEND_K8S_STORAGE_RWX: false besides the forge setup described at https://woodpecker-ci.org/docs/next/administration/forges/forgejo
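
For reference, a minimal values sketch of that setting (the agent.env key layout is an assumption here; check the chart's values.yaml for the exact structure):

agent:
  env:
    # only non-default backend setting in this deployment
    WOODPECKER_BACKEND_K8S_STORAGE_RWX: "false"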

Expected behavior

I expect to always see all the logs when I click on a step/workflow.

System Info

{"source":"https://github.com/woodpecker-ci/woodpecker","version":"2.7.3"}

Additional context

No response

Validations

  • Read the docs.
  • Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
  • Checked that the bug isn't fixed in the next version already [https://woodpecker-ci.org/faq#which-version-of-woodpecker-should-i-use]

I have this from time to time. Restarting the server/agent helps. Perhaps it's a Kubernetes-only issue.

@zc-devs As you can imagine, I've already done restarts quite a few times, but wasn't as lucky as you. Since the first lines always make it from the agent to the server, it can't be a fundamental connection problem, but it could be unreliable connections. Networking under Kubernetes is more complex than with plain Docker (even though I use Flannel), so I can't rule out networking problems in Kubernetes completely. But since I also run Loki as a logging solution (where all the pipeline logs also get shipped to), it can't be a throughput problem. It might also be a robustness problem in Woodpecker's logging component, but I'm not confident enough in Go to debug it on my own.

Wanted to add more context, but you got ahead :)

unreliable connections

This!
I thought about it yesterday and noticed that I haven't had this problem for a while. Back when I did, I had network issues and restarting did not always help. At some point the Agent could not even connect to the Kube API.

We have two network connections there: Agent -> Kube API and Agent -> Server.
Regarding the first: compared to a simple/manual watcher, the shared informer should handle errors and reconnect.
However, I do not see any retry logic between the input stream from the Kube informer and the output stream to the Server:

logStream := log.NewLineWriter(r.client, step.UUID, secrets...)
if err := log.CopyLineByLine(logStream, rc, pipeline.MaxLogLineLength); err != nil {
	logger.Error().Err(err).Msg("copy limited logStream part")
}
logger.Debug().Msg("log stream copied, close ...")
uploads.Done()
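
For illustration only, a standalone sketch (not Woodpecker's actual code; all names here are hypothetical) of what a retry layer between the two streams could look like. Because a partially consumed reader cannot be reused, the caller would have to supply a function that reopens the source for every attempt:

package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
	"time"
)

// copyWithRetry re-runs the copy when it fails, asking open() for a
// fresh source stream each time, with exponential backoff in between.
func copyWithRetry(dst io.Writer, open func() (io.ReadCloser, error), attempts int, backoff time.Duration) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		src, err := open()
		if err == nil {
			_, err = io.Copy(dst, src)
			src.Close()
			if err == nil {
				return nil
			}
		}
		lastErr = err
		time.Sleep(backoff)
		backoff *= 2 // simple exponential backoff
	}
	return fmt.Errorf("copy failed after %d attempts: %w", attempts, lastErr)
}

func main() {
	// Fake source: the first attempt fails the way the agent log does,
	// the second one succeeds.
	calls := 0
	open := func() (io.ReadCloser, error) {
		calls++
		if calls == 1 {
			return nil, errors.New("io: read/write on closed pipe")
		}
		return io.NopCloser(strings.NewReader("step log line\n")), nil
	}

	var out strings.Builder
	if err := copyWithRetry(&out, open, 3, 100*time.Millisecond); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out.String())
}

Here the backoff just doubles; a real fix in the agent would also have to resume from the last delivered line instead of re-sending the whole log.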

Also wanted to reproduce/test both connections through Toxiproxy with packet drops (#3945), but didn't have time. The Agent -> Server part could be tricky as it uses gRPC (HTTP, but version 2).
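
A rough sketch of such a test setup with the Toxiproxy Go client (addresses, ports and names are assumptions; a latency toxic is used here as a stand-in since real packet drops would need a different approach). The agent would then be pointed at the proxy's listen address instead of the server's gRPC endpoint:

package main

import (
	"log"

	toxiproxy "github.com/Shopify/toxiproxy/v2/client"
)

func main() {
	// Toxiproxy daemon assumed to be running on its default API port.
	client := toxiproxy.NewClient("localhost:8474")

	// Put a proxy in front of the server's gRPC endpoint (9000 is an
	// assumed port; adjust to the actual deployment) and point the
	// agent at localhost:19000 instead.
	proxy, err := client.CreateProxy("wp-grpc", "localhost:19000", "woodpecker-server:9000")
	if err != nil {
		log.Fatal(err)
	}

	// Degrade traffic towards the server to mimic a flaky link.
	if _, err := proxy.AddToxic("flaky_uplink", "latency", "upstream", 1.0, toxiproxy.Attributes{
		"latency": 500, // ms
		"jitter":  300, // ms
	}); err != nil {
		log.Fatal(err)
	}

	log.Println("proxy wp-grpc is now degrading agent -> server traffic")
}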