Logs only appear once the pod has completed
lucienbanholzer opened this issue · 12 comments
Hi @lucienbanholzer, can you share more information about your environment and the Pipeline / Task you're seeing this behaviour with? The expected behaviour is that logs show up as soon as they're available, so if a step is outputting logs while it's running they should be displayed in the Dashboard within a few seconds normally.
- What Tekton Pipelines and Dashboard versions are you using? How did you install them?
- Does this happen every time? Only with a specific Task / Pipeline?
- When it happens do you observe the same behaviour calling `kubectl logs -f <pod> -c <step-container>` directly? (it's essentially calling the same API under the hood)
- How are you accessing the Dashboard? e.g. via an Ingress, `kubectl port-forward`, `kubectl proxy`, etc.
Hi,
I am running tekton-pipeline v0.45.0 in a Kubernetes cluster v1.26.2 and the latest version of the Tekton Dashboard.
I installed all Tekton components using the Tekton Operator.
When I check the logs using kubectl (`kubectl logs -f <pod> -c <step-container>`) it works fine.
I am accessing the dashboard using an Ingress rule. I have installed MetalLB as the load balancer.
Also, the cluster is behind a corporate proxy.
BR
If you check the browser's developer tools you should see a request to `GET <dashboard_base_url>/api/v1/namespaces/<namespace>/pods/<taskrun_pod>/log?container=<step_container>&follow=true`.
Is this taking a long time to return a response? How big is the response?
Are there any errors in the browser console while it appears to be hanging?
Does this affect all steps / tasks / pipelines or just this specific one?
Can you check the same on the Dashboard page that shows the loading state? Check the network + console tabs on that page.
It's hard to tell without more information but this error suggests there may be an issue with your ingress / loadbalancer config, resulting in an invalid or incomplete response. You would need to check the documentation to see what's required to enable support for chunked encoding.
One possible workaround is to switch from log streaming to polling by editing the Dashboard deployment and setting `--stream-logs=false`. This will make a polling request to get the entire logs every 4 seconds instead of just streaming the new content as it's available.
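For reference, a minimal sketch of applying that workaround, assuming the Dashboard runs as a Deployment named `tekton-dashboard` in the `tekton-pipelines` namespace (an operator-managed install may use different names and may revert manual edits):

```sh
# Hypothetical names; adjust the namespace / deployment to match your install.
kubectl -n tekton-pipelines edit deployment tekton-dashboard
# Then add the flag to the dashboard container's args, e.g.:
#   args:
#     - --stream-logs=false   # switch the log viewer from streaming to polling
```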
Updated my comment above with a possible workaround if it is indeed an issue with your proxy. Just beware that this mode is far less efficient as it may receive duplicate data multiple times until the step completes.
OK, I'll try this workaround, thanks.
What exactly is the endpoint called by the Tekton Dashboard to fetch logs live?
I might have to exclude something else from the proxy (no_proxy)...
The request to stream logs goes to `<dashboard_base_url>/api/v1/namespaces/<namespace>/pods/<taskrun_pod>/log?container=<step_container>&follow=true`.
The `follow=true` query param is the equivalent of the `-f` or `--follow` flag to `kubectl logs`.
The difference when polling is that the URL doesn't include the `&follow=true`.
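For illustration, a hedged curl comparison of the two modes (the base URL, pod, and container names below are placeholders):

```sh
# Streaming: the connection stays open and new log lines arrive as the step writes them
curl -N "http://<dashboard_base_url>/api/v1/namespaces/default/pods/my-taskrun-pod/log?container=step-build&follow=true"

# Polling: returns whatever log content exists right now and closes immediately
curl "http://<dashboard_base_url>/api/v1/namespaces/default/pods/my-taskrun-pod/log?container=step-build"
```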
Actually, after 110s a first chunk of the log is displayed while keeping `--stream-logs=true`.
I captured this line in the ingress controller pod log:
2023/03/16 10:34:40 [error] 27318#27318: *31196386 upstream timed out (110: Operation timed out) while reading upstream, client: 10.36.0.0, server: XXXXXXX, request: "GET /api/v1/namespaces/default/pods/exec-tc-21.6-21664-pod/log?container=step-qftest&follow=true HTTP/1.1", upstream: "http://10.44.0.2:9097/api/v1/namespaces/default/pods/exec-tc-21.6-21664-pod/log?container=step-qftest&follow=true", host: "XXXXXXX", referrer: "http://XXXXXXX/"
Weird. Is it really the streaming that is not accepted?! I will try to find what is wrong in my network config, but no clue for now.
With streaming enabled a single request is made that remains open until all of the content for the selected step's logs has been received, i.e. about as long as your step runs. So for example if your step runs for 5 minutes, the request would last about 5 minutes. You may need to increase the timeout settings on your proxy to allow these requests.
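For example, a rough sketch of raising those timeouts when the proxy in front of the Dashboard is the NGINX Ingress Controller (the Ingress name is an assumption, and these annotation keys are specific to ingress-nginx):

```sh
# Hypothetical Ingress name; values are in seconds and should exceed your longest step.
kubectl annotate ingress tekton-dashboard \
  nginx.ingress.kubernetes.io/proxy-read-timeout="3600" \
  nginx.ingress.kubernetes.io/proxy-send-timeout="3600" \
  --overwrite
```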
Since this doesn't appear to be an issue with the Dashboard itself I'm going to close the issue but please do share if you find a resolution as it may be helpful for others. If you run into any more problems feel free to open a new issue or contact us on Slack.
OK sure.
I'll paste some more details in case anyone has an idea to solve my issue:
My cluster has one master node and one worker node. I use Weave Net as the CNI and MetalLB as the load balancer.
All pods are running fine and communication seems ok.
In the error message pasted above:
2023/03/16 10:34:40 [error] 27318#27318: *31196386 upstream timed out (110: Operation timed out) while reading upstream, client: 10.36.0.0, server: XXXXXXX, request: "GET /api/v1/namespaces/default/pods/exec-tc-21.6-21664-pod/log?container=step-qftest&follow=true HTTP/1.1", upstream: "http://10.44.0.2:9097/api/v1/namespaces/default/pods/exec-tc-21.6-21664-pod/log?container=step-qftest&follow=true", host: "XXXXXXX", referrer: "http://XXXXXXX/"
- 10.36.0.0 is the IP of the Weave pod on the control plane node
- 10.44.0.2 is the IP of the Tekton Dashboard pod, which runs on the worker node
- XXXXXXX is the FQDN of the control plane node and the name of the ingress host
- I found a similar bug on the web (kubernetes/ingress-nginx#3469) describing a hairpin mode issue.
- When I try to curl the same URL from a "curl test pod" running on the worker node, I get an immediate response with the log content (see the sketch below).
- When I remove the ingress rule and access the dashboard using a NodePort service, it doesn't work either.
- I have no Network Policy installed in the cluster.
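For anyone who wants to reproduce that in-cluster curl check, a rough sketch using a throwaway pod (the Service name and namespace are assumptions; the port 9097 matches the upstream address in the error above):

```sh
# Assumed Service name/namespace; adjust to your install.
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -N "http://tekton-dashboard.tekton-pipelines.svc.cluster.local:9097/api/v1/namespaces/default/pods/exec-tc-21.6-21664-pod/log?container=step-qftest&follow=true"
```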
If anyone has a hint for me please leave a comment...
Cheers