Logs only appear once the pod has completed
lucienbanholzer opened this issue · 12 comments
Hi @lucienbanholzer, can you share more information about your environment and the Pipeline / Task you're seeing this behaviour with? The expected behaviour is that logs show up as soon as they're available, so if a step is outputting logs while it's running they should be displayed in the Dashboard within a few seconds normally.
- What Tekton Pipelines and Dashboard versions are you using? How did you install them?
- Does this happen every time? Only with a specific Task / Pipeline?
- When it happens do you observe the same behaviour calling `kubectl logs -f <pod> -c <step-container>` directly? (it's essentially calling the same API under the hood)
- How are you accessing the Dashboard? e.g. via an Ingress, `kubectl port-forward`, `kubectl proxy`, etc.
Hi,
I am running tekton-pipeline v0.45.0 in a Kubernetes cluster v1.26.2 and the latest version of the Tekton Dashboard.
I installed all Tekton components using the Tekton Operator.
When I check the logs using kubectl (`kubectl logs -f <pod> -c <step-container>`) it works fine.
I am accessing the dashboard using an Ingress rule. I have installed MetalLB as the load balancer.
Also, the cluster is behind a corporate proxy.
BR
If you check the browser's developer tools you should see a request to `GET <dashboard_base_url>/api/v1/namespaces/<namespace>/pods/<taskrun_pod>/log?container=<step_container>&follow=true`.
Is this taking a long time to return a response? How big is the response?
Are there any errors in the browser console while it appears to be hanging?
Does this affect all steps / tasks / pipelines or just this specific one?
Can you check the same on the Dashboard page that shows the loading state? Check the network + console tabs on that page.
It's hard to tell without more information but this error suggests there may be an issue with your ingress / loadbalancer config, resulting in an invalid or incomplete response. You would need to check the documentation to see what's required to enable support for chunked encoding.
One possible workaround is to switch from log streaming to polling by editing the Dashboard deployment and setting `--stream-logs=false`. This will make a polling request to get the entire logs every 4 seconds instead of just streaming the new content as it's available.
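For reference, a minimal sketch of applying that workaround, assuming the Dashboard runs as a Deployment named `tekton-dashboard` in the `tekton-pipelines` namespace (an operator-managed install may use different names and may revert manual edits):

```sh
# Hypothetical names; adjust the namespace / deployment to match your install.
kubectl -n tekton-pipelines edit deployment tekton-dashboard
# Then add the flag to the dashboard container's args, e.g.:
#   args:
#     - --stream-logs=false   # switch the log viewer from streaming to polling
```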
Updated my comment above with a possible workaround if it is indeed an issue with your proxy. Just beware that this mode is far less efficient as it may receive duplicate data multiple times until the step completes.
OK, I'll try this workaround, thanks.
What exactly is the endpoint called by the Tekton Dashboard to fetch logs live?
I might have to exclude something else from the proxy (no_proxy)...
The request to stream logs goes to `<dashboard_base_url>/api/v1/namespaces/<namespace>/pods/<taskrun_pod>/log?container=<step_container>&follow=true`.
The `follow=true` query param is the equivalent of the `-f` or `--follow` flag to `kubectl logs`.
The difference when polling is that the URL doesn't include the `&follow=true`.
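For illustration, a hedged curl comparison of the two modes (the base URL, pod, and container names below are placeholders):

```sh
# Streaming: the connection stays open and new log lines arrive as the step writes them
curl -N "http://<dashboard_base_url>/api/v1/namespaces/default/pods/my-taskrun-pod/log?container=step-build&follow=true"

# Polling: returns whatever log content exists right now and closes immediately
curl "http://<dashboard_base_url>/api/v1/namespaces/default/pods/my-taskrun-pod/log?container=step-build"
```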
Actually, after 110s a first chunk of the log is displayed while keeping `--stream-logs=true`.
I captured this line in the ingress controller pod log:
2023/03/16 10:34:40 [error] 27318#27318: *31196386 upstream timed out (110: Operation timed out) while reading upstream, client: 10.36.0.0, server: XXXXXXX, request: "GET /api/v1/namespaces/default/pods/exec-tc-21.6-21664-pod/log?container=step-qftest&follow=true HTTP/1.1", upstream: "http://10.44.0.2:9097/api/v1/namespaces/default/pods/exec-tc-21.6-21664-pod/log?container=step-qftest&follow=true", host: "XXXXXXX", referrer: "http://XXXXXXX/"
Weird. Is it really the streaming that is not accepted?! I will try to find what is wrong in my network config, but no clue for now.
With streaming enabled a single request is made that remains open until all of the content for the selected step's logs has been received, i.e. about as long as your step runs. So for example if your step runs for 5 minutes, the request would last about 5 minutes. You may need to increase the timeout settings on your proxy to allow these requests.
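For example, a rough sketch of raising those timeouts when the proxy in front of the Dashboard is the NGINX Ingress Controller (the Ingress name is an assumption, and these annotation keys are specific to ingress-nginx):

```sh
# Hypothetical Ingress name; values are in seconds and should exceed your longest step.
kubectl annotate ingress tekton-dashboard \
  nginx.ingress.kubernetes.io/proxy-read-timeout="3600" \
  nginx.ingress.kubernetes.io/proxy-send-timeout="3600" \
  --overwrite
```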
Since this doesn't appear to be an issue with the Dashboard itself I'm going to close the issue but please do share if you find a resolution as it may be helpful for others. If you run into any more problems feel free to open a new issue or contact us on Slack.
OK sure.
I'll paste some more details in case anyone has an idea to solve my issue:
My cluster has one master node and one worker node. I use Weave Net as the CNI and MetalLB as the load balancer.
All pods are running fine and communication seems ok.
In the error message pasted above:
2023/03/16 10:34:40 [error] 27318#27318: *31196386 upstream timed out (110: Operation timed out) while reading upstream, client: 10.36.0.0, server: XXXXXXX, request: "GET /api/v1/namespaces/default/pods/exec-tc-21.6-21664-pod/log?container=step-qftest&follow=true HTTP/1.1", upstream: "http://10.44.0.2:9097/api/v1/namespaces/default/pods/exec-tc-21.6-21664-pod/log?container=step-qftest&follow=true", host: "XXXXXXX", referrer: "http://XXXXXXX/"
- 10.36.0.0 is the IP of the Weave pod on the control plane node
- 10.44.0.2 is the IP of the Tekton Dashboard pod, which runs on the worker node
- XXXXXXX is the FQDN of the control plane node and the name of the ingress host
- I found a similar bug on the web (kubernetes/ingress-nginx#3469) describing a hairpin mode issue.
- When I try to curl the same URL from a "curl test pod" running on the worker node, I get an immediate response with the log content (see the sketch below).
- When I remove the ingress rule and access the dashboard using a NodePort service, it doesn't work either.
- I have no Network Policy installed in the cluster.
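For anyone who wants to reproduce that in-cluster curl check, a rough sketch using a throwaway pod (the Service name and namespace are assumptions; the port 9097 matches the upstream address in the error above):

```sh
# Assumed Service name/namespace; adjust to your install.
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -N "http://tekton-dashboard.tekton-pipelines.svc.cluster.local:9097/api/v1/namespaces/default/pods/exec-tc-21.6-21664-pod/log?container=step-qftest&follow=true"
```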
If anyone has a hint for me please leave a comment...
Cheers