external-provisioner yields whole stack trace when it loses connection to the CSI driver
ialidzhikov opened this issue · 9 comments
/sig storage
/kind bug
What happened:
external-provisioner yields whole stack trace when it loses connection to the CSI driver.
What you expected to happen:
external-provisioner should only log that the connection was lost and exit, without dumping thousands of lines of stack trace that are not useful to anyone.
How to reproduce it:
{"log":"2022-05-01T18:20:45.305918834Z stderr F E0501 18:20:45.305666 1 connection.go:131] Lost connection to unix:///var/lib/csi/sockets/pluginproxy/csi.sock."}
{"log":"2022-05-01T18:20:45.306436522Z stderr F F0501 18:20:45.306382 1 connection.go:87] Lost connection to CSI driver, exiting"}
{"log":"2022-05-01T18:20:46.106055473Z stderr F goroutine 82 [running]:"}
{"log":"2022-05-01T18:20:46.106122391Z stderr F k8s.io/klog/v2.stacks(0xc00000e001, 0xc000bec1e0, 0x57, 0x1cd)"}
{"log":"2022-05-01T18:20:46.106166734Z stderr F \t/workspace/vendor/k8s.io/klog/v2/klog.go:1026 +0xb9"}
{"log":"2022-05-01T18:20:46.106178188Z stderr F k8s.io/klog/v2.(*loggingT).output(0x2606640, 0xc000000003, 0x0, 0x0, 0xc000472f50, 0x2554978, 0xd, 0x57, 0x0)"}
{"log":"2022-05-01T18:20:46.106196151Z stderr F \t/workspace/vendor/k8s.io/klog/v2/klog.go:975 +0x19b"}
{"log":"2022-05-01T18:20:46.106206488Z stderr F k8s.io/klog/v2.(*loggingT).printf(0x2606640, 0x3, 0x0, 0x0, 0x0, 0x0, 0x19a688e, 0x26, 0x0, 0x0, ...)"}
{"log":"2022-05-01T18:20:46.106231016Z stderr F \t/workspace/vendor/k8s.io/klog/v2/klog.go:750 +0x191"}
{"log":"2022-05-01T18:20:46.1062379Z stderr F k8s.io/klog/v2.Fatalf(...)"}
{"log":"2022-05-01T18:20:46.10624548Z stderr F \t/workspace/vendor/k8s.io/klog/v2/klog.go:1514"}
{"log":"2022-05-01T18:20:46.10626969Z stderr F github.com/kubernetes-csi/csi-lib-utils/connection.ExitOnConnectionLoss.func1(0x2606640)"}
{"log":"2022-05-01T18:20:46.106276233Z stderr F \t/workspace/vendor/github.com/kubernetes-csi/csi-lib-utils/connection/connection.go:87 +0x1d4"}
{"log":"2022-05-01T18:20:46.106282511Z stderr F github.com/kubernetes-csi/csi-lib-utils/connection.connect.func1(0xc00037cba0, 0x29, 0x4a8174a0a, 0x4a8174a0a, 0xc000027101, 0x10, 0xc000027108)"}
{"log":"2022-05-01T18:20:46.106289479Z stderr F \t/workspace/vendor/github.com/kubernetes-csi/csi-lib-utils/connection/connection.go:134 +0x2aa"}
{"log":"2022-05-01T18:20:46.106298558Z stderr F google.golang.org/grpc.WithDialer.func1(0x1ba5ce0, 0xc000fc2ea0, 0xc00037cba0, 0x29, 0x10, 0x17d6240, 0x990163506e6a24f4, 0x2637dc0)"}
{"log":"2022-05-01T18:20:46.106305463Z stderr F \t/workspace/vendor/google.golang.org/grpc/dialoptions.go:398 +0x8e"}
{"log":"2022-05-01T18:20:46.106314589Z stderr F google.golang.org/grpc/internal/transport.dial(0x1ba5ce0, 0xc000fc2ea0, 0xc0000320a0, 0xc00037cba0, 0x29, 0x1981038, 0x9, 0xc00068c0a8, 0x0, 0x0, ...)"}
{"log":"2022-05-01T18:20:46.106320822Z stderr F \t/workspace/vendor/google.golang.org/grpc/internal/transport/http2_client.go:143 +0x2dd"}
{"log":"2022-05-01T18:20:46.106326854Z stderr F google.golang.org/grpc/internal/transport.newHTTP2Client(0x1ba5ce0, 0xc000fc2ea0, 0x1ba5c60, 0xc00099df80, 0xc00037cba0, 0x29, 0x1981038, 0x9, 0xc00068c0a8, 0x0, ...)"}
<omitted>
Anything else we need to know?:
The issue is similar to kubernetes/kubernetes#107665.
Environment:
- external-provisioner: v2.1.1
- Kubernetes version (use kubectl version): 1.21.10
Care to submit a PR?
Simply update to the latest klog and then use klog.ErrorS + klog.FlushAndExit instead of klog.Fatal.
The same change needs to go into all sidecars which use klog.Fatal.
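The suggested replacement can be sketched as follows. This is a minimal illustration, not the actual patch: the function name and callback shape are modeled on connection.ExitOnConnectionLoss seen in the stack trace above, and only the klog calls (klog.ErrorS, klog.FlushAndExit, klog.ExitFlushTimeout) are the real API being recommended:

```go
package connection

import (
	"k8s.io/klog/v2"
)

// ExitOnConnectionLoss returns a callback invoked when the connection to the
// CSI driver socket is lost. The old code called klog.Fatalf, which dumps the
// stacks of every goroutine on exit. klog.ErrorS plus klog.FlushAndExit
// instead logs a single structured error line, flushes buffered log output,
// and exits non-zero without any stack traces.
func ExitOnConnectionLoss() func() bool {
	return func() bool {
		klog.ErrorS(nil, "Lost connection to CSI driver, exiting")
		// Flush pending log output (bounded by klog's default flush
		// timeout), then os.Exit(1).
		klog.FlushAndExit(klog.ExitFlushTimeout, 1)
		return false // unreachable; satisfies the callback signature
	}
}
```

klog.FlushAndExit and klog.ExitFlushTimeout are available in current klog v2 releases, which is why the comment below about being on the latest klog tag matters.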
Simply update to the latest klog and then use klog.ErrorS + klog.FlushAndExit instead of klog.Fatal.
Thanks! The klog dependency at HEAD is already at the latest tag (v2.60.1). Hence, I guess the only thing I have to do is to adapt the klog.Fatal usages.
Ah, sorry. The corresponding klog invocation rather comes from https://github.com/kubernetes-csi/csi-lib-utils:
Looks like this is already fixed in https://github.com/kubernetes-csi/csi-lib-utils with kubernetes-csi/csi-lib-utils#81.
It should rather be fixed in external-provisioner >= v3.0.0. The commit that updated github.com/kubernetes-csi/csi-lib-utils to >= v0.10.0 is 251509c.
Does it make sense to update the external-provisioner release-2.1 and release-2.2 branches by bumping github.com/kubernetes-csi/csi-lib-utils from v0.9.0 and v0.9.1 to v0.10.0 (or a potential v0.9.2)?
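For reference, backporting that bump to a release branch would look roughly like this. These commands are hypothetical (the module path is real; branch names come from the comment above, and the vendor/ directory is visible in the stack trace):

```shell
# On a checkout of the release-2.1 (or release-2.2) branch:
go get github.com/kubernetes-csi/csi-lib-utils@v0.10.0
go mod tidy
go mod vendor   # external-provisioner vendors its dependencies
```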
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/close
as the issue is fixed in external-provisioner >= v3.0.0
@ialidzhikov: Closing this issue.
In response to this:
/close
as the issue is fixed in external-provisioner >= v3.0.0
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.