kubernetes-csi/external-provisioner

external-provisioner dumps the whole stack trace when it loses connection to the CSI driver

ialidzhikov opened this issue · 9 comments

/sig storage
/kind bug

What happened:
external-provisioner dumps the whole stack trace when it loses connection to the CSI driver.

What you expected to happen:
external-provisioner should log only that the connection was lost and then exit, without printing thousands of stack-trace lines that are not useful to anyone.

How to reproduce it:

{"log":"2022-05-01T18:20:45.305918834Z stderr F E0501 18:20:45.305666       1 connection.go:131] Lost connection to unix:///var/lib/csi/sockets/pluginproxy/csi.sock."}
{"log":"2022-05-01T18:20:45.306436522Z stderr F F0501 18:20:45.306382       1 connection.go:87] Lost connection to CSI driver, exiting"}
{"log":"2022-05-01T18:20:46.106055473Z stderr F goroutine 82 [running]:"}
{"log":"2022-05-01T18:20:46.106122391Z stderr F k8s.io/klog/v2.stacks(0xc00000e001, 0xc000bec1e0, 0x57, 0x1cd)"}
{"log":"2022-05-01T18:20:46.106166734Z stderr F \t/workspace/vendor/k8s.io/klog/v2/klog.go:1026 +0xb9"}
{"log":"2022-05-01T18:20:46.106178188Z stderr F k8s.io/klog/v2.(*loggingT).output(0x2606640, 0xc000000003, 0x0, 0x0, 0xc000472f50, 0x2554978, 0xd, 0x57, 0x0)"}
{"log":"2022-05-01T18:20:46.106196151Z stderr F \t/workspace/vendor/k8s.io/klog/v2/klog.go:975 +0x19b"}
{"log":"2022-05-01T18:20:46.106206488Z stderr F k8s.io/klog/v2.(*loggingT).printf(0x2606640, 0x3, 0x0, 0x0, 0x0, 0x0, 0x19a688e, 0x26, 0x0, 0x0, ...)"}
{"log":"2022-05-01T18:20:46.106231016Z stderr F \t/workspace/vendor/k8s.io/klog/v2/klog.go:750 +0x191"}
{"log":"2022-05-01T18:20:46.1062379Z stderr F k8s.io/klog/v2.Fatalf(...)"}
{"log":"2022-05-01T18:20:46.10624548Z stderr F \t/workspace/vendor/k8s.io/klog/v2/klog.go:1514"}
{"log":"2022-05-01T18:20:46.10626969Z stderr F github.com/kubernetes-csi/csi-lib-utils/connection.ExitOnConnectionLoss.func1(0x2606640)"}
{"log":"2022-05-01T18:20:46.106276233Z stderr F \t/workspace/vendor/github.com/kubernetes-csi/csi-lib-utils/connection/connection.go:87 +0x1d4"}
{"log":"2022-05-01T18:20:46.106282511Z stderr F github.com/kubernetes-csi/csi-lib-utils/connection.connect.func1(0xc00037cba0, 0x29, 0x4a8174a0a, 0x4a8174a0a, 0xc000027101, 0x10, 0xc000027108)"}
{"log":"2022-05-01T18:20:46.106289479Z stderr F \t/workspace/vendor/github.com/kubernetes-csi/csi-lib-utils/connection/connection.go:134 +0x2aa"}
{"log":"2022-05-01T18:20:46.106298558Z stderr F google.golang.org/grpc.WithDialer.func1(0x1ba5ce0, 0xc000fc2ea0, 0xc00037cba0, 0x29, 0x10, 0x17d6240, 0x990163506e6a24f4, 0x2637dc0)"}
{"log":"2022-05-01T18:20:46.106305463Z stderr F \t/workspace/vendor/google.golang.org/grpc/dialoptions.go:398 +0x8e"}
{"log":"2022-05-01T18:20:46.106314589Z stderr F google.golang.org/grpc/internal/transport.dial(0x1ba5ce0, 0xc000fc2ea0, 0xc0000320a0, 0xc00037cba0, 0x29, 0x1981038, 0x9, 0xc00068c0a8, 0x0, 0x0, ...)"}
{"log":"2022-05-01T18:20:46.106320822Z stderr F \t/workspace/vendor/google.golang.org/grpc/internal/transport/http2_client.go:143 +0x2dd"}
{"log":"2022-05-01T18:20:46.106326854Z stderr F google.golang.org/grpc/internal/transport.newHTTP2Client(0x1ba5ce0, 0xc000fc2ea0, 0x1ba5c60, 0xc00099df80, 0xc00037cba0, 0x29, 0x1981038, 0x9, 0xc00068c0a8, 0x0, ...)"}

<omitted>

Anything else we need to know?:
The issue is similar to kubernetes/kubernetes#107665.

Environment:

  • external-provisioner: v2.1.1
  • Kubernetes version (use kubectl version): 1.21.10
pohly commented

Care to submit a PR?

Simply update to the latest klog and then use klog.ErrorS + klog.FlushAndExit instead of klog.Fatal.

The same change needs to go into all sidecars which use klog.Fatal.

> Simply update to the latest klog and then use klog.ErrorS + klog.FlushAndExit instead of klog.Fatal.

Thanks! The klog dependency at HEAD is already at the latest tag (v2.60.1). Hence, I guess the only thing left to do is to adapt the klog.Fatal usages.
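
For illustration, a minimal, self-contained sketch of that logging change (the error text and socket path are taken from the log above; the real call site lives in csi-lib-utils' ExitOnConnectionLoss and is only approximated here):

```go
package main

import (
	"errors"

	"k8s.io/klog/v2"
)

// exitOnConnectionLoss sketches the suggested fix: instead of klog.Fatalf,
// which also dumps the stacks of every goroutine, log one structured error
// line and then flush and exit explicitly.
func exitOnConnectionLoss() {
	err := errors.New("lost connection to unix:///var/lib/csi/sockets/pluginproxy/csi.sock")

	// Before: klog.Fatalf("Lost connection to CSI driver, exiting")
	klog.ErrorS(err, "Lost connection to CSI driver, exiting")
	klog.FlushAndExit(klog.ExitFlushTimeout, 1)
}

func main() {
	klog.InitFlags(nil)
	exitOnConnectionLoss()
}
```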

pohly commented

Then the dependency update in #710 should fix it.

Rather, it should already be fixed in external-provisioner >= v3.0.0. 251509c is the commit that updated github.com/kubernetes-csi/csi-lib-utils to >= v0.10.0.


Does it make sense to also fix the external-provisioner release-2.1 and release-2.2 branches by updating github.com/kubernetes-csi/csi-lib-utils from v0.9.0 and v0.9.1, respectively, to v0.10.0 (or a potential v0.9.2)?
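
For reference, such a backport would essentially be a one-line go.mod bump on those release branches plus the usual go mod tidy / vendor update. The require fragment below is only a sketch; v0.9.0 and v0.9.1 are the versions named above, and the branch-to-version mapping is assumed:

```
require (
	github.com/kubernetes-csi/csi-lib-utils v0.10.0 // bumped from v0.9.0 (release-2.1) / v0.9.1 (release-2.2)
)
```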

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

/close
as the issue is fixed in external-provisioner >= v3.0.0

@ialidzhikov: Closing this issue.

In response to this:

> /close
> as the issue is fixed in external-provisioner >= v3.0.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.