kubernetes-csi/external-provisioner

The provisioner exits after 30 minutes of inactivity.

jsafrane opened this issue · 11 comments

What happened:

An automatic gRPC bump to 1.59.0 introduced a new gRPC behavior that closes idle connections after 30 minutes of inactivity. After 30 minutes with no provisioning or deletion, the connection to the CSI driver is silently closed. At the next provisioning or deletion, the provisioner notices that the connection is closed and exits with "Lost connection to CSI driver, exiting". A new provisioner starts immediately, but it must wait for the previous leader election lease to expire, which adds a long delay to volume provisioning (and makes our downstream e2e tests time out).

What you expected to happen:

The gRPC connection should not close because of inactivity.

How to reproduce it:
On a very quiet cluster (no provisioning or deletion), wait 30 minutes after external-provisioner starts, then create a new PVC that should be dynamically provisioned.

I filed kubernetes-csi/csi-lib-utils#153 to disable autoclose.

/assign

/reopen
We still need to vendor the new csi-lib-utils here.

@jsafrane: Reopened this issue.

@jsafrane What is the status of this issue?

Maybe there is another issue around this: when the socket times out, the container fails without releasing the lease. Is this intended? Once restarted, the container doesn't recover the lease, so we have to wait for the lease to time out (300s with vsphere-csi).

I found that external-provisioner uses a random identity for the lease:

identity := strconv.FormatInt(timeStamp, 10) + "-" + strconv.Itoa(rand.Intn(10000)) + "-" + provisionerName
if *enableNodeDeployment {
	identity = identity + "-" + node
}

This is not the case for external-attacher, for example:

https://github.com/kubernetes-csi/external-attacher/blob/4e13fc2eabc320c779b574bf35bb79dd00feb2e2/cmd/csi-attacher/main.go#L281-L283

NB: default is hostname, i.e. pod name:

https://github.com/kubernetes-csi/csi-lib-utils/blob/f82f9de5b8aeb3c3b236d7f58fc5eeab34438078/leaderelection/leader_election.go#L198-L200

@sathieu Can you open a different issue? The original issue should have been fixed in the latest patch releases.

This was fixed in the master branch by #1135; sorry, I forgot to close it.
/close

@jsafrane: Closing this issue.

@sathieu Can you open a different issue? The original issue should have been fixed in the latest patch releases.

Done #1147