Enhance NFS Mount Efficiency with Stage/Unstage Volume Capability
woehrl01 opened this issue · 10 comments
Is your feature request related to a problem?/Why is this needed
Describe the solution you'd like in detail
I would like to propose an enhancement that focuses on optimizing NFS mount operations. This feature aims to improve resource utilization and reduce startup times for pods accessing NFS servers. Similar mounting behaviour already exists in the EBS CSI driver and the JuiceFS CSI driver.
The core idea is to introduce an option that leverages the stage and unstage volume capabilities of the CSI driver. The proposed changes include:
- Single NFS Server Mount: Mount the NFS server only once for each unique combination of server name, export name, and mount options.
- Bind Mounts for Pods: Implement actual bind mounts for each pod accessing the NFS server. This approach should also support subpaths for each pod.
- Mount Management: Ensure that the mount operation occurs only once per unique combination mentioned above (or, more simply, once per volume ID of the PV), preventing redundant mounts; see the sketch after the list of benefits below.
This enhancement brings several key benefits:
- Reduced Mount Operations: By mounting the NFS server less frequently, we can significantly reduce the number of mount operations that the NFS server has to handle.
- Improved Cache Utilization: With fewer mounts, cache usage becomes more efficient, enhancing overall system performance.
- Faster Startup Times for Pods: Pods accessing the NFS server will experience quicker startup times, leading to more efficient deployments and scaling operations.
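To make the idea concrete, here is a rough sketch of how the node server could implement this. It is illustrative only: it assumes hypothetical volume context keys `server`, `share` and `subdir`, uses `k8s.io/mount-utils` for mounting, and omits idempotency, locking and most error handling.

```go
package nfs

import (
	"context"
	"fmt"
	"os"
	"path/filepath"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	mount "k8s.io/mount-utils"
)

// NodeServer is a minimal stand-in for the driver's node service.
type NodeServer struct {
	mounter mount.Interface
}

// NodeStageVolume mounts the NFS export once per volume (i.e. per unique
// server/export/mount-options combination) under the staging path.
func (ns *NodeServer) NodeStageVolume(ctx context.Context, req *csi.NodeStageVolumeRequest) (*csi.NodeStageVolumeResponse, error) {
	server := req.GetVolumeContext()["server"] // assumed volume context keys
	share := req.GetVolumeContext()["share"]
	opts := req.GetVolumeCapability().GetMount().GetMountFlags()

	source := fmt.Sprintf("%s:%s", server, share)
	target := req.GetStagingTargetPath()

	// Only mount if the staging path is not already a mount point.
	notMnt, err := ns.mounter.IsLikelyNotMountPoint(target)
	if err != nil && !os.IsNotExist(err) {
		return nil, status.Error(codes.Internal, err.Error())
	}
	if notMnt || os.IsNotExist(err) {
		if err := os.MkdirAll(target, 0750); err != nil {
			return nil, status.Error(codes.Internal, err.Error())
		}
		if err := ns.mounter.Mount(source, target, "nfs", opts); err != nil {
			return nil, status.Error(codes.Internal, err.Error())
		}
	}
	return &csi.NodeStageVolumeResponse{}, nil
}

// NodePublishVolume bind-mounts the already-staged NFS mount (optionally a
// per-pod subpath of it) into the pod's target path, instead of contacting
// the NFS server again.
func (ns *NodeServer) NodePublishVolume(ctx context.Context, req *csi.NodePublishVolumeRequest) (*csi.NodePublishVolumeResponse, error) {
	source := filepath.Join(req.GetStagingTargetPath(), req.GetVolumeContext()["subdir"])
	target := req.GetTargetPath()
	if err := os.MkdirAll(target, 0750); err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}
	if err := ns.mounter.Mount(source, target, "", []string{"bind"}); err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}
	return &csi.NodePublishVolumeResponse{}, nil
}
```

NodeUnstageVolume would then unmount the staged NFS path once the last pod on the node has unpublished its bind mount.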
Describe alternatives you've considered
An alternative would be a DaemonSet that mounts the NFS servers onto the host, with those mounts then bind-mounted into the pods via hostPath. The problem is that this hides from the pod the fact that NFS is used, and it could be less reliable.
Additional context
@woehrl01 thanks for raising this issue. I agree that adding NodeStageVolume support would reduce the number of NFS mounts, since it would be one mount per PV per node, but it raises another issue: NodeStageVolume does not respect fsGroupChangePolicy (SecurityContext support), while NodePublishVolume does. You can find more details here: kubernetes-sigs/azurefile-csi-driver#1224 (comment)
There is a performance vs. Kubernetes compliance tradeoff in whether to support NodeStageVolume, and I am not sure what the right approach for such a requirement is.
cc @jsafrane @gnufied any ideas on whether we need to implement NodeStageVolume or not?
@andyzhangx thank you for mentioning this problem, I wasn't aware of these discussions yet.
I'm curious whether this is actually an issue in this case: if the stage volume step only creates the initial mount for the export root of the NFS server, the publish volume step can still set the fsGroup on the actual (sub) mount point created by the bind mount.
As I'm not an expert in fsGroup, what am I missing here?
@woehrl01 suppose you have an NFS mount with gid=x and then set gid=y on the bind mount path; the original NFS mount would then also have gid=y.
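This sharing is easy to reproduce locally with plain directories, since a bind mount and its source reference the same underlying files. A minimal Linux-only sketch (must run as root; no NFS involved):

```go
// Demonstrates that a chown performed through a bind mount is visible
// at the original path, because both paths reference the same files.
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	orig, _ := os.MkdirTemp("", "orig")
	bind, _ := os.MkdirTemp("", "bind")
	defer os.RemoveAll(orig)
	defer os.RemoveAll(bind)

	// Create a file under the "original" directory.
	f := orig + "/data"
	os.WriteFile(f, []byte("x"), 0644)

	// Bind-mount orig onto bind (requires CAP_SYS_ADMIN / root).
	if err := syscall.Mount(orig, bind, "", syscall.MS_BIND, ""); err != nil {
		panic(err)
	}
	defer syscall.Unmount(bind, 0)

	// Change the group through the bind mount path...
	if err := os.Chown(bind+"/data", -1, 1234); err != nil {
		panic(err)
	}

	// ...and the change is visible through the original path as well.
	st, _ := os.Stat(f)
	fmt.Println("gid via original path:", st.Sys().(*syscall.Stat_t).Gid) // 1234
}
```

The same would apply to fsGroup ownership changes applied through a per-pod bind mount: they land on the shared NFS mount as well.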
@andyzhangx I see, thank you. That's interesting behaviour I wasn't aware of.
I found https://bindfs.org/ which could be a possible solution for that bind mount behaviour.
It would still be great to have this option behind a feature flag, provided this fsGroup behaviour is documented.
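For illustration, such a flag could simply gate whether the node plugin advertises STAGE_UNSTAGE_VOLUME at all. This is a sketch reusing the NodeServer from the proposal above, with a hypothetical `enableStageVolume` field; it is not an existing driver option.

```go
// Sketch: only advertise STAGE_UNSTAGE_VOLUME when the (hypothetical)
// feature flag is on; with the flag off, kubelet skips NodeStageVolume
// and the driver behaves exactly as it does today.
func (ns *NodeServer) NodeGetCapabilities(ctx context.Context, req *csi.NodeGetCapabilitiesRequest) (*csi.NodeGetCapabilitiesResponse, error) {
	caps := []*csi.NodeServiceCapability{}
	if ns.enableStageVolume { // hypothetical flag, e.g. set from a CLI option
		caps = append(caps, &csi.NodeServiceCapability{
			Type: &csi.NodeServiceCapability_Rpc{
				Rpc: &csi.NodeServiceCapability_RPC{
					Type: csi.NodeServiceCapability_RPC_STAGE_UNSTAGE_VOLUME,
				},
			},
		})
	}
	return &csi.NodeGetCapabilitiesResponse{Capabilities: caps}, nil
}
```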
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale