[nfs] question: surviving pod restarts, nfs3 vs fsid_device=false
naseemkullah opened this issue · 19 comments
Hi @wongma7, thanks for this great project.
If I understand correctly, going to NFS3 as per @kvaps' #1241 and making fsid_device settable to false as per @thirdeyenick's #1212 are both meant to solve issues caused by pod restarts.
Is the above statement correct?
If so, could you please describe the difference between these two approaches? Which is recommended?
@kvaps @thirdeyenick please chime in with your thoughts as well if you have a moment, thanks!
@naseemkullah there is also one unpleasant bug caused by IPVS graceful termination in kube-proxy:
kubernetes/kubernetes#84322 and formerly kubernetes/kubernetes#81775
Until the graceful termination issue is fixed, nfs-provisioner pod restarts will cause hung clients.
As a workaround I switched from kube-proxy to kube-router, which has configurable graceful termination behavior that is switched off by default. That is working fine for me.
As for stale file handles: this error can happen if you run the NFS server on two different filesystems with the same file structure, because by design NFS takes the inodes of the underlying filesystem and provides them "as is" to its clients. If I understood correctly, the fsid_device option allows the NFS-Ganesha server to remember the inodes and always provide the same ones to the clients, even if the underlying filesystem is replaced by another one. Please correct me if I'm wrong.
The Export ID and FSID are used to uniquely identify handles to Ganesha.
In NFS, the client gets an opaque handle to things (files,
directories, block-devices, etc.) and uses these opaque handles to
reference those objects on the server. Ganesha breaks this up into 2
parts: The global part specifies a version, an export ID, and a length;
and the per-FSAL part is opaque, and controlled by the FSAL that owns
the export.
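
To make the two-part split concrete, here is a minimal sketch of such a handle. The field widths and byte order are assumptions chosen for illustration; this is not Ganesha's actual wire format, and `pack_handle` / `unpack_handle` are hypothetical names.

```python
import struct

# Hypothetical global header: version (1 byte), export ID (2 bytes),
# length of the per-FSAL part (2 bytes). Not Ganesha's real layout.
_GLOBAL_HEADER = struct.Struct("!BHH")


def pack_handle(version: int, export_id: int, fsal_part: bytes) -> bytes:
    """Prefix the opaque per-FSAL bytes with the global header."""
    return _GLOBAL_HEADER.pack(version, export_id, len(fsal_part)) + fsal_part


def unpack_handle(handle: bytes):
    """Split a handle back into (version, export_id, fsal_part)."""
    version, export_id, length = _GLOBAL_HEADER.unpack_from(handle)
    start = _GLOBAL_HEADER.size
    fsal_part = handle[start:start + length]
    if len(fsal_part) != length:
        raise ValueError("truncated handle")
    return version, export_id, fsal_part
```

The server only interprets the header; the trailing bytes are handed unchanged to whichever FSAL owns the export.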
VFS stores the FSID of the filesystem owning the object, and then the
actual kernel handle (as passed to the *_at calls, open_by_handle_at(2)
for example). The reason for the FSID is that Linux handles are only
guaranteed to be unique within a single filesystem.
If an export is removed, and another one is added, but it has the same
system major/minor (which is the primary FSID on Linux), the handles
that the client previously had open on the old export will try to be
used on the new export, since, as far as Ganesha knows, they're valid
for that export.
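
The reuse problem described above can be reproduced with a toy model. This is only a sketch under simplifying assumptions: `ToyServer` and `VfsHandle` are hypothetical names, and the validity check (export ID plus FSID match) is a stand-in for whatever Ganesha does internally.

```python
from typing import Dict, NamedTuple, Tuple

Fsid = Tuple[int, int]  # (major, minor)


class VfsHandle(NamedTuple):
    export_id: int
    fsid: Fsid
    kernel_handle: bytes  # opaque bytes from the kernel, unique per filesystem only


class ToyServer:
    """Toy stand-in for a server that trusts export ID + FSID."""

    def __init__(self) -> None:
        self._exports: Dict[int, Fsid] = {}

    def add_export(self, export_id: int, fsid: Fsid) -> None:
        self._exports[export_id] = fsid

    def remove_export(self, export_id: int) -> None:
        del self._exports[export_id]

    def accepts(self, handle: VfsHandle) -> bool:
        # The handle looks valid whenever the ID/FSID pair matches a live export,
        # even if the filesystem behind that pair has been swapped out.
        return self._exports.get(handle.export_id) == handle.fsid


server = ToyServer()
server.add_export(1, (0, 42))
old_handle = VfsHandle(1, (0, 42), b"inode-123")

server.remove_export(1)
server.add_export(1, (0, 42))   # new filesystem, same ID/FSID combo
reused_ok = server.accepts(old_handle)

server.remove_export(1)
server.add_export(1, (0, 43))   # new filesystem, fresh FSID
fresh_ok = server.accepts(old_handle)
```

In the first case the old handle is still accepted and ends up pointing into a different filesystem; in the second the mismatch is detectable, so the server can refuse the handle cleanly.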
In general, an export ID/FSID combo should never be re-used for the
lifetime of a Ganesha server instance. This isn't a problem for
filesystems based on block devices, since the FSID is based on the block
device, and so will be unique, but can be a problem for FUSE, which
generates its FSID.
One way around this would be to create the new FUSE FS before you take
down the old one. That way it will get a new FSID. Or you can just
script configuration in Ganesha with unique FSIDs. Ganesha has the
ability to load config snippets (such as exports) from files with the
%include directive. You can try using that with generated exports.
Daniel
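
The "generated exports" idea from the message above could be scripted along these lines. This is a rough sketch, not a tested Ganesha setup: the EXPORT parameter names (Export_Id, Path, Pseudo, Filesystem_id) are taken from my reading of Ganesha's export configuration and should be checked against your version, and `render_export` / `write_exports` are hypothetical helpers.

```python
from pathlib import Path
from typing import List

# Assumed EXPORT block shape; verify the parameter names for your Ganesha version.
EXPORT_TEMPLATE = """EXPORT {{
    Export_Id = {export_id};
    Path = {path};
    Pseudo = {path};
    Filesystem_id = {export_id}.{export_id};
    FSAL {{
        Name = VFS;
    }}
}}
"""


def render_export(export_id: int, path: str) -> str:
    """Render one export whose FSID is derived from its unique export ID."""
    return EXPORT_TEMPLATE.format(export_id=export_id, path=path)


def write_exports(directory: Path, paths: List[str], first_id: int = 1) -> str:
    """Write one snippet file per export; return %include lines for the main config."""
    directory.mkdir(parents=True, exist_ok=True)
    include_lines = []
    for export_id, path in enumerate(paths, start=first_id):
        snippet = directory / f"export-{export_id}.conf"
        snippet.write_text(render_export(export_id, path))
        include_lines.append(f'%include "{snippet}"')
    return "\n".join(include_lines) + "\n"
```

For the IDs to stay unique over the lifetime of the server, the caller has to persist the last used ID somewhere and pass the next free value as `first_id`; this sketch does not do that.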
Interesting, do you have any documentation for the latter? I'm on the latest 2.3.0 helm chart, but sometimes the clients are indeed still hanging.
@cedricve you can read this kubernetes/kubernetes#84322 (comment) and try removing the realserver manually on the client's node after it hangs
Thanks, but is that a manual fix, applied after noticing a client is hanging?
It is; there is nothing else for now. As a workaround you can switch the kube-proxy mode from IPVS to iptables, or use kube-router instead of kube-proxy as the service proxy.
Thanks, great thoughts. This makes me think, and it scares me for everyone in the Kubernetes world. What other best practices could we/I follow for sharing volumes between deployments?
Well, the best practice is to use object storage instead, e.g. S3, but this isn't always possible. Since we have a ton of legacy, we need POSIX and a working nfs-server for that.
I was about to create a central media server on a separate VM (POST/GET of files), because I'm just so desperate. Currently I'm still wondering when my clients might break. Sigh.
@cedricve, if you are able to control the behavior of your application, try using minio; it is the easiest way to set up reliable object storage.
Thanks, I also found rook.io; not sure if they have any fixes for the stale handle issues? Have you looked at this one already, @kvaps?
Is there a benefit in setting fsid_device to false when using NFS3? @kvaps
I'm not sure, but I guess so.
This option tells the nfs-server how to represent the data; it does not affect the transfer method.
Do you think it would be a good idea to set fsid_device to false as a default in the helm chart?
I'm fine with this change; the other question is what potential problems it could bring.
I think @wongma7 is better placed to answer this question.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.