Intermittent access issues with NFS Volumes
erkerb4 opened this issue · 0 comments
Is this a bug report or feature request?
- Bug Report
Deviation from expected behavior:
I deployed rook-nfs using the quick-start guide, then followed the "create and initialize NFS server" section to set up two NFS servers. One NFS server is backed by HDD storage and the other by SSD storage. The operator built the NFS servers successfully.
Next, I created a deployment and a PVC that used the SC for the NFS server. When the pod first started, the PV was created fine and bound correctly in the pod. Everything worked as expected for a little while (a week, maybe?). Then, all of a sudden, the pods were unable to access the volumes anymore. Opening a shell and running 'ls' on the NFS volume would just hang.
When I restarted the pod that had the NFS volume, the pod failed to start. It never passes the "init" stage and eventually errors out because it is unable to mount the volume backed by the NFS server.
I have tried restarting all the nodes and scheduling the pod on another node, but the issue persists.
The only way I was able to get the pod to mount the volume again was to change the volume spec in the deployment from persistentVolumeClaim to nfs:
volumes:
  - name: gold-nfs-mount
    nfs:
      path: /gold-scratch/dir    # <-- export
      server: 172.30.17.118      # <-- service IP address of the NFS server
The weird thing is, this has happened once before, and the problem eventually went away on its own.
Expected behavior:
Be able to continue using persistentVolumeClaim for the volume instead of mounting it via nfs directly.
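For comparison, a minimal sketch of the PVC-based volume spec in the deployment that stops working (the claim name gold-nfs-claim is illustrative, not the exact name from my cluster):

```yaml
# Sketch of the PVC-backed volume spec that hangs.
# claimName here is illustrative.
volumes:
  - name: gold-nfs-mount
    persistentVolumeClaim:
      claimName: gold-nfs-claim
```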
How to reproduce it (minimal and precise):
Create the rook-nfs operator using the quick-start guide, then follow the "create and initialize NFS server" section to set up the NFS servers.
To make it easier, these are my manifests:
Persistent Volume:
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gold-scratch
  labels:
    type: ssd
spec:
  storageClassName: local-storage
  capacity:
    storage: 200Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "/mnt/scratch/gold"
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node1
                - node2
PVC + NFS Server:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gold-scratch
  namespace: rook-nfs
spec:
  storageClassName: "local-storage"
  accessModes:
    - ReadWriteMany
  selector:
    matchLabels:
      type: ssd
  resources:
    requests:
      storage: 200Gi
---
apiVersion: nfs.rook.io/v1alpha1
kind: NFSServer
metadata:
  name: gold-nfs
  namespace: rook-nfs
spec:
  replicas: 1
  exports:
    - name: gold-scratch
      server:
        accessMode: ReadWrite
        squash: "none"
      persistentVolumeClaim:
        claimName: gold-scratch
  annotations:
    rook-nfs: gold-scratch
    rook: nfs
StorageClass:
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gold-local
  labels:
    rook-nfs: gold-scratch
    type: ssd
parameters:
  exportName: gold-scratch
  nfsServerName: gold-nfs
  nfsServerNamespace: rook-nfs
provisioner: nfs.rook.io/gold-nfs-provisioner
reclaimPolicy: Delete
volumeBindingMode: Immediate
Verify:
$ kubectl get pods -n rook-nfs --selector=app=gold-nfs
NAME READY STATUS RESTARTS AGE
gold-nfs-0 2/2 Running 16 (4d20h ago) 5d13h
$ kubectl get sc gold-local
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
gold-local nfs.rook.io/gold-nfs-provisioner Delete Immediate false 33d
Deploy an app that uses the gold-local SC for its PVC. Then wait?
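A minimal sketch of such a consumer app, assuming any pod that touches the mount will do (the names and the nginx image are illustrative):

```yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gold-app-claim          # illustrative name
spec:
  storageClassName: gold-local  # the Rook NFS provisioner's SC
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gold-app                # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gold-app
  template:
    metadata:
      labels:
        app: gold-app
    spec:
      containers:
        - name: app
          image: nginx          # any image that reads/writes the mount
          volumeMounts:
            - name: gold-nfs-mount
              mountPath: /data
      volumes:
        - name: gold-nfs-mount
          persistentVolumeClaim:
            claimName: gold-app-claim
```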
File(s) to submit:
The NFS server logs do not show any errors.
Environment:
- OS (e.g. from /etc/os-release): Ubuntu 20.04.3 LTS
- Kernel (e.g. uname -a): Linux 5.11.0-43-generic
- Cloud provider or hardware configuration: N/A, on-prem
- Rook version (use rook version inside of a Rook Pod): Rook NFS 1.7.3
- Storage backend version (e.g. for ceph do ceph -v): Rook NFS 1.7.3
- Kubernetes version (use kubectl version): v1.23.1
- Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): kubeadm