NetApp/trident

ontap-san with nvme breaks when /proc/mounts is excessive

magicite opened this issue · 0 comments

Describe the bug
Pods requesting NVMeoF-backed PVCs fail because their storage is never attached on nodes where /proc/mounts contains a very long line. On such nodes, the nvme command prints a libhugetlbfs error to stderr (see the output below). I believe Trident is erroneously capturing this stderr message and failing as a result.

cn1001:~ # nvme list
libhugetlbfs: ERROR: Line too long when parsing mounts
Node                  Generic               SN                   Model                                    Namespace Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            81MrpNW3EewqAAAAAAAB NetApp ONTAP Controller                  1          53.69  GB /  53.69  GB      4 KiB +  0 B   FFFFFFFF
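
To illustrate the failure mode I suspect, here is a minimal Go sketch. It is not Trident's code: the fakeNvme helper, the deviceList type, and the JSON shape are hypothetical stand-ins, and a small shell script plays the role of the nvme CLI. It only shows that merging stderr into stdout can break parsing, while keeping the two streams separate does not.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os/exec"
)

// deviceList is a hypothetical, trimmed-down shape for JSON device output.
type deviceList struct {
	Devices []json.RawMessage `json:"Devices"`
}

// fakeNvme returns a command that stands in for the nvme CLI (an assumption
// made so the sketch runs anywhere with sh): it writes the libhugetlbfs
// message to stderr and a small JSON document to stdout.
func fakeNvme() *exec.Cmd {
	script := `echo 'libhugetlbfs: ERROR: Line too long when parsing mounts' >&2; echo '{"Devices":[]}'`
	return exec.Command("sh", "-c", script)
}

// parseCombined merges stderr into stdout before parsing, so the stderr
// message ends up in front of the JSON and decoding fails.
func parseCombined() error {
	out, err := fakeNvme().CombinedOutput()
	if err != nil {
		return err
	}
	var list deviceList
	return json.Unmarshal(out, &list)
}

// parseStdoutOnly keeps stderr in its own buffer and parses stdout alone.
// It returns whatever was written to stderr so the caller can log it.
func parseStdoutOnly() (string, error) {
	var stdout, stderr bytes.Buffer
	cmd := fakeNvme()
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return stderr.String(), err
	}
	var list deviceList
	return stderr.String(), json.Unmarshal(stdout.Bytes(), &list)
}

func main() {
	fmt.Println("combined output:", parseCombined())
	warn, err := parseStdoutOnly()
	fmt.Printf("stdout only: err=%v, stderr=%q\n", err, warn)
}
```

If Trident does something similar to parseCombined, treating stderr as a separate stream (and logging it) might be enough to tolerate this message, since the command itself still exits successfully.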

On the node itself, I can manually (i.e., without Trident) attach to the storage over NVMeoF, format the block device, and so on.

Environment

  • Trident version: 24.02.0
  • Container runtime: containerd://1.7.11-k3s2
  • Kubernetes version: v1.27.10+rke2r1
  • Kubernetes orchestrator: Harvester v1.3.0
  • OS: SLE micro 5.4
  • NetApp backend types: ONTAP A150

To Reproduce

  1. Prepare a k8s cluster with connectivity to a NetApp with NVMeoF support
  2. Install Trident on the cluster and configure a backend that uses ontap-san with sanType nvme
  3. Create a mount entry whose line in /proc/mounts is longer than 2048, which is enough to trigger the libhugetlbfs stderr message (reference)
  4. Create a storage class that will target the previously created backend
  5. Create a PVC referencing the storage class
  6. Create a pod referencing the PVC

Expected behavior
The PVC should be dynamically provisioned, the pod should be scheduled to a node, the storage should attach to the node, and the pod should run with access to the storage.

Additional context
Logs are attached, gathered with: tridentctl logs -n trident --node cn1003 --archive --sidecars
support-2024-03-28T10-31-01-CDT.zip