NetApp/trident

NVMe/tcp doesn't work with long k8s node names

magicite opened this issue · 3 comments

Describe the bug
When using nvme/tcp in k8s environments with long node names, pods cannot attach to the storage.

Event from pod that won't initialize

   Warning  FailedAttachVolume  16m  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-48bc8cb8-941c-4b39-a1e7-5082c4d74474" : rpc error: code = Unknown desc = [GET /protocols/nvme/subsystems][400] nvme_subsystem_collection_get default  &{Error:0xc001b405d0}

The corresponding entry from the security audit log shows:

Tue Jun 18 17:03:16 2024  nas3502-04   [kern_audit:info:3702] 8503eb0000008fb4 :: nas3502-A800:http :: 172.16.251.224:60978 :: site-que1:vsadmin :: GET /api/protocols/nvme/subsystems?fields=%2A%2A&name=site-que1-wrk-62e68963-v6vx4-6cf6b4ac-ea2a-4675-9eb1-740bc8f6ecf0&svm.uuid=faa0ae84-2cee-11ef-ac41-d039ea9b7294 :: Error: "site-que1-wrk-62e68963-v6vx4-6cf6b4ac-ea2a-4675-9eb1-740bc8f6ecf0" is an invalid value for field "name" (<text (size 1..64)>)

Environment

  • Trident 24.02.0
  • Ubuntu 22.04.3 LTS
  • k8s v1.26.13+rke2r1
  • AFF800 NetApp Release 9.14.1: Wed Jan 24 02:50:30 UTC 2024
  • Provisioned with rancher, which picked the node names

To Reproduce
Given the entry from the audit log, I think you need to have a k8s node with a long name.

Expected behavior
The volume should attach.

Extra info
I happen to have an older test environment, originally set up with an older version of Astra Trident and ONTAP, which also has long node names. Things worked in that environment and have held steady since then. I just tried to create a new pod there, without updating Astra Trident but with the AFF800 now running 9.14.1, and it fails identically to the above. My guess is that this is a regression introduced in or around ONTAP 9.14.1.

Hi @magicite,

When Trident implemented the NVMe driver, the maximum NVMe subsystem name length that ONTAP allowed was 96 chars, and that is the limit Trident used. However, ONTAP reduced the maximum NVMe subsystem name length from 96 chars to 64 chars in 9.14.1.
This ONTAP change breaks backward compatibility.

For file system volumes, the NVMe subsystem name is a combination of the host node name and the Trident UUID.
This issue has already been identified and should be fixed in Trident 24.06.
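
To make the length problem concrete, here is a minimal sketch that rebuilds the subsystem name from the audit log entry in this issue and checks it against the limit. The `<node-name>-<uuid>` layout with a single hyphen is an assumption inferred from that log entry, not taken from the Trident source, and `subsystem_name` is a hypothetical helper.

```python
# Limit reported by ONTAP 9.14.1 in the audit log: <text (size 1..64)>
ONTAP_MAX_SUBSYSTEM_NAME = 64


def subsystem_name(node_name: str, trident_uuid: str) -> str:
    """Hypothetical helper: assumed layout is '<node-name>-<uuid>'."""
    return f"{node_name}-{trident_uuid}"


# Values split out of the name shown in the audit log entry.
name = subsystem_name(
    "site-que1-wrk-62e68963-v6vx4",
    "6cf6b4ac-ea2a-4675-9eb1-740bc8f6ecf0",
)
fits = len(name) <= ONTAP_MAX_SUBSYSTEM_NAME

print(name)
print(len(name), fits)  # 65 False
```

Under that assumption the reported name is 65 chars: one char over the new limit, but well within the old 96-char limit.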

Meanwhile, 3 options:

  • Wait for the Trident fix in 24.06
  • Change the node name to something shorter so that the whole subsystem name is at most 64 chars
  • Downgrade ONTAP to 9.13
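
For the second option, a rough sketch of how much room is left for the node name. It assumes the subsystem name is the node name, one separator character, and a 36-char UUID, which matches the name in the audit log above but is not confirmed from the Trident source. `max_node_name_len` and `too_long` are hypothetical helpers.

```python
UUID_LEN = 36       # canonical UUID text form, e.g. 6cf6b4ac-ea2a-4675-9eb1-740bc8f6ecf0
SEPARATOR_LEN = 1   # assumed single '-' between node name and UUID


def max_node_name_len(limit: int = 64) -> int:
    """Longest node name that still fits within the subsystem name limit."""
    return limit - SEPARATOR_LEN - UUID_LEN


def too_long(node_names, limit: int = 64):
    """Return the node names that would push the subsystem name over the limit."""
    budget = max_node_name_len(limit)
    return [n for n in node_names if len(n) > budget]


print(max_node_name_len())    # 27 with the 64-char limit
print(max_node_name_len(96))  # 59 with the previous 96-char limit
print(too_long(["site-que1-wrk-62e68963-v6vx4", "worker-01"]))
```

With these assumptions, node names need to be 27 chars or fewer on ONTAP 9.14.1; the 28-char node name from this issue misses that by one.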