Azure/kubernetes-volume-drivers

NVMe disk was not discovered on Standard_L8s_v2 VM SKU

sunkararp opened this issue · 21 comments

What happened:
I'm using the Standard_L8s_v2 VM SKU.

  • The 1 x 1.92 TB NVMe disk was not discovered
  • Only the 80 GB temp disk was discovered

What you expected to happen:
Both the 80 GB temp disk and the 1.92 TB NVMe disk should be discovered.

How to reproduce it:
Create a Kubernetes cluster with a VM scale set using the Standard_L8s_v2 VM SKU.
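
For reference, a minimal sketch of creating such a cluster with the Azure CLI; the resource group and cluster names are placeholders, and the exact flags may differ for your environment:

    # Create an AKS cluster whose node pool is a VM scale set of Standard_L8s_v2 nodes
    az aks create \
      --resource-group myResourceGroup \
      --name myAKSCluster \
      --node-count 1 \
      --node-vm-size Standard_L8s_v2 \
      --vm-set-type VirtualMachineScaleSets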

Anything else we need to know?:

(screenshot attached)

Environment:

  • Kubernetes version (use kubectl version): v1.20.7
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

Have you used https://raw.githubusercontent.com/Azure/kubernetes-volume-drivers/master/local/local-pv-provisioner-nvmedisk.yaml? It would find the /dev/nvme* disks on the agent node.
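
For example, deploying that provisioner is a single kubectl apply (assuming cluster-admin access and that the raw URL above is still current):

    # Deploy the NVMe-disk variant of the local volume provisioner DaemonSet
    kubectl apply -f https://raw.githubusercontent.com/Azure/kubernetes-volume-drivers/master/local/local-pv-provisioner-nvmedisk.yaml

    # Verify the DaemonSet pods are running on the L8s_v2 nodes
    kubectl get pods -n kube-system -o wide | grep local-volume-provisioner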

@andyzhangx thanks for the reply.

@sunkararp could you SSH to the agent node and check whether there is a /dev/nvme* device? If there is, follow the troubleshooting guide here to get logs: https://github.com/Azure/kubernetes-volume-drivers/tree/master/local#troubleshooting

And did you add any taint that would prevent that DaemonSet from running on your node pool?
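
If SSH access to the node is available, a quick check might look like the sketch below; the node name is a placeholder:

    # On the agent node: list NVMe block devices, if any are exposed
    ls -l /dev/nvme*
    lsblk -o NAME,SIZE,TYPE,MOUNTPOINT

    # From a workstation: check whether the node pool carries taints
    kubectl describe node <node-name> | grep -i taints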

Attached is a screen capture of the command 'kubectl logs local-volume-provisioner-5rzd7 -n kube-system'.

So there is no /dev/nvme* disk under that node?

And did you add any taint that would prevent that DaemonSet from running on your node pool?

No

Then do you know the device name of that NVMe disk in that node pool? All NVMe disks should have a /dev/nvme* device name. Is it reproducible?

Just noticed...

  • We were using mcr.microsoft.com/k8s/local-volume-provisioner:v2.3.5-alpha for the local-volume-provisioner
  • The new script is using mcr.microsoft.com/k8s/local-volume-provisioner:v2.4.0
  • Does this matter? (A way to check the running image is sketched below.)
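
A hedged way to confirm which image the provisioner is actually running, assuming the DaemonSet is named local-volume-provisioner in kube-system (as the pod names in this thread suggest):

    # Print the container image used by the local-volume-provisioner DaemonSet
    kubectl get daemonset local-volume-provisioner -n kube-system \
      -o jsonpath='{.spec.template.spec.containers[0].image}'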


@sunkararp these two versions have almost no difference. I think you need to find out what the NVMe device is on that node; it's strange.


How do I do that? I cannot connect to the node due to compliance.

I can run kubectl exec -it POD however.

BTW, does the app container deployed on this VM need to do anything for the NVMe disks to be visible?

Try: kubectl exec -it local-volume-provisioner-xxxxx -n kube-system -- ls /dev/
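
For example (the pod name suffix is whatever your DaemonSet generated; pick the pod scheduled on the L8s_v2 node):

    # Find the provisioner pod running on the node in question
    kubectl get pods -n kube-system -o wide | grep local-volume-provisioner

    # List device nodes visible from inside that pod and filter for NVMe
    kubectl exec local-volume-provisioner-xxxxx -n kube-system -- ls -l /dev | grep -i nvme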

I see the NVMe disk is there. There is a conflict between https://github.com/Azure/kubernetes-volume-drivers/blob/master/local/local-pv-provisioner-tempdisk.yaml and local-pv-provisioner-nvmedisk.yaml; please only use one YAML config.
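
A minimal sketch of keeping only the NVMe variant, assuming the temp-disk provisioner was applied from its raw URL in this repo:

    # Remove the temp-disk provisioner so only one config manages the local disks
    kubectl delete -f https://raw.githubusercontent.com/Azure/kubernetes-volume-drivers/master/local/local-pv-provisioner-tempdisk.yaml

    # Keep (or re-apply) the NVMe-disk variant
    kubectl apply -f https://raw.githubusercontent.com/Azure/kubernetes-volume-drivers/master/local/local-pv-provisioner-nvmedisk.yaml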

Awesome!
Thanks for your help

Sorry to reopen this issue.

  • I changed the access mode to ReadWriteMany and it fails to bind (I recreated the VM)
  • Attached is a screen capture of the logs

Therefore, is ReadWriteMany supported?

@sunkararp no, local disk only supports ReadWriteOnce
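
A quick way to confirm this on the cluster; the column layout below is just a convenience view, and any PVC requesting ReadWriteMany will stay Pending against these PVs:

    # Show the access modes of the local PVs created by the provisioner (expect RWO)
    kubectl get pv -o custom-columns=NAME:.metadata.name,CLASS:.spec.storageClassName,MODES:.spec.accessModes

    # A ReadWriteMany PVC will not bind; recreate the PVC with ReadWriteOnce
    # (spec.accessModes is immutable on an existing PVC)
    kubectl get pvc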

Thanks for the quick reply.

@sunkararp would you be willing to have a chat with the PG team on your use case for this?


Sure.

BTW, we are already in contact with MSFT on another issue relating to the NVMe disk. TrackingID#2207130010002328