ThinkParQ/beegfs-csi-driver

metrics endpoint will not be started because `metrics-address` was not specified

klis opened this issue · 4 comments

klis commented

Hi,

when the csi-provisioner container inside the csi-beegfs-controller-0 pod starts, it logs a warning with the message:

W0617 12:47:35.586467       1 metrics.go:142] metrics endpoint will not be started because `metrics-address` was not specified.

Full log:

I0617 12:47:31.644718       1 csi-provisioner.go:121] Version: v2.0.2
I0617 12:47:31.644795       1 csi-provisioner.go:135] Building kube configs for running in cluster...
I0617 12:47:31.651298       1 connection.go:153] Connecting to unix:///csi/csi.sock
I0617 12:47:35.580545       1 common.go:111] Probing CSI driver for readiness
I0617 12:47:35.580566       1 connection.go:182] GRPC call: /csi.v1.Identity/Probe
I0617 12:47:35.580570       1 connection.go:183] GRPC request: {}
I0617 12:47:35.585663       1 connection.go:185] GRPC response: {}
I0617 12:47:35.585730       1 connection.go:186] GRPC error: <nil>
I0617 12:47:35.585742       1 connection.go:182] GRPC call: /csi.v1.Identity/GetPluginInfo
I0617 12:47:35.585745       1 connection.go:183] GRPC request: {}
I0617 12:47:35.586397       1 connection.go:185] GRPC response: {"name":"beegfs.csi.netapp.com","vendor_version":"v1.1.0-0-gc65b537"}
I0617 12:47:35.586447       1 connection.go:186] GRPC error: <nil>
I0617 12:47:35.586458       1 csi-provisioner.go:182] Detected CSI driver beegfs.csi.netapp.com
W0617 12:47:35.586467       1 metrics.go:142] metrics endpoint will not be started because `metrics-address` was not specified.
I0617 12:47:35.586477       1 connection.go:182] GRPC call: /csi.v1.Identity/GetPluginCapabilities
I0617 12:47:35.586481       1 connection.go:183] GRPC request: {}
I0617 12:47:35.587155       1 connection.go:185] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}}]}
I0617 12:47:35.587274       1 connection.go:186] GRPC error: <nil>
I0617 12:47:35.587284       1 connection.go:182] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0617 12:47:35.587288       1 connection.go:183] GRPC request: {}
I0617 12:47:35.587831       1 connection.go:185] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}}]}
I0617 12:47:35.588083       1 connection.go:186] GRPC error: <nil>
I0617 12:47:35.588233       1 csi-provisioner.go:210] CSI driver does not support PUBLISH_UNPUBLISH_VOLUME, not watching VolumeAttachments
I0617 12:47:35.588790       1 controller.go:735] Using saving PVs to API server in background
I0617 12:47:35.689094       1 volume_store.go:97] Starting save volume queue

Does this driver support Prometheus metrics? If so, how can they be enabled? And is there any way for a Kubernetes admin to track how much storage is in use?

ejweber commented

Hello @klis, thanks for reaching out.

The log in question is generated by the external-provisioner sidecar container that runs alongside the beegfs-csi-driver container in a Kubernetes deployment (both inside the csi-beegfs-controller-0 pod you referenced). The csi-provisioner container acts as an intermediary between Kubernetes and the CSI controller service in a typical Kubernetes CSI deployment. While the beegfs-csi-driver container itself does not natively support Prometheus metrics, you should be able to scrape the external-provisioner container (inside csi-beegfs-controller-0) for a variety of interesting CSI metrics (including total call count, error count, and call latency). This Kubernetes CSI issue mentions the need for documentation around HOW exactly to configure Prometheus to do that, but no such documentation has been produced (largely because Prometheus deployments vary by environment). Any solution likely requires adding either the --http-endpoint or --metrics-address (deprecated) argument to the csi-provisioner container in the deployment manifests.
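For illustration only, here is a minimal sketch of what that change might look like in the controller StatefulSet. The existing args, the port number, and the prometheus.io annotations are assumptions, not the shipped manifests; the annotations only help if your Prometheus honors that common convention, and depending on the external-provisioner version only the deprecated --metrics-address flag may be available (the v2.0.2 shown in the log appears to predate --http-endpoint).

```yaml
# Sketch of the relevant fragment of the csi-beegfs-controller StatefulSet.
# Values are illustrative; adapt names, args, and ports to your deployment.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: csi-beegfs-controller
spec:
  template:
    metadata:
      annotations:
        # Only useful if your Prometheus discovers targets via these annotations.
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: csi-provisioner
          args:
            - --csi-address=/csi/csi.sock   # assumed existing argument
            # Serve /metrics on port 8080; on older provisioner versions use
            # the deprecated form instead: --metrics-address=:8080
            - --http-endpoint=:8080
          ports:
            - name: metrics
              containerPort: 8080
```

With something like this in place, Prometheus can reach the endpoint either through the annotations above or through whatever scrape configuration (e.g. a ServiceMonitor) your environment uses.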

As is the case for other "directory-within-a-file-system" drivers (e.g. NFS), it is difficult to directly correlate requested capacity in Kubernetes with BeeGFS consumption. Our BeeGFS quotas support makes it possible to limit consumption on a per-storage-class basis (assuming a particular BeeGFS file system is under a support contract and allowed to use enterprise features), but the aggregate capacity shown by something like `kubectl get pv` doesn't generally represent BeeGFS storage consumed.
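To make that caveat concrete, here is an illustrative PVC (the storage class name is a placeholder, not one of our shipped examples):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: beegfs-example-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      # kubectl get pv will report 100Gi as the capacity, but the driver only
      # creates a directory in BeeGFS; nothing reserves or enforces 100Gi on
      # the file system unless BeeGFS quotas are configured.
      storage: 100Gi
  storageClassName: csi-beegfs-dyn-sc   # hypothetical storage class name
```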

klis commented

@ejweber thank you for the detailed explanation.
I will definitely try to scrape some metrics.

As for storage consumption, I will talk to my SysAdmins to check what can we do about it.

ejweber commented

Thanks for the update, @klis. I have created a low-priority story in the NetApp system to investigate ways to make scraping easier out of the box (e.g. adding --metrics-address or --http-endpoint to the default deployment manifests). I'm not sure yet how to improve the experience generically. If you do end up scraping metrics, please share whatever you can about your experience.

klis commented

Will do