[prometheus-vmware-rules] Create an alert for all HBA paths are down

Question

[prometheus-vmware-rules] Create an alert for all HBA paths are down

Closed this issue a year ago · 0 comments

The critical symptom/condition in VMware Aria Operations occurs as "All HBA paths are down" on the host system --> triggered when one or more host adapters report that all paths are down and may impact the storage connectivity ==> the alert clears when at least one path is active

In the vCenter events the alert is structured as follows: Lost connectivity to storage device <...>, Path <...> is down

Use and leverage the AlertCollector from the vrops-exporter --> query the resource kind at the inventory's internal API --> to generate and build the Prometheus metric

Install/upgrade to the latest version of the vRealize Operations Management Pack for Storage Devices on all vROps instances

Define an VMware alert based or out of the scraped metric
Fetch out the following attributes/field values:
- VC with region and AZ
- Cluster name
- Affected DS('s)
- FQDN of event
- Host/node name

Test the alerting in the Concourse CI pipeline
- Roll out/deploy changes to the region(s)

Simulate an APD alert by disabling the network cards: --> (iSCSI software adapter/initiator, physical NIC 1 and 4) in the UCS cluster manager
Test and validate