[prometheus-vmware-rules] Create an alert for all HBA paths are down
Closed this issue · 0 comments
kevin-fischer commented
The critical symptom/condition in VMware Aria Operations occurs as "All HBA paths are down" on the host system --> triggered when one or more host adapters report that all paths are down and may impact the storage connectivity ==> the alert clears when at least one path is active
In the vCenter events the alert is structured as follows: Lost connectivity to storage device <...>, Path <...> is down
Use and leverage the AlertCollector from the vrops-exporter --> query the resource kind at the inventory's internal API --> to generate and build the Prometheus metric
- Install/upgrade to the latest version of the vRealize Operations Management Pack for Storage Devices on all vROps instances
- Define an VMware alert based or out of the scraped metric
- Fetch out the following attributes/field values:
- VC with region and AZ
- Cluster name
- Affected DS('s)
- FQDN of event
- Host/node name
- Test the alerting in the Concourse CI pipeline
- Roll out/deploy changes to the region(s)
- Simulate an APD alert by disabling the network cards: --> (iSCSI software adapter/initiator, physical NIC 1 and 4) in the UCS cluster manager
- Test and validate