PureStorage-OpenConnect/pure-fa-openmetrics-exporter

purefa_alerts_open reports errors with Purity 6.4.3

james-laing opened this issue · 7 comments

Since upgrading from Purity 6.3.3 to 6.4.3, the /metrics/array endpoint returns errors when multiple instances of the same alert are present in purefa_alerts_open. All other metrics endpoints work as expected and return results.

Versions

  • Purity //FA 6.4.3
  • OpenMetrics Exporter quay.io/purestorage/pure-fa-om-exporter:v1.0.5.hotfix1

An error has occurred while serving metrics:

17 error(s) occurred:
* collected metric "purefa_alerts_open" { label:<name:"component_name" value:"Service: active_directory_domain_catalog_ldap" > label:<name:"component_type" value:" IpAddress: 192.168.0.20" > label:<name:"severity" value:" Port: 389" > gauge:<value:1 > } was collected before with the same name and label values
* collected metric "purefa_alerts_open" { label:<name:"component_name" value:"Service: active_directory_domain_catalog_ldap" > label:<name:"component_type" value:" IpAddress: 192.168.0.20" > label:<name:"severity" value:" Port: 389" > gauge:<value:1 > } was collected before with the same name and label values
<truncated - message repeats>

The impact is that purefa_info cannot be collected from /metrics/array, so observability dashboards are unable to correlate data and stop working.

Example of REST API JSON output:

    {
      "description": "(directory_service:Service: active_directory_domain_catalog_ldap, IpAddress: 192.168.0.1, Port: 389, Filter: (&(uidNumber=1000)(objectClass=user))): Directory service lookup failed. Expected: , Actual: ",
      "created": 1678460265500,
      "state": "open",
      "component_type": "directory_service",
      "name": "10999254",
      "id": "1cb99f01d2754294a1d7eb3b6e61abdf",
      "code": 231,
      "category": "array",
      "severity": "info",
      "flagged": false,
      "updated": 1678460265500,
      "closed": null,
      "notified": null,
      "component_name": "Service: active_directory_domain_catalog_ldap, IpAddress: 192.168.0.1, Port: 389, Filter: (&(uidNumber=1000)(objectClass=user))",
      "expected": "",
      "actual": "",
      "issue": "Directory service lookup failed.",
      "knowledge_base_url": "https://support.purestorage.com/?cid=Alert_0231",
      "summary": "(directory_service:Service: active_directory_domain_catalog_ldap, IpAddress: 192.168.0.1, Port: 389, Filter: (&(uidNumber=1000)(objectClass=user))): Directory service lookup failed."
    }
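
The label values in the error output (`component_type` holding " IpAddress: ..." and `severity` holding " Port: 389") look like the result of splitting `component_name` on commas. The following is a minimal sketch of one mechanism that would produce exactly that; it is an assumption for illustration, not code taken from the exporter. The function `naive_label_sets`, the helper `make_alert`, and the second alert with `uidNumber=1001` are all hypothetical.

```python
def naive_label_sets(alerts):
    """Hypothetical extraction: join three fields with commas, then split again."""
    counts = {}
    for a in alerts:
        key = ",".join([a["component_name"], a["component_type"], a["severity"]])
        counts[key] = counts.get(key, 0) + 1

    samples = []
    for key, n in counts.items():
        # component_name itself contains commas, so the split yields more
        # than three parts and the first three no longer match the fields.
        parts = key.split(",")
        labels = {
            "component_name": parts[0],
            "component_type": parts[1],
            "severity": parts[2],
        }
        samples.append((labels, n))
    return samples


def make_alert(uid):
    """Build an alert shaped like the REST output above (uid is illustrative)."""
    return {
        "state": "open",
        "component_type": "directory_service",
        "severity": "info",
        "component_name": (
            "Service: active_directory_domain_catalog_ldap, "
            "IpAddress: 192.168.0.1, Port: 389, "
            f"Filter: (&(uidNumber={uid})(objectClass=user))"
        ),
    }


# Two alerts that differ only in the Filter part of component_name.
samples = naive_label_sets([make_alert(1000), make_alert(1001)])
for labels, n in samples:
    print(labels, n)
```

Under these assumptions, both samples end up with identical, misaligned label sets, which is the condition a Prometheus registry rejects with "was collected before with the same name and label values".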

Well, that seems pretty major. Have you confirmed that this actually exists and that we haven't hit a race condition where it got deleted halfway through the exporter's run?
Otherwise we need to understand what has changed in the API response and check with the API team how they have broken backwards compatibility.

It is only related to this specific error.
directory_service:Service: active_directory_domain_catalog_ldap
Now that the errors no longer exist the OME is successfully polling again.

So you changed something on the FB?

This is a FlashArray with File Services enabled. Now that the alerts are no longer present, the collector is working. Hopefully Eugenio can confirm if there is an exception to this particular error message.
The line "was collected before with the same name and label values" suggests the exporter is not happy with the same alert repeating.

@genegr can you confirm?

Confirmed it is a bug in the exporter. It is not specific to 6.4.3 but to the way the metric is extracted. Easy to fix, anyway.
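
For readers hitting the same symptom, here is a minimal sketch of one way to avoid duplicate series: count open alerts per unique label tuple and emit a single sample for each. This is an illustration only and not the change made in the exporter; `count_open_alerts` is a hypothetical name, and the input is assumed to be a list of dicts shaped like the REST output above.

```python
from collections import Counter


def count_open_alerts(alerts):
    """Return one (labels, count) sample per unique label tuple."""
    counts = Counter(
        # A tuple key keeps each field intact, even when it contains commas.
        (a["severity"], a["component_type"], a["component_name"])
        for a in alerts
        if a.get("state") == "open"
    )
    return [
        ({"severity": sev, "component_type": ctype, "component_name": cname}, n)
        for (sev, ctype, cname), n in counts.items()
    ]


alerts = [
    {"state": "open", "severity": "info", "component_type": "directory_service",
     "component_name": "Service: a, IpAddress: 192.168.0.1, Port: 389"},
    {"state": "open", "severity": "info", "component_type": "directory_service",
     "component_name": "Service: a, IpAddress: 192.168.0.1, Port: 389"},
    {"state": "open", "severity": "info", "component_type": "directory_service",
     "component_name": "Service: a, IpAddress: 192.168.0.20, Port: 389"},
]

result = count_open_alerts(alerts)
for labels, n in result:
    print(labels, n)
```

Repeated alerts then raise the gauge value for one series instead of producing a second series with the same labels.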

genegr commented

Fixed in the code of PR #73