topine/ibm-spectrum-exporter

Error message from spectrum breaks storage_metrics

johnnnnnnnnnnnnnn opened this issue · 4 comments

Adding certain metric IDs to metrics_config.yaml seems to break the collection of metrics (see attached logs).

The metric ID in this specific case is 1029. I added it to my metrics_config.yaml like this:

metrics:
  storage_systems:
    - ibm_spectrum_metric_id: 1029
      prometheus_name: storage_invalid_link_transmission_rate
      prometheus_help: The average number of times per second that an invalid transmission word was detected by the port while the link did not experience any signal or synchronization loss.

I have no problems accessing the metric by itself:

wget --no-check-certificate --load-cookies cookies.txt https://spectrumcontrol:9569/srm/REST/api/v1/StorageSystems/9669/Performance/1029

returns:

[
   {
      "metricDetails": {
         "1029": {
            "description": "The average number of times per second that an invalid transmission word was detected by the port while the link did not experience any signal or synchronization loss.",
            "name": "Invalid Link Transmission Rate",
            "units": "cnt\/s"
         }
      }
   },
   {
      "current": [
         {
            "x": 1593676221744,
            "y": null
         },
         {
.. blanked out for readability
         {
            "x": 1593762621744,
            "y": null
         }
      ],
      "deviceId": 9669,
      "deviceName": "blanked-out-before-sharing-publically",
      "endTime": 1593762621744,
      "label": "Invalid Link Transmission Rate",
      "maxValue": 0.0,
      "metricId": 1029,
      "minValue": 0.0,
      "precision": 4,
      "resourceID": 9669,
      "startTime": 1593676221744,
      "units": "cnt\/s"
   }
]
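A minimal Go sketch of how a response shaped like the one above could be decoded to check whether a metric carries any real data points. The type and function names (`point`, `entry`, `countSamples`) are hypothetical and not taken from the exporter; the field names come from the JSON shown above, and the sample body is a shortened copy of it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// point is one sample from the "current" array; y may be null, so a pointer is used.
type point struct {
	X int64    `json:"x"`
	Y *float64 `json:"y"`
}

// entry covers only the fields of the response that this sketch needs.
type entry struct {
	Current  []point `json:"current"`
	MetricID int     `json:"metricId"`
	Label    string  `json:"label"`
}

// countSamples (hypothetical helper) returns the total number of data points
// and how many of them have a non-null y value.
func countSamples(body []byte) (total, nonNull int, err error) {
	var entries []entry
	if err = json.Unmarshal(body, &entries); err != nil {
		return 0, 0, err
	}
	for _, e := range entries {
		for _, p := range e.Current {
			total++
			if p.Y != nil {
				nonNull++
			}
		}
	}
	return total, nonNull, nil
}

func main() {
	// Shortened copy of the response above: two samples, both null.
	body := []byte(`[
	  {"metricDetails": {"1029": {"name": "Invalid Link Transmission Rate", "units": "cnt\/s"}}},
	  {"current": [{"x": 1593676221744, "y": null}, {"x": 1593762621744, "y": null}],
	   "metricId": 1029, "label": "Invalid Link Transmission Rate"}
	]`)
	total, nonNull, err := countSamples(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%d samples, %d with a value\n", total, nonNull)
}
```

For the shortened sample this reports two samples and none with a value, which matches the all-null `y` entries in the response above.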

Metric collection still works for switches and pools, but all storage metrics are gone. The logs give the following information:

2020-07-03T08:06:50.092Z        INFO    ibm-spectrum-exporter/main.go:122       Starting to collect the metrics.
2020-07-03T08:06:50.165Z        INFO    spectrumservice/client.go:427   Number of Switches retrieved: 24
2020-07-03T08:06:50.266Z        INFO    spectrumservice/client.go:449   Metrics received for switch 3071 : 1 .
2020-07-03T08:06:50.312Z        INFO    spectrumservice/client.go:449   Metrics received for switch 3068 : 1 .
2020-07-03T08:06:50.340Z        INFO    spectrumservice/client.go:449   Metrics received for switch 125761 : 0 .
2020-07-03T08:06:50.381Z        INFO    spectrumservice/client.go:449   Metrics received for switch 3055 : 1 .
2020-07-03T08:06:50.430Z        INFO    spectrumservice/client.go:449   Metrics received for switch 3032 : 1 .
2020-07-03T08:06:50.476Z        INFO    spectrumservice/client.go:449   Metrics received for switch 3065 : 1 .
2020-07-03T08:06:50.524Z        INFO    spectrumservice/client.go:449   Metrics received for switch 3010 : 1 .
2020-07-03T08:06:50.572Z        INFO    spectrumservice/client.go:449   Metrics received for switch 3021 : 1 .
2020-07-03T08:06:50.617Z        INFO    spectrumservice/client.go:449   Metrics received for switch 3043 : 1 .
2020-07-03T08:06:50.650Z        INFO    spectrumservice/client.go:513   Number of Pools retrieved: 2
2020-07-03T08:06:50.663Z        INFO    spectrumservice/client.go:449   Metrics received for switch 3074 : 1 .
2020-07-03T08:06:50.717Z        INFO    spectrumservice/client.go:449   Metrics received for switch 3051 : 1 .
2020-07-03T08:06:50.768Z        INFO    spectrumservice/client.go:449   Metrics received for switch 3040 : 1 .
2020-07-03T08:06:50.821Z        INFO    spectrumservice/client.go:449   Metrics received for switch 3061 : 1 .
2020-07-03T08:06:50.869Z        INFO    spectrumservice/client.go:449   Metrics received for switch 3018 : 1 .
2020-07-03T08:06:50.915Z        INFO    spectrumservice/client.go:449   Metrics received for switch 3029 : 1 .
2020-07-03T08:06:50.948Z        INFO    spectrumservice/client.go:449   Metrics received for switch 125757 : 0 .
2020-07-03T08:06:50.978Z        INFO    spectrumservice/client.go:449   Metrics received for switch 125754 : 0 .
2020-07-03T08:06:51.026Z        INFO    spectrumservice/client.go:449   Metrics received for switch 3005 : 1 .
2020-07-03T08:06:51.070Z        INFO    spectrumservice/client.go:449   Metrics received for switch 3025 : 1 .
2020-07-03T08:06:51.117Z        INFO    spectrumservice/client.go:449   Metrics received for switch 3058 : 1 .
2020-07-03T08:06:51.162Z        INFO    spectrumservice/client.go:449   Metrics received for switch 3036 : 1 .
2020-07-03T08:06:51.209Z        INFO    spectrumservice/client.go:449   Metrics received for switch 3014 : 1 .
2020-07-03T08:06:51.255Z        INFO    spectrumservice/client.go:449   Metrics received for switch 3047 : 1 .
2020-07-03T08:06:51.282Z        INFO    spectrumservice/client.go:233   Number of Storage volumes retrieved: 1
2020-07-03T08:06:51.282Z        INFO    spectrumservice/client.go:237   Selecting storage system with regex : .*
2020-07-03T08:06:51.288Z        INFO    spectrumservice/client.go:449   Metrics received for switch 125764 : 0 .
2020-07-03T08:06:51.396Z        INFO    spectrumservice/client.go:269   Metrics received for storageID 9669 : 10 .
2020-07-03T08:06:53.451Z        INFO    spectrumservice/client.go:308   Volumes retrieved for Storage System 9669 : 515
2020-07-03T08:06:54.018Z        ERROR   spectrumservice/client.go:319   Error during volume metrics call.[{ " r e s u l t " : { " m s g I d " : " B P C U I 0 0 9 9 E " , " t e x t " : " T h e   s t o r a g e   r e s o u r c e   i s   n o t   a v a i l a b l e . " , " t i m e " : " J u l   3 ,   2 0 2 0 ,   1 0 : 0 6 : 5 4 " , " t y p e " : " E " } }]lient
github.com/topine/ibm-spectrum-exporter/spectrumservice.(*Client).collectVolumeMetrics
        /Users/apimenteldasilvatopi/github/ibm-spectrum-exporter/spectrumservice/client.go:319
github.com/topine/ibm-spectrum-exporter/spectrumservice.(*Client).CollectStorageMetrics
        /Users/apimenteldasilvatopi/github/ibm-spectrum-exporter/spectrumservice/client.go:197
github.com/topine/ibm-spectrum-exporter/spectrumservice.(*Client).CollectAndCacheMetrics.func1
        /Users/apimenteldasilvatopi/github/ibm-spectrum-exporter/spectrumservice/client.go:107
2020-07-03T08:06:54.018Z        ERROR   spectrumservice/client.go:199   Error collecting volumes metrics for storage blanked-out-before-sharing-publically. [{ " r e s u l t " : { " m s g I d " : " B P C U I 0 0 9 9 E " , " t e x t " : " T h e   s t o r a g e   r e s o u r c e   i s   n o t   a v a i l a b l e . " , " t i m e " : " J u l   3 ,   2 0 2 0 ,   1 0 : 0 6 : 5 4 " , " t y p e " : " E " } }]lient
github.com/topine/ibm-spectrum-exporter/spectrumservice.(*Client).CollectStorageMetrics
        /Users/apimenteldasilvatopi/github/ibm-spectrum-exporter/spectrumservice/client.go:199
github.com/topine/ibm-spectrum-exporter/spectrumservice.(*Client).CollectAndCacheMetrics.func1
        /Users/apimenteldasilvatopi/github/ibm-spectrum-exporter/spectrumservice/client.go:107
2020-07-03T08:06:54.018Z        INFO    ibm-spectrum-exporter/main.go:128       Finished collecting metrics
2020-07-03T08:06:54.018Z        INFO    ibm-spectrum-exporter/main.go:89        Scheduler started with success with interval @every 5m

2020-07-03T08:06:54.018Z        INFO    collector/collector.go:103      Starting IBM Spectrum collect.
2020-07-03T08:06:54.019Z        INFO    ibm-spectrum-exporter/main.go:117       Exporter started with Success.

According to the IBM manual, the error from Spectrum is described as follows:

If the URL is not used correctly, you might receive the following error message:
{"result":{"type":"E","msgId":"BPCUI0099E","time":"Apr 4, 2016 16:25:07","text":
"The storage resource is not available."}}

ref: https://www.ibm.com/support/knowledgecenter/SS5R93_5.3.6/com.ibm.spectrum.sc.doc/mgr_rest_api_retrieving_cli.html
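As a rough illustration, an exporter could recognise this error envelope before trying to decode a response as metrics. The sketch below assumes only the JSON shape quoted from the IBM manual; `spectrumError` and `parseSpectrumError` are hypothetical names, not part of ibm-spectrum-exporter.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// spectrumError mirrors the error envelope quoted from the IBM manual.
type spectrumError struct {
	Result struct {
		Type  string `json:"type"`
		MsgID string `json:"msgId"`
		Time  string `json:"time"`
		Text  string `json:"text"`
	} `json:"result"`
}

// parseSpectrumError (hypothetical helper) reports whether body is an error
// envelope of type "E" and, if so, returns its message ID and text.
// A normal metrics response is a JSON array, so it fails to decode into the
// struct and is treated as "not an error envelope".
func parseSpectrumError(body []byte) (msgID, text string, isError bool) {
	var e spectrumError
	if err := json.Unmarshal(body, &e); err != nil {
		return "", "", false
	}
	if e.Result.Type != "E" {
		return "", "", false
	}
	return e.Result.MsgID, e.Result.Text, true
}

func main() {
	body := []byte(`{"result":{"type":"E","msgId":"BPCUI0099E","time":"Apr 4, 2016 16:25:07","text":"The storage resource is not available."}}`)
	if id, text, ok := parseSpectrumError(body); ok {
		fmt.Printf("spectrum error %s: %s\n", id, text)
	}
}
```

Checking for the envelope first would let a caller log the message ID and skip only the failing request instead of dropping every storage metric.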

Hello @johnnnnnnnnnnnnnn

The error happens when I retrieve the volume metrics. Metric 1029 is not available for volumes, and instead of returning an empty value for the metric, IBM Spectrum replies with the error above. The failing request looks like this:

https://spectrum:9569/srm/REST/api/v1/StorageSystems/12313/Volumes/Performance?metrics=803,1029

In fact, I am collecting the same metrics for the Storage System and for the Volumes, which is wrong.

I will add another parameter to the config file to select whether a metric is requested for the Storage System, the Volumes, or both.

Like this:

- ibm_spectrum_metric_id: 1029
  prometheus_name: storage_invalid_link_transmission_rate
  prometheus_help: The average number of times per second that an invalid transmission word was detected by the port while the link did not experience any signal or synchronization loss.
  target:
    - storageSystem

- ibm_spectrum_metric_id: 803
  prometheus_name: storage_avg_read_io_ops_per_second
  prometheus_help: Average number of read operations per second (both sequential and non-sequential, if applicable), for a particular component over a particular time interval.
  target:
    - storageSystem
    - volume

What do you think?

That is a good way of handling it.

With that said, is it possible to structure the data from a top-level perspective by introducing a new category?

Currently I've built my metrics_config.yaml by accessing the Performance endpoint for storage systems and switches respectively, then converting the output to the same format as in your example config. Adding a target parameter would work, but it would make my conversions harder. I understand if this is outside the scope of your consideration (how I build my configs has nothing to do with your program). But what about something along these lines:

metrics:
  storage_systems_and_volumes:
    - ibm_spectrum_metric_id: 803
      prometheus_name: storage_avg_read_io_ops_per_second
      prometheus_help: Average number of read operations per second (both sequential and non-sequential, if applicable), for a particular component over a particular time interval.

  storage_systems:
    - ibm_spectrum_metric_id: 1029
      prometheus_name: storage_invalid_link_transmission_rate
      prometheus_help: The average number of times per second that an invalid transmission word was detected by the port while the link did not experience any signal or synchronization loss.

  switches:
    - ibm_spectrum_metric_id: 860
      prometheus_name: storage_switcher_avg_total_mb_per
      prometheus_help: Average number of mebibytes (2^20 bytes) transferred per second.

  pools:
    properties:
      - property_name: Capacity
        prometheus_name: storage_usable_capacity_GiB
        prometheus_help: Usable Capacity
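With this category-based layout, the exporter would derive the metric list per endpoint from the category rather than from a per-metric flag. A rough Go sketch, using hypothetical type and method names and covering only the two storage categories:

```go
package main

import "fmt"

// metric is a hypothetical in-memory form of one config entry.
type metric struct {
	ID   int
	Name string
}

// metricsConfig mirrors only the storage-related categories proposed above.
type metricsConfig struct {
	StorageSystemsAndVolumes []metric
	StorageSystems           []metric
}

// storageSystemMetrics returns every metric to request at the storage system
// level: the shared category plus the storage-system-only category.
func (c metricsConfig) storageSystemMetrics() []metric {
	out := make([]metric, 0, len(c.StorageSystemsAndVolumes)+len(c.StorageSystems))
	out = append(out, c.StorageSystemsAndVolumes...)
	return append(out, c.StorageSystems...)
}

// volumeMetrics returns only the metrics from the shared category.
func (c metricsConfig) volumeMetrics() []metric {
	return c.StorageSystemsAndVolumes
}

func main() {
	cfg := metricsConfig{
		StorageSystemsAndVolumes: []metric{{ID: 803, Name: "storage_avg_read_io_ops_per_second"}},
		StorageSystems:           []metric{{ID: 1029, Name: "storage_invalid_link_transmission_rate"}},
	}
	fmt.Println("storage system:", cfg.storageSystemMetrics())
	fmt.Println("volumes:", cfg.volumeMetrics())
}
```

The trade-off compared with a `target` list is that a metric belongs to exactly one category, which keeps generated configs simple but needs a new category for each new combination of targets.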

Sure, the effort is the same, and structuring it the way you suggest is easier to understand.

I will make the change this week.

Regarding the metrics available at the Volume level, you can retrieve them with this endpoint:

https://spectrum:9569/srm/REST/api/v1/StorageSystems/1234/Volumes/12345/Performance

Thanks

Thanks for the fix, I've updated our environment and it works fine with the default config. I will add more metrics in the coming days.

Again, Thanks!