Error message from spectrum breaks storage_metrics
johnnnnnnnnnnnnnn opened this issue · 4 comments
Adding certain metric IDs to metrics_config.yaml seems to break the collection of metrics (see attached logs).
The metric ID in this specific case is 1029. I added it to my metrics_config.yaml like this:
metrics:
  storage_systems:
    - ibm_spectrum_metric_id: 1029
      prometheus_name: storage_invalid_link_transmission_rate
      prometheus_help: The average number of times per second that an invalid transmission word was detected by the port while the link did not experience any signal or synchronization loss.
I have no problems accessing the metric by itself:
wget --no-check-certificate --load-cookies cookies.txt https://spectrumcontrol:9569/srm/REST/api/v1/StorageSystems/9669/Performance/1029
returns:
[
{
"metricDetails": {
"1029": {
"description": "The average number of times per second that an invalid transmission word was detected by the port while the link did not experience any signal or synchronization loss.",
"name": "Invalid Link Transmission Rate",
"units": "cnt\/s"
}
}
},
{
"current": [
{
"x": 1593676221744,
"y": null
},
{
.. blanked out for readability
{
"x": 1593762621744,
"y": null
}
],
"deviceId": 9669,
"deviceName": "blanked-out-before-sharing-publically",
"endTime": 1593762621744,
"label": "Invalid Link Transmission Rate",
"maxValue": 0.0,
"metricId": 1029,
"minValue": 0.0,
"precision": 4,
"resourceID": 9669,
"startTime": 1593676221744,
"units": "cnt\/s"
}
]
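For reference, the response above is a two-element list: the first element carries `metricDetails` and the second carries the time series under `current`. Below is a minimal Python sketch of reading that shape, assuming the layout is exactly as pasted. The helper name `summarize_performance` is hypothetical and not part of the exporter.

```python
import json


def summarize_performance(body: str) -> dict:
    """Summarize a Performance response shaped like the one pasted above."""
    summary = {"metric_ids": [], "series": []}
    for element in json.loads(body):
        if "metricDetails" in element:
            # This element describes the metrics that were requested.
            summary["metric_ids"].extend(int(k) for k in element["metricDetails"])
        elif "current" in element:
            # This element holds the time series for one metric.
            points = element["current"]
            non_null = [p for p in points if p["y"] is not None]
            summary["series"].append({
                "metric_id": element["metricId"],
                "samples": len(points),
                "non_null_samples": len(non_null),
            })
    return summary


# Trimmed-down copy of the response above (only two samples kept).
sample = json.dumps([
    {"metricDetails": {"1029": {"name": "Invalid Link Transmission Rate", "units": "cnt/s"}}},
    {"current": [{"x": 1593676221744, "y": None}, {"x": 1593762621744, "y": None}],
     "metricId": 1029, "units": "cnt/s"},
])
print(summarize_performance(sample))
```

The samples shown in the response have `y` set to null, so a summary like this would report no usable values for metric 1029 in that part of the series.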
Metric collection still works for switches and pools, but all storage metrics are gone. The logs give the following information:
2020-07-03T08:06:50.092Z INFO ibm-spectrum-exporter/main.go:122 Starting to collect the metrics.
2020-07-03T08:06:50.165Z INFO spectrumservice/client.go:427 Number of Switches retrieved: 24
2020-07-03T08:06:50.266Z INFO spectrumservice/client.go:449 Metrics received for switch 3071 : 1 .
2020-07-03T08:06:50.312Z INFO spectrumservice/client.go:449 Metrics received for switch 3068 : 1 .
2020-07-03T08:06:50.340Z INFO spectrumservice/client.go:449 Metrics received for switch 125761 : 0 .
2020-07-03T08:06:50.381Z INFO spectrumservice/client.go:449 Metrics received for switch 3055 : 1 .
2020-07-03T08:06:50.430Z INFO spectrumservice/client.go:449 Metrics received for switch 3032 : 1 .
2020-07-03T08:06:50.476Z INFO spectrumservice/client.go:449 Metrics received for switch 3065 : 1 .
2020-07-03T08:06:50.524Z INFO spectrumservice/client.go:449 Metrics received for switch 3010 : 1 .
2020-07-03T08:06:50.572Z INFO spectrumservice/client.go:449 Metrics received for switch 3021 : 1 .
2020-07-03T08:06:50.617Z INFO spectrumservice/client.go:449 Metrics received for switch 3043 : 1 .
2020-07-03T08:06:50.650Z INFO spectrumservice/client.go:513 Number of Pools retrieved: 2
2020-07-03T08:06:50.663Z INFO spectrumservice/client.go:449 Metrics received for switch 3074 : 1 .
2020-07-03T08:06:50.717Z INFO spectrumservice/client.go:449 Metrics received for switch 3051 : 1 .
2020-07-03T08:06:50.768Z INFO spectrumservice/client.go:449 Metrics received for switch 3040 : 1 .
2020-07-03T08:06:50.821Z INFO spectrumservice/client.go:449 Metrics received for switch 3061 : 1 .
2020-07-03T08:06:50.869Z INFO spectrumservice/client.go:449 Metrics received for switch 3018 : 1 .
2020-07-03T08:06:50.915Z INFO spectrumservice/client.go:449 Metrics received for switch 3029 : 1 .
2020-07-03T08:06:50.948Z INFO spectrumservice/client.go:449 Metrics received for switch 125757 : 0 .
2020-07-03T08:06:50.978Z INFO spectrumservice/client.go:449 Metrics received for switch 125754 : 0 .
2020-07-03T08:06:51.026Z INFO spectrumservice/client.go:449 Metrics received for switch 3005 : 1 .
2020-07-03T08:06:51.070Z INFO spectrumservice/client.go:449 Metrics received for switch 3025 : 1 .
2020-07-03T08:06:51.117Z INFO spectrumservice/client.go:449 Metrics received for switch 3058 : 1 .
2020-07-03T08:06:51.162Z INFO spectrumservice/client.go:449 Metrics received for switch 3036 : 1 .
2020-07-03T08:06:51.209Z INFO spectrumservice/client.go:449 Metrics received for switch 3014 : 1 .
2020-07-03T08:06:51.255Z INFO spectrumservice/client.go:449 Metrics received for switch 3047 : 1 .
2020-07-03T08:06:51.282Z INFO spectrumservice/client.go:233 Number of Storage volumes retrieved: 1
2020-07-03T08:06:51.282Z INFO spectrumservice/client.go:237 Selecting storage system with regex : .*
2020-07-03T08:06:51.288Z INFO spectrumservice/client.go:449 Metrics received for switch 125764 : 0 .
2020-07-03T08:06:51.396Z INFO spectrumservice/client.go:269 Metrics received for storageID 9669 : 10 .
2020-07-03T08:06:53.451Z INFO spectrumservice/client.go:308 Volumes retrieved for Storage System 9669 : 515
2020-07-03T08:06:54.018Z ERROR spectrumservice/client.go:319 Error during volume metrics call.[{"result":{"msgId":"BPCUI0099E","text":"The storage resource is not available.","time":"Jul 3, 2020, 10:06:54","type":"E"}}]lient
github.com/topine/ibm-spectrum-exporter/spectrumservice.(*Client).collectVolumeMetrics
/Users/apimenteldasilvatopi/github/ibm-spectrum-exporter/spectrumservice/client.go:319
github.com/topine/ibm-spectrum-exporter/spectrumservice.(*Client).CollectStorageMetrics
/Users/apimenteldasilvatopi/github/ibm-spectrum-exporter/spectrumservice/client.go:197
github.com/topine/ibm-spectrum-exporter/spectrumservice.(*Client).CollectAndCacheMetrics.func1
/Users/apimenteldasilvatopi/github/ibm-spectrum-exporter/spectrumservice/client.go:107
2020-07-03T08:06:54.018Z ERROR spectrumservice/client.go:199 Error collecting volumes metrics for storage blanked-out-before-sharing-publically. [{"result":{"msgId":"BPCUI0099E","text":"The storage resource is not available.","time":"Jul 3, 2020, 10:06:54","type":"E"}}]lient
github.com/topine/ibm-spectrum-exporter/spectrumservice.(*Client).CollectStorageMetrics
/Users/apimenteldasilvatopi/github/ibm-spectrum-exporter/spectrumservice/client.go:199
github.com/topine/ibm-spectrum-exporter/spectrumservice.(*Client).CollectAndCacheMetrics.func1
/Users/apimenteldasilvatopi/github/ibm-spectrum-exporter/spectrumservice/client.go:107
2020-07-03T08:06:54.018Z INFO ibm-spectrum-exporter/main.go:128 Finished collecting metrics
2020-07-03T08:06:54.018Z INFO ibm-spectrum-exporter/main.go:89 Scheduler started with success with interval @every 5m
2020-07-03T08:06:54.018Z INFO collector/collector.go:103 Starting IBM Spectrum collect.
2020-07-03T08:06:54.019Z INFO ibm-spectrum-exporter/main.go:117 Exporter started with Success.
According to the IBM manual, this error from Spectrum means the following:
If the URL is not used correctly, you might receive the following error message:
{"result":{"type":"E","msgId":"BPCUI0099E","time":"Apr 4, 2016 16:25:07","text":
"The storage resource is not available."}}
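The error payload quoted from the manual is a JSON object with a top-level `result` key, whereas the performance data earlier in this issue is a JSON list. The following is a minimal Python sketch of telling the two apart. It assumes that `"type": "E"` marks an error, which is inferred from this one example rather than from a documented contract, and the function name `spectrum_error` is hypothetical.

```python
import json
from typing import Optional


def spectrum_error(body: str) -> Optional[str]:
    """Return "msgId: text" if body looks like a Spectrum error payload, else None."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON at all, so not the error shape we are looking for.
        return None
    # Performance data is a list; the error payload is an object.
    if not isinstance(payload, dict):
        return None
    result = payload.get("result")
    # Assumption: "type": "E" marks an error, as in the example from the manual.
    if isinstance(result, dict) and result.get("type") == "E":
        return f'{result.get("msgId")}: {result.get("text")}'
    return None


error_body = ('{"result":{"type":"E","msgId":"BPCUI0099E",'
              '"time":"Apr 4, 2016 16:25:07",'
              '"text":"The storage resource is not available."}}')
print(spectrum_error(error_body))
```

A check of this kind could let a client skip the unsupported request instead of dropping every storage metric, though how the exporter actually handles it is up to the maintainer.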
Hello @johnnnnnnnnnnnnnn
The error happens when I retrieve the volume metrics. Metric 1029 is not available for volumes, and instead of returning an empty value for that metric, IBM Spectrum replies with the error:
https://spectrum:9569/srm/REST/api/v1/StorageSystems/12313/Volumes/Performance?metrics=803,1029
In fact, I am collecting the same metrics for both the Storage System and the Volumes, which is wrong.
I will add another parameter to the config file to select whether the metric is requested for the Storage System, the Volumes, or both.
Like this:
- ibm_spectrum_metric_id: 1029
  prometheus_name: storage_invalid_link_transmission_rate
  prometheus_help: The average number of times per second that an invalid transmission word was detected by the port while the link did not experience any signal or synchronization loss.
  target:
    - storageSystem
- ibm_spectrum_metric_id: 803
  prometheus_name: storage_avg_read_io_ops_per_second
  prometheus_help: Average number of read operations per second (both sequential and non-sequential, if applicable), for a particular component over a particular time interval.
  target:
    - storageSystem
    - volume
What do you think?
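To illustrate the `target` proposal, here is a minimal Python sketch of how the metric IDs for each kind of request could be selected and turned into the `metrics=` query parameter seen in the URL above. The config is written as a plain Python list instead of being parsed from YAML, and the helper names `metric_ids_for` and `metrics_query` are hypothetical. This is not the exporter's actual implementation.

```python
# Metric entries mirroring the proposed YAML, reduced to the relevant fields.
metrics_config = [
    {"ibm_spectrum_metric_id": 1029, "target": ["storageSystem"]},
    {"ibm_spectrum_metric_id": 803, "target": ["storageSystem", "volume"]},
]


def metric_ids_for(target: str, metrics: list) -> list:
    """Return the metric IDs whose `target` list contains the given target."""
    return [
        m["ibm_spectrum_metric_id"]
        for m in metrics
        if target in m.get("target", [])
    ]


def metrics_query(target: str, metrics: list) -> str:
    """Build the `metrics=` query string for one kind of request."""
    return "metrics=" + ",".join(str(i) for i in metric_ids_for(target, metrics))


print(metrics_query("storageSystem", metrics_config))
print(metrics_query("volume", metrics_config))
```

With this selection, metric 1029 would no longer be sent to the Volumes endpoint, which is the request that triggered the BPCUI0099E error.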
That is a good way of handling it.
With that said, is it possible to structure the data from a top level perspective by introducing a new category?
Currently I've built my metrics_config.yaml by accessing the Performance endpoint for storage systems and switches respectively, then converting the output to the same format as in your example config. Adding a target parameter would work, but it would make my conversions harder. I understand if this is outside the scope of your consideration (how I build my configs has nothing to do with your program). But what about something along these lines:
metrics:
  storage_systems_and_volumes:
    - ibm_spectrum_metric_id: 803
      prometheus_name: storage_avg_read_io_ops_per_second
      prometheus_help: Average number of read operations per second (both sequential and non-sequential, if applicable), for a particular component over a particular time interval.
  storage_systems:
    - ibm_spectrum_metric_id: 1029
      prometheus_name: storage_invalid_link_transmission_rate
      prometheus_help: The average number of times per second that an invalid transmission word was detected by the port while the link did not experience any signal or synchronization loss.
  switches:
    - ibm_spectrum_metric_id: 860
      prometheus_name: storage_switcher_avg_total_mb_per
      prometheus_help: Average number of mebibytes (2^20 bytes) transferred per second.
  pools:
    properties:
      - property_name: Capacity
        prometheus_name: storage_usable_capacity_GiB
        prometheus_help: Usable Capacity
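As a rough illustration of the category-based layout, the Python sketch below shows how each kind of request could draw its metric IDs from one or more categories. The mapping `CATEGORIES_BY_REQUEST` and the function `ids_for_request` are hypothetical names introduced for this example, and the config is a plain dict, not a parsed YAML file.

```python
# Category-based config mirroring the YAML proposal, reduced to metric IDs.
config = {
    "metrics": {
        "storage_systems_and_volumes": [{"ibm_spectrum_metric_id": 803}],
        "storage_systems": [{"ibm_spectrum_metric_id": 1029}],
        "switches": [{"ibm_spectrum_metric_id": 860}],
    }
}

# Hypothetical mapping: which categories feed which kind of request.
CATEGORIES_BY_REQUEST = {
    "storage_system": ["storage_systems_and_volumes", "storage_systems"],
    "volume": ["storage_systems_and_volumes"],
    "switch": ["switches"],
}


def ids_for_request(kind: str, cfg: dict) -> list:
    """Collect the metric IDs to request for one kind of resource."""
    metrics = cfg["metrics"]
    ids = []
    for category in CATEGORIES_BY_REQUEST[kind]:
        # A category that is absent from the config simply contributes nothing.
        for entry in metrics.get(category, []):
            ids.append(entry["ibm_spectrum_metric_id"])
    return ids


for kind in ("storage_system", "volume", "switch"):
    print(kind, ids_for_request(kind, config))
```

Compared with a per-metric `target` list, this keeps each entry in the same shape as the Performance endpoint output, so a config generated from that output only needs to be placed under the right category.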
Sure, the effort is the same, and structuring it the way you suggest is easier to understand.
I will make the change this week.
Regarding the metrics available at the Volume level, you can retrieve them with this endpoint:
https://spectrum:9569/srm/REST/api/v1/StorageSystems/1234/Volumes/12345/Performance
Thanks
Thanks for the fix, I've updated our environment and it works fine with the default config. I will add more metrics in the coming days.
Again, Thanks!