Error calling humanizeDuration: can't convert int to float
We've noticed that the bosh disk predict alerts that use humanizeDuration now produce errors after upgrading from v26.5.0 to v26.6.0.
<error expanding template: error executing template __alert_BOSHJobEphemeralDiskPredictWillFill: template: __alert_BOSHJobEphemeralDiskPredictWillFill:1:297: executing "__alert_BOSHJobEphemeralDiskPredictWillFill" at <humanizeDuration 14400>: error calling humanizeDuration: can't convert int to float>
We have Alertmanager pushing alerts to Slack, and the above is what was posted to Slack.
We have been finding in our environment that whenever we change an instance's stemcell from xenial to bionic, the bosh disk predict alerts always fire immediately for a little while, so these ones were pretty obvious to us, but the problem probably affects any alert that calls humanize* with an integer.
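To make the difference concrete, this stripped-down rule shows the failing and the working form side by side (the rule name and expression are placeholders I made up; only the annotation templates matter):

groups:
  - name: example
    rules:
      - alert: ExampleDiskPredictWillFill
        expr: vector(1)   # placeholder expression, just for illustration
        annotations:
          # integer literal: fails with "can't convert int to float" on v26.6.0
          broken: "will run out of disk in {{ humanizeDuration 14400 }}"
          # float literal: expands as expected
          working: "will run out of disk in {{ humanizeDuration 14400.0 }}"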
We were previously just using the default values for these properties, but as a workaround I found that setting them in an ops file like this fixes the issue:
- type: replace
  path: /instance_groups/name=prometheus2/jobs/name=bosh_alerts/properties?/bosh_alerts
  value:
    job_predict_system_disk_full:
      predict_time: "14400.0"
    job_predict_ephemeral_disk_full:
      predict_time: "14400.0"
    job_predict_persistent_disk_full:
      predict_time: "14400.0"
This ensures that the values in /var/vcap/jobs/bosh_alerts/bosh_system_predict.alerts.yml are written as floats instead of integers, i.e.:
...
annotations:
  summary: "BOSH Job `{{$labels.environment}}/{{$labels.bosh_name}}/{{$labels.bosh_deployment}}/{{$labels.bosh_job_name}}/{{$labels.bosh_job_index}}` will run out of ephemeral disk in {{humanizeDuration 14400.0}}"
  description: "BOSH Job `{{$labels.environment}}/{{$labels.bosh_name}}/{{$labels.bosh_deployment}}/{{$labels.bosh_job_name}}/{{$labels.bosh_job_index}}` ephemeral disk will be used more than 80% in {{humanizeDuration 14400.0}}"
...
I found some discussion upstream in Prometheus around having their templating functions support ints as well, so this might not be an issue after the next Prometheus bump? prometheus/prometheus#9679
Hi @bg-govau,
with the next release we will bump to the latest Prometheus version and check whether that solves the issue.
Otherwise we will need to adjust the default values used.
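If the bump alone does not solve it, the fallback would be to quote the defaults as float strings in the bosh_alerts job spec, roughly like this (just a sketch: the property name comes from the ops file above, the surrounding spec layout is assumed):

properties:
  bosh_alerts.job_predict_ephemeral_disk_full.predict_time:
    default: "14400.0"   # quoted so the rendered rule contains a float literal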
Hi @bg-govau,
please check whether v26.7.0 fixes the issue.
Thanks @benjaminguttmann-avtq, I'll give it a go, hopefully next week.
I updated to v26.7.0 and removed the ops file workaround. Alerts now look fine 🎉 Thanks for that.