maxwo/snmp_notifier

alert attached to the resolution trigger

What did you do?

I use two templates to generate two fields that allow automatic alarm resolution:

  • a default one, which provides a status of FAULT or OK:

{{- if .Alerts -}}
FAULT
{{ else -}}
OK
{{- end -}}

  • another one, which provides the alarm information. Like any alarming system (Prometheus Alertmanager included), the SNMP side requires a unique "ID" to match a fault with its resolution:

{{ range $severity, $alerts := (groupAlertsByLabel .Alerts "severity") -}}
{{- range $index, $alert := $alerts }}
{{ $alert.Labels.severity }};{{ $alert.Labels.instance }};{{ $alert.Labels.job }};{{ $alert.Labels.alertname }};{{ $alert.Annotations.summary }};{{ $alert.Annotations.description }}
{{ end }}
{{ end }}

In my object, I get a CSV-formatted string with the alertname, the instance, the job, the description and the summary.
So the SNMP alarm system can use alertname + instance to identify the alarm uniquely.

What did you expect to see?

The firing and resolved alarms should be nearly identical, with only the description changing: FAULT or OK.
When firing, the extra field carries the description, summary and instance to document the alarm. The same information should be present on resolution, so that the firing and resolved alarms can be matched automatically.

What did you see instead? Under which circumstances?

When alarms are firing, there is no issue: every field is filled.
When alarms are resolved, the extra field is empty.
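To make the symptom concrete, here is a minimal sketch in Go, not the actual snmp_notifier code. It assumes that `.Alerts` holds only firing alerts while `.DeclaredAlerts` holds all of them. The `Alert` and `Notification` types, the `HighCPU` alert name and the simplified object template (without `groupAlertsByLabel`) are hypothetical stand-ins for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Alert and Notification are simplified, hypothetical stand-ins for the data
// that snmp_notifier exposes to templates; the real types may differ.
type Alert struct {
	Status      string
	Labels      map[string]string
	Annotations map[string]string
}

type Notification struct {
	Alerts         []Alert // assumed: firing alerts only
	DeclaredAlerts []Alert // assumed: every alert, firing or resolved
}

// Same description template as in the report.
const descriptionTpl = `{{- if .Alerts -}}
FAULT
{{ else -}}
OK
{{- end -}}`

// Simplified object template: it ranges over .Alerts directly.
const objectTpl = `{{ range .Alerts }}{{ .Labels.severity }};{{ .Labels.instance }};{{ .Labels.job }};{{ .Labels.alertname }}
{{ end }}`

// render executes a template and trims the surrounding whitespace.
func render(tpl string, data Notification) string {
	var sb strings.Builder
	t := template.Must(template.New("t").Parse(tpl))
	if err := t.Execute(&sb, data); err != nil {
		panic(err)
	}
	return strings.TrimSpace(sb.String())
}

func main() {
	cpu := Alert{
		Status: "resolved",
		Labels: map[string]string{
			"severity": "WARN", "instance": "server01",
			"job": "node-exporter-job", "alertname": "HighCPU",
		},
	}
	// Nothing is firing any more, so Alerts stays empty.
	resolved := Notification{DeclaredAlerts: []Alert{cpu}}
	fmt.Printf("description=%q object=%q\n",
		render(descriptionTpl, resolved), render(objectTpl, resolved))
	// prints: description="OK" object=""
}
```

Under these assumptions the description correctly switches to OK, but the object field has nothing left to iterate over, which matches the empty extra field seen on resolution.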

Environment

  • System information:

    It's the Docker image.

  • SNMP notifier version:

    maxwo/snmp-notifier:latest as of today, so 1.5 I suppose

Note: I tested with a modified version, built locally, with line 69 of alert_parser.go removed (and the syntax corrected).
It then worked properly, which logically means that all alarms are treated equally.

snmp_notifier, version 1.5.0 (branch: main, revision: 9344558)
build user: tecnotree@centos
build date: 20240913-16:08:54
go version: go1.22.5 (Red Hat 1.22.5-2.el9)
platform: linux/amd64
tags: netgo

  • Alertmanager version:

    prom/alertmanager:latest as of today, so it's

Version Information
Branch: HEAD
BuildDate: 20240228-11:51:20
BuildUser: root@22cd11f671e9
GoVersion: go1.21.7
Revision: 0aa3c2aad14cff039931923ab16b26b7481783b5
Version: 0.27.0

  • Prometheus version:

Not applicable, as the alarms come from Grafana here.

  • Alertmanager command line:

  • SNMP notifier command line:

./snmp_notifier --snmp.trap-description-template=description-template.tpl --snmp.extra-field-template=4=object-template.tpl --snmp.version=V2c --snmp.destination=ss-vip:162 --snmp.community=tecnomen --snmp.timeout=5s --web.listen-address=:9465

  • Prometheus alert file:

  • Logs:

Thanks for your detailed message.

If it worked as you expected after your modification of the parser, I suggest using the .DeclaredAlerts variable in your template, which includes all the alerts, firing or not.

Hi there,

I'm looking at DeclaredAlerts, since I would rather rely on your code than on my own patch, and I'm still not getting any result.
Is there a way to have all the information regardless of whether the alert is firing or resolved?

Right now, your code is clear:

alert_parser.go :

alertGroups[key].DeclaredAlerts = append(alertGroups[key].DeclaredAlerts, alert)
if alert.Status == "firing" {
	err = alertParser.addAlertToGroup(alertGroups[key], alert)
	if err != nil {
		return nil, err
	}
}

Only firing alerts get parsed and completed with the labels, which can then be passed into the SNMP alerts (via a new OID).
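The effect of that excerpt can be reproduced with a small, self-contained sketch. This is not the real parser: the `Alert` and `AlertGroup` types are hypothetical simplifications, and grouping by `alertname` alone is an assumption made for brevity.

```go
package main

import "fmt"

// Alert and AlertGroup are simplified, hypothetical stand-ins for the types
// used in alert_parser.go; only the fields needed here are kept.
type Alert struct {
	Status string
	Labels map[string]string
}

type AlertGroup struct {
	Alerts         []Alert // firing alerts only
	DeclaredAlerts []Alert // every alert received
}

// group mimics the excerpt: every alert is appended to DeclaredAlerts, but
// only firing alerts are appended to Alerts.
func group(alerts []Alert) map[string]*AlertGroup {
	groups := make(map[string]*AlertGroup)
	for _, alert := range alerts {
		key := alert.Labels["alertname"] // assumed grouping key
		if groups[key] == nil {
			groups[key] = &AlertGroup{}
		}
		groups[key].DeclaredAlerts = append(groups[key].DeclaredAlerts, alert)
		if alert.Status == "firing" {
			groups[key].Alerts = append(groups[key].Alerts, alert)
		}
	}
	return groups
}

func main() {
	groups := group([]Alert{
		{Status: "firing", Labels: map[string]string{"alertname": "HighCPU", "instance": "server01"}},
		{Status: "resolved", Labels: map[string]string{"alertname": "HighCPU", "instance": "server02"}},
	})
	g := groups["HighCPU"]
	fmt.Printf("%d/%d alerts are firing\n", len(g.Alerts), len(g.DeclaredAlerts))
	// prints: 1/2 alerts are firing
}
```

In this model a template that ranges over `Alerts` never sees the resolved alert for server02, while one that ranges over `DeclaredAlerts` sees both.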

I'm going to do some checks, as the default template seems to work well:

{{ len .Alerts }}/{{ len .DeclaredAlerts }} alerts are firing:

And it always displays "2/4 alerts are firing", for instance.

How about something like:

{{- range .DeclaredAlerts }}
{{- .Labels.severity }};{{ .Status }};{{ .Labels.instance }};{{ .Labels.job }};{{ .Labels.alertname }};{{ .Annotations.summary }};{{ .Annotations.description }}
{{ end }}

?

So sorry for the delay; I have been busy with other tasks.

Basically, I can provide explanations for most SNMP systems, but not all.

You prefer to have two SNMP alarms:

  • firing the alarm
  • resolving the alarm

The mapping is mostly based on different OIDs and/or fields that provide the matching,
in the same way as Prometheus Alertmanager handles alarms (nothing new).

So, when alarms are actually sent, you need "consistency" in the alarm format to allow the third-party SNMP system to recognize them.

An example:

  • OID: xxx
    status: firing
    severity: WARN
    server: server01
    alarm: CPU over 80% - server01
    job: node-exporter-job

  • OID: xxx
    status: resolved
    severity: WARN
    server: server01
    alarm: CPU over 80% - server01
    job: node-exporter-job

The SNMP system can then do the mapping and cancel the alarm.
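A toy model of that mapping, written in Go, might look like the following. It is only a sketch of how a third-party SNMP manager could correlate the two notifications: the `Trap` and `AlarmTable` types and the choice of alarm + server as the correlation key are assumptions, not a description of any specific product.

```go
package main

import "fmt"

// Trap is a hypothetical, simplified view of the fields carried by one SNMP
// notification; the field names mirror the example above.
type Trap struct {
	Status   string
	Severity string
	Server   string
	Alarm    string
	Job      string
}

// AlarmTable is a toy model of a manager's list of active alarms.
type AlarmTable map[string]Trap

// key builds the correlation identifier from fields that are identical in the
// firing and the resolved notification.
func key(t Trap) string {
	return t.Alarm + "|" + t.Server
}

// Handle raises the alarm on "firing" and clears it on "resolved".
func (a AlarmTable) Handle(t Trap) {
	switch t.Status {
	case "firing":
		a[key(t)] = t
	case "resolved":
		delete(a, key(t))
	}
}

func main() {
	active := AlarmTable{}

	firing := Trap{
		Status: "firing", Severity: "WARN", Server: "server01",
		Alarm: "CPU over 80% - server01", Job: "node-exporter-job",
	}
	resolved := firing
	resolved.Status = "resolved"

	active.Handle(firing)
	fmt.Println("active alarms:", len(active)) // active alarms: 1

	active.Handle(resolved)
	fmt.Println("active alarms:", len(active)) // active alarms: 0
}
```

The clearing step only works because the resolved notification carries the same alarm and server values as the firing one; if those fields were empty on resolution, the key would not match and the alarm would stay active.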

I'm working on more samples now and will send some as soon as possible.