oetiker/SmokePing

alert to= is not overridden

Closed · 1 comment

In our alerts section, at the top level (the config for all alerts), we have:

to = |/localpart0/cnvrtools/env/bin/smokedetector

Then, further down, in a specific alert, we are trying to override this with a separate script:
to = |/etc/smokeping/etc/smokedetector-no-merge

However, when the alert fires, both scripts are executed. Is this the expected behavior: are the to= directives additive rather than overriding the ones set earlier? This seems different from the pattern in the rest of SmokePing's configuration hierarchy, where leafier configuration nodes/sections override keys set earlier.

We see in the system logs on the master server that both scripts are executed for a single alert action. Both the former and the latter script log debug output showing the arguments they were called with, and it is clearly for the same trigger.

Here are the system logs from the main server that show this occurring. Specifically, the smokedetector script runs with no_merge and debug set to False, while the latter script (a wrapper around smokedetector that sets some extra args/flags) runs with both set to True.

Here is the SmokePing master's own log message about the alert being raised:
Dec 4 20:39:59 mon101.ord.neteng.core.cnvr.net smokeping: Alert NETENG-L-BACKBONE-TCP-LOSS was raised for backbone-quality-tests.backbone-quality-internal-ipv4.rtr-wan3-iad-cnvr-net-Vlan1187 [from mon101.iad.neteng.core.cnvr.net] loss: S, 100%, 100%(5/5) rtt: S, U, U prevmatch: 0 comment: Two failed connections in 10 polls

Then the invocation of smokedetector, which we expected to be overridden/suppressed:
Dec 4 20:40:00 mon101.ord.neteng.core.cnvr.net smokeping: 2023-12-04 20:40:00,432 INFO: run - Args: {'debug': False, 'no_merge': False, 'Alert': 'NETENG-L-BACKBONE-TCP-LOSS', 'Target': 'backbone-quality-tests.backbone-quality-internal-ipv4.rtr-wan3-iad-cnvr-net-Vlan1187 [from mon101.iad.neteng.core.cnvr.net]', 'Loss Pattern': 'loss: S, 100%, 100%', 'RTT Pattern': 'rtt: S, U, U', 'Hostname': '10.130.252.233', 'Raised': 1}
Dec 4 20:40:00 mon101.ord.neteng.core.cnvr.net smokeping: 2023-12-04 20:40:00,432 INFO: run - Raising alert NETENG-L-BACKBONE-TCP-LOSS, key backbone-quality-tests.backbone-quality-internal-ipv4.rtr-wan3-iad-cnvr-net-Vlan1187

Then the run of the second script, a wrapper setting different arguments, which we expected to override the first "to=" definition:
Dec 4 20:40:00 mon101.ord.neteng.core.cnvr.net smokeping: 2023-12-04 20:40:00,464 INFO: run - Args: {'debug': True, 'no_merge': True, 'Alert': 'NETENG-L-BACKBONE-TCP-LOSS', 'Target': 'backbone-quality-tests.backbone-quality-internal-ipv4.rtr-wan3-iad-cnvr-net-Vlan1187 [from mon101.iad.neteng.core.cnvr.net]', 'Loss Pattern': 'loss: S, 100%, 100%', 'RTT Pattern': 'rtt: S, U, U', 'Hostname': '10.130.252.233', 'Raised': 1}
Dec 4 20:40:00 mon101.ord.neteng.core.cnvr.net smokeping: 2023-12-04 20:40:00,464 INFO: run - Raising alert NETENG-L-BACKBONE-TCP-LOSS, key backbone-quality-tests.backbone-quality-internal-ipv4.rtr-wan3-iad-cnvr-net-Vlan1187 [from mon101.iad.neteng.core.cnvr.net]

No log messages were deleted between the ones listed above. The timestamps, the order of events, and the content of the messages show that a single alert trigger executes both scripts.
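To make the comparison concrete, here is a small sketch that parses the two logged Args dictionaries (copied verbatim from the log lines above) and prints only the keys that differ. The helper name `diff_args` is just for illustration and is not part of smokedetector.

```python
import ast

# Args dictionaries copied verbatim from the two "run - Args:" log lines.
FIRST_ARGS = (
    "{'debug': False, 'no_merge': False, "
    "'Alert': 'NETENG-L-BACKBONE-TCP-LOSS', "
    "'Target': 'backbone-quality-tests.backbone-quality-internal-ipv4."
    "rtr-wan3-iad-cnvr-net-Vlan1187 [from mon101.iad.neteng.core.cnvr.net]', "
    "'Loss Pattern': 'loss: S, 100%, 100%', 'RTT Pattern': 'rtt: S, U, U', "
    "'Hostname': '10.130.252.233', 'Raised': 1}"
)
SECOND_ARGS = (
    "{'debug': True, 'no_merge': True, "
    "'Alert': 'NETENG-L-BACKBONE-TCP-LOSS', "
    "'Target': 'backbone-quality-tests.backbone-quality-internal-ipv4."
    "rtr-wan3-iad-cnvr-net-Vlan1187 [from mon101.iad.neteng.core.cnvr.net]', "
    "'Loss Pattern': 'loss: S, 100%, 100%', 'RTT Pattern': 'rtt: S, U, U', "
    "'Hostname': '10.130.252.233', 'Raised': 1}"
)


def diff_args(first: str, second: str) -> dict:
    """Return {key: (first_value, second_value)} for every key that differs."""
    a = ast.literal_eval(first)   # safe parse of the logged dict literal
    b = ast.literal_eval(second)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


print(diff_args(FIRST_ARGS, SECOND_ARGS))
# → {'debug': (False, True), 'no_merge': (False, True)}
```

Everything SmokePing passed in (alert, target, patterns, hostname, raised flag) is identical; only the two flags set by the wrapper differ.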

If this is expected, how can we accomplish our goals? We only want to run the latter script, but the "to" key is mandatory in the top level of the alert section.
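One workaround we could try, assuming the to= values really are additive (the logs suggest this, but it is not confirmed): keep a single top-level to= that points at a small dispatcher, and drop the per-alert to=. The sketch below is hypothetical. The `OVERRIDES` table, `pick_handler`, and `build_command` are names made up for illustration, and it assumes SmokePing passes the alert name as the first argument to a piped program, which matches the 'Alert' field in the logged Args but should be checked against the smokeping_config documentation.

```python
import subprocess
import sys

# Script that handles every alert unless an override is listed below.
DEFAULT_HANDLER = "/localpart0/cnvrtools/env/bin/smokedetector"

# Hypothetical per-alert overrides: alert name -> script to run instead.
OVERRIDES = {
    "NETENG-L-BACKBONE-TCP-LOSS": "/etc/smokeping/etc/smokedetector-no-merge",
}


def pick_handler(alert_name: str) -> str:
    """Choose exactly one handler script for the given alert name."""
    return OVERRIDES.get(alert_name, DEFAULT_HANDLER)


def build_command(args: list) -> list:
    """Build the command line: chosen handler plus the unchanged arguments.

    Assumption: args[0] is the alert name, as passed by SmokePing.
    """
    return [pick_handler(args[0]), *args]


def main(argv: list) -> int:
    """Run the chosen handler and return its exit status."""
    if len(argv) < 2:
        print("usage: dispatcher ALERT TARGET ...", file=sys.stderr)
        return 2
    return subprocess.call(build_command(argv[1:]))


# When installed as a script, the entry point would be:
#     sys.exit(main(sys.argv))
```

With this in place, the top-level to= would be the only to= in the alerts section, so each trigger runs one script regardless of how SmokePing combines to= values.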

This issue has become stale and will be closed automatically within 7 days. Comment on the issue to keep it alive.