jaredledvina/sensu-go-ansible

Can't Add Command Line Arguments Once Check is Configured

Closed this issue · 5 comments

First up, just to say, the sensu_go_plugin library looks great - it's helped me get some monitoring on some stuff that's really in need of it ;-)

Onto my observation: once a check has been configured, it can't be changed without removing it and re-adding it. As a test case, I ran this ad-hoc command:

ansible myserver -m sensu_go_check -a 'name="fred" command="check_cpu.rb" subscriptions="test" state=present'

This returns the check configuration, which looks correct:

myserver | SUCCESS => {
    "changed": true, 
    "checks": [
        {
            "check_hooks": null, 
            "command": "check_cpu.rb", 
            "env_vars": null, 
            "handlers": [], 
            "high_flap_threshold": 0, 
            "interval": 60, 
            "low_flap_threshold": 0, 
            "metadata": {
                "name": "fred", 
                "namespace": "default"
            }, 
            "output_metric_format": "", 
            "output_metric_handlers": [], 
            "proxy_entity_name": "", 
            "publish": true, 
            "round_robin": false, 
            "runtime_assets": [], 
            "stdin": false, 
            "subdue": null, 
            "subscriptions": [
                "test"
            ], 
            "timeout": 0, 
            "ttl": 0
        }
    ], 
    "message": "OK"
}

All fine so far. However, an attempt to update it with:

ansible myserver -m sensu_go_check -a 'name="fred" command="check_cpu.rb -w 80 -c 90" subscriptions="test" state=present'

Returns:

myserver | SUCCESS => {
    "changed": false, 
    "message": "Check already defined in Sensu"
}

If the check is removed with state=absent and then re-added using the same command as above, it is correctly updated in both the console output and the Sensu web UI.

I had a bit of a look at the library code, but I'm not entirely sure how best to approach a solution. Since a 'list' operation returns everything the server knows, I guess it could just be a matter of stepping through the list output looking for the current 'name'. If it's found, either update the check (if the API allows) or remove and re-add it if not. It could make the whole thing pretty slow, although each 'add' would update the list. Another option would be to try to add it anyway, and only fetch the list if you get the 'Check already defined in Sensu' message. Lastly, maybe it needs an upstream change to the Sensu API so that it does the update when we ask for a slightly different check?
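To make the list-and-compare idea concrete, here is a minimal Python sketch. It is not the module's actual code: the function name `plan_check_action` and the shape of `desired` are hypothetical, and it assumes the list output looks like the JSON shown earlier in this issue (flat fields plus a `metadata.name`) and that only one namespace is in play.

```python
def plan_check_action(existing_checks, desired):
    """Decide what to do with `desired`, given the checks the server lists.

    Returns "create", "update", or "unchanged".
    Hypothetical helper: not part of sensu_go_check.py.
    """
    name = desired["metadata"]["name"]
    for check in existing_checks:
        if check.get("metadata", {}).get("name") != name:
            continue
        # Compare only the fields the caller specified, so server-side
        # defaults (timeout, ttl, ...) do not trigger a spurious update.
        differs = any(
            check.get(key) != value
            for key, value in desired.items()
            if key != "metadata"
        )
        return "update" if differs else "unchanged"
    # No check with this name was found in the list output.
    return "create"
```

With something like this, "changed" would only be reported for "create" and "update", and the "update" case could map to either a real API update or a remove-and-re-add.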

You may want to try files (instead of ad-hoc commands). I'm sure what I've done below is not best-practice, but until I see otherwise I will use it. I've added to this playbook so that ansible creates /etc/sensu/checks and pushes out all of the unique checks there.
Example: /etc/sensu/checks/http-status.yaml

type: CheckConfig
metadata:
  namespace: default
  name: status
spec:
  command: /usr/lib/nagios/plugins/check_http -H 10.0.0.1 -u /healthy
  subscriptions:
  - entity:UE2-DEV-SENSU-VM
  publish: true
  interval: 60
  handlers:
  - monitoring

Then add or update the check with sensuctl. I created a shell script for this, which may or may not be necessary (wildcards may be supported):
for check in /etc/sensu/checks/*.yaml; do /usr/bin/sensuctl create -f "$check"; done
Just run that command (or script) with ansible when you push out your checks. It updates just fine for me.
Happy monitoring!

I see what you're getting at - that's an approach I've used elsewhere for something-or-other.

Depending on your shell script, you may have the same problem - that is, you can't change a check config. I tried a bit on the command line:

sensuctl check create fred --command check-cpu.rb --subscriptions test --interval 60

...says "OK" and indeed the check is created. However, running it again with a modification:

sensuctl check create fred --command "check-cpu.rb -w 75 -c 85" --subscriptions test --interval 60

...fails with:

Error: resource already exists

...and a shell exit code of 1. Thus, you could look for a non-zero return code and then delete the check and re-create it. So long as you only delete-and-recreate when Ansible tells you to do so (i.e. when the file changes), you would work around the problem I've reported.
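A rough Python sketch of that delete-and-recreate fallback, for illustration only. The function name `create_or_replace_check` and the injectable `run` parameter are hypothetical, and I'm assuming `sensuctl check delete` accepts `--skip-confirm` to avoid the interactive prompt. Note that it treats any non-zero exit from the first create as "resource already exists", which is a simplification.

```python
import subprocess


def create_or_replace_check(name, command, subscriptions, interval=60,
                            run=subprocess.run):
    """Create a check; if creation fails, delete it and create it again.

    `run` is injectable so the logic can be exercised without sensuctl.
    """
    create = ["sensuctl", "check", "create", name,
              "--command", command,
              "--subscriptions", subscriptions,
              "--interval", str(interval)]
    if run(create).returncode == 0:
        return "created"
    # Simplification: any non-zero exit is assumed to mean the check
    # already exists. A real script should inspect the error output.
    # Assumption: --skip-confirm suppresses the confirmation prompt.
    delete = ["sensuctl", "check", "delete", name, "--skip-confirm"]
    run(delete, check=True)
    run(create, check=True)
    return "replaced"
```

There is a short window between the delete and the second create where the check doesn't exist, so this is best done only when the definition has actually changed.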

Also, since the underlying sensuctl can't do an update-if-already-exists, that somewhat defines how this problem might be solved in the library (assuming an upstream change isn't the best choice).

Try the yaml file - it works for me (creates and updates)

Hey @coofercat!

Thanks for the issue! I think, given the way I wrote that initial attempt at a sensu_go_check module (https://github.com/jaredledvina/sensu-go-ansible/blob/master/library/sensu_go_check.py), the list + looping approach would be needed.

The logic overall isn't too crazy; it might make sense to do that here: https://github.com/jaredledvina/sensu-go-ansible/blob/master/library/sensu_go_check.py#L147-L159

Tangentially related, I'm not totally sure about the use of sensuctl to wrap their API. It feels pretty brittle to me and will probably completely break if the args ever change. I was thinking about rewriting that module to leverage https://docs.sensu.io/sensu-go/5.1/api/checks/ instead, which gets us https://docs.sensu.io/sensu-go/5.1/api/checks/#checkscheck-put-specification, which should be perfect for this use case.
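As a rough idea of what the PUT-based approach could look like, here is a small Python sketch using only the standard library. It is not the rewritten module: `build_check_put` is a hypothetical helper, the backend URL and token are placeholders, and the path and bearer-token header are my reading of the checks API docs linked above, so verify them against your Sensu version.

```python
import json
import urllib.request
from urllib.parse import quote


def build_check_put(base_url, token, check):
    """Build (but do not send) a PUT request that creates or updates a check.

    `check` is a dict carrying metadata.name and metadata.namespace.
    Hypothetical helper; the endpoint layout is assumed from the API docs.
    """
    namespace = quote(check["metadata"]["namespace"], safe="")
    name = quote(check["metadata"]["name"], safe="")
    url = "{}/api/core/v2/namespaces/{}/checks/{}".format(
        base_url.rstrip("/"), namespace, name)
    return urllib.request.Request(
        url,
        data=json.dumps(check).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="PUT",
    )
```

Sending it would be `urllib.request.urlopen(build_check_put(url, token, check))`. Since PUT is create-or-update, the same call would cover both the first run and a later change to the command.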

Anyway, thanks again, lemme know what you think.

EDIT:
Yeah, using sensuctl manually via a playbook works as well like @dandebiase points out. The provided library/module in this role was really just my first shot at writing one. While it was working, it's not wonderful by any means. Feel free to use any other method to deploy/configure Sensu Go. I think, personally, this role still is solid at installing and configuring the initial set of things.

3 months later, I found free time!

Opened #93 which migrates the existing module off of using sensuctl and directly uses the Sensu Go API!

If folks want to test it, go for it and let me know if it works! It's very much a v0, but I think the user experience is waaay nicer.