/kennel

Keep datadog monitors/dashboards/etc in version control, avoid chaotic management via UI

Primary LanguageRuby

Kennel

Keep datadog monitors/dashboards/etc in version control, avoid chaotic management via UI.

  • Documented, reusable, automated, and searchable configuration
  • Changes are PR reviewed and auditable
  • Good defaults like no-data / re-notify are preselected
  • Reliable cleanup with automated deletion

Install

  • create a new private kennel repo for your organization (do not fork this repo)
  • use the template folder as starting point:
    git clone git@github.com:your-org/kennel.git
    git clone git@github.com:grosser/kennel.git seed
    mv seed/teamplate/* kennel/
    cd kennel && git add . && git commit -m 'initial'
  • add a basic projects and teams so others can copy-paste to get started
  • setup travis build for your repo
  • uncomment .travis.yml section for automated github PR feedback and datadog updates on merge
  • follow Setup in your repos Readme.md

Structure

  • projects/ monitors/dashboards/etc scoped by project
  • teams/ team definitions
  • parts/ monitors/dashes/etc that are used by multiple projects
  • generated/ projects as json, to show current state and proposed changes in PRs

Workflows

Adding a team

# teams/my_team.rb
module Teams
  class MyTeam < Kennel::Models::Team
    defaults(
      slack: -> { "my-alerts" },
      email: -> { "my-team@example.com" }
    )
  end
end

Adding a new monitor

  • use datadog monitor UI to create a monitor
  • get the id from the url
  • RESOURCE=monitor ID=12345 bundle exec rake kennel:import
  • see below

Updating an existing monitor

  • find or create a project in projects/
  • add a monitor to parts: [ list
# projects/my_project.rb
class MyProject < Kennel::Models::Project
  defaults(
    team: -> { Teams::MyTeam.new }, # use existing team or create new one in teams/
    parts: -> {
      [
        Kennel::Models::Monitor.new(
          self,
          id: -> { 123456 }, # id from datadog url, not necessary when creating a new monitor
          type: -> { "query alert" },
          kennel_id: -> { "load-too-high" }, # make up a unique name
          name: -> { "Foobar Load too high" }, # nice descriptive name that will show up in alerts and emails
          message: -> {
            # Explain what behavior to expect and how to fix the cause. Use #{super()} to add team notifications.
            <<~TEXT
              Foobar will be slow and that could cause Barfoo to go down.
              Add capacity or debug why it is suddenly slow.
              #{super()}
            TEXT
          },
          query: -> { "avg(last_5m):avg:system.load.5{hostgroup:api} by {pod} > #{critical}" }, # replace actual value with #{critical} to keep them in sync
          critical: -> { 20 }
        )
      ]
    }
  )
end
  • bundle exec rake plan update to existing should be shown (not Create / Delete)
  • alternatively: bundle exec rake generate to only update the generated json files
  • review changes then git commit
  • make a PR ... get reviewed ... merge
  • datadog is updated by travis

Adding a new dashboard

  • go to datadog dashboard UI and click on New Dashboard to create a dashboard
  • get the id from the url
  • RESOURCE=dashboard ID=abc-def-ghi bundle exec rake kennel:import
  • see below

Updating an existing dashboard

  • find or create a project in projects/
  • add a dashboard to parts: [ list
class MyProject < Kennel::Models::Project
  defaults(
    team: -> { Teams::MyTeam.new }, # use existing team or create new one in teams/
    parts: -> {
      [
        Kennel::Models::Dashboard.new(
          self,
          id: -> { "abc-def-ghi" }, # id from datadog url, not needed when creating a new dashboard
          title: -> { "My Dashboard" },
          description: -> { "Overview of foobar" },
          template_variables: -> { ["environment"] }, # see https://docs.datadoghq.com/api/?lang=ruby#timeboards
          kennel_id: -> { "overview-dashboard" }, # make up a unique name
          layout_type: -> { "ordered" },
          definitions: -> {
            [ # An array or arrays, each one is a graph in the dashboard, alternatively a hash for finer control
              [
                # title, viz, type, query, edit an existing graph and see the json definition
                "Graph name", "timeseries", "area", "sum:mystats.foobar{$environment}"
              ],
              [
                # queries can be an Array as well, this will generate multiple requests
                # for a single graph
                "Graph name", "timeseries", "area", ["sum:mystats.foobar{$environment}", "sum:mystats.success{$environment}"],
                # add events too ...
                events: [{q: "tags:foobar,deploy", tags_execution: "and"}]
              ]
            ]
          }
        )
      ]
    }
  )
end

Skipping validations

Some validations might be too strict for your usecase or just wrong, please open an issue and to unblock use the validate: -> { false } option.

Linking with kennel_ids

To link to existing monitors via their kennel_id

  • Screens uptime widgets can use monitor: {id: "foo:bar"}
  • Screens alert_graph widgets can use alert_id: "foo:bar"

Debugging changes locally

  • rebase on updated master to not undo other changes
  • figure out project name by converting the class name to snake-case
  • run PROJECT=foo bundle exec rake kennel:update_datadog to test changes for a single project

Listing un-muted alerts

Run rake kennel:alerts TAG=service:my-service to see all un-muted alerts for a given datadog monitor tag.

Examples

Reusable monitors/dashes/etc

Add to parts/<folder>.

module Monitors
  class LoadTooHigh < Kennel::Models::Monitor
    defaults(
      name: -> { "#{project.name} load too high" },
      message: -> { "Shut it down!" },
      type: -> { "query alert" },
      query: -> { "avg(last_5m):avg:system.load.5{hostgroup:#{project.kennel_id}} by {pod} > #{critical}" }
    )
  end
end

Reuse it in multiple projects.

class Database < Kennel::Models::Project
  defaults(
    team: -> { Kennel::Models::Team.new(slack: -> { 'foo' }, kennel_id: -> { 'foo' }) },
    parts: -> { [Monitors::LoadTooHigh.new(self, critical: -> { 13 })] }
  )
end

Integration testing

rake play
cd template
rake kennel:plan

Then make changes to play around, do not commit changes and make sure to revert with a rake kennel:update after deleting everything.

To make changes via the UI, make a new free datadog account and use it's credentaisl instead.

Author

Michael Grosser
michael@grosser.it
License: MIT
Build Status