/watchdog-rs

Minimalist & multi-region network monitoring tool written in Rust

Primary LanguageRust

Network watchdog

A minimalist & multi-region network monitoring tool written in Rust

Getting started

Start by retrieving the latest release of the watchdog-rs project and create a YAML configuration config.yaml file as shown below. This configuration file will be used by the monitoring server to manage your configuration accross regions.

regions:
  - name: eu-west
    interval: 5s # Interval between network checks in a region
    threshold: 3 # Amount of region failures tolerated before alert
    groups:
      - name: default
        threshold: 4 # Amount of zone failures tolerted before alert
        mediums: telegram # Alert mediums
        tests:
          - http www.domain.be

Launch the main monitoring server that will be used by network regions to collect metrics. This service should be reachable by all network regions on port 3030.

# Define a set of environment variables that will be used
# by the server. In a production environment, this would
# be defined in a systemd service.
export TELEGRAM_TOKEN=x
export TELEGRAM_CHAT=x
export WATCHDOG_TOKEN=x

# Launch the main watchdog server on port 3030
watchdog server --config ./config.yaml

In a region, launch a network relay : a service that will retrieve the monitoring configuration from the server and start performing network tests. Each time a test is performed, the results will be pushed to the main monitoring server.

# Define a set of environment variables that will be used
# by the relay. In a production environment, this would
# be defined in a systemd service.
export WATCHDOG_ADDR=http://localhost:3030
export WATCHDOG_TOKEN=x

# Launch a watchdog network region relay
watchdog relay --region eu-west

On your workstation, use the CLI to get details about the monitoring state & ongoing incidents.

# Put these environment variables in a safe place on your
# workstation (watch out for shell history)
WATCHDOG_ADDR=http://localhost:3030
WATCHDOG_TOKEN=x

# Get the status of all your network regions & zones
watchdog status

Project goals

  • All-in-one solution : the project can be used to push metrics from network regions, aggregate all metrics in a server and see the results with a CLI
  • Minimal footprint : the major open-source monitoring solutions such as Prometheus and Zabbix offer a wide range of features, but consume a lot of resources (memory, network, ...). This project provides a set of components that are designed for low-resource consumption.
  • Default "push" approach : to enable multi-region network monitoring, the server defaults to a "push" approach. No need to have a fixed IP address or configure NAT in your router : all data is sent from the regions to the main public server.
  • Grafana-ready : while the project provides a CLI to view the status of your regions and track incidents, the project also integrates with Grafana.
  • Configuration as code : easily backup your configuration, keep track of changes with Git and share with other teams.

Roadmap

Docs

  • Production setup

CLI

  • Complete existing CLI commands
  • Server management (alerters, external IP, ...)
  • Bash autocompletion

Server

  • Alerters (SMS, email, webhook, script, ...)
  • Bandwidth control for relays

Relay

  • Auto-detect config changes
  • Add commands (TCP, UDP, ARP, ...)
  • Custom command (track UPS failure, ...)
  • Add metrics (HTTP latency, ...)

Docs & more

  • Real-life scenarios (Raspberry, ...)