WatchDog
WatchDog is a monitoring service that automatically detects Load Balancer resources in a Kubernetes cluster and exposes metrics related to these Load Balancers. The service uses OpenTelemetry to instrument the metrics and Prometheus to expose them.
Overview
The Service WatchDog runs as a standalone service within the Kubernetes cluster. It periodically queries the Kubernetes API server to get a list of all services in the specified namespace. It then filters the list to include only services of type LoadBalancer. For each LoadBalancer service found, it retrieves the LoadBalancer IP and name and generates metrics with custom labels. These metrics are then exposed via a Prometheus endpoint, making them available for monitoring and visualization in Grafana or other monitoring tools.
How It Works
WatchDog consists of three main components:
- main.go: This is the main entry point of the Service WatchDog. It initializes the OpenTelemetry configuration, retrieves the list of Load Balancers using the k8s.go helper functions, and generates the metrics using the WithMetricsLoadBalancer function.
- k8s.go: This file contains helper functions to interact with the Kubernetes API and retrieve the Load Balancers' information. The GetNamespace function gets the namespace where the service is deployed. The ListServices function retrieves the list of all services in the namespace, and the GetLoadBalancers function filters the list to include only LoadBalancer services and returns a list of LoadBalancer structs.
- metrics.go: This file contains the OpenTelemetry (OTEL) functions that create the metrics and expose them.
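The filtering step attributed to GetLoadBalancers could look roughly like the sketch below. To keep it runnable without a cluster, it uses a hypothetical, stripped-down `Service` type instead of the real Kubernetes objects, and the `LoadBalancer` field names are assumptions. With client-go, the equivalent checks would be against `corev1.ServiceTypeLoadBalancer` and the `Status.LoadBalancer.Ingress` entries of each service.

```go
package main

import "fmt"

// Service is a hypothetical, stripped-down stand-in for a Kubernetes Service
// object; only the fields this sketch needs are modelled.
type Service struct {
	Name       string
	Namespace  string
	Type       string   // e.g. "ClusterIP", "NodePort", "LoadBalancer"
	IngressIPs []string // addresses reported in the load balancer status
}

// LoadBalancer holds what the metrics need. The field names are assumptions.
type LoadBalancer struct {
	Name      string
	IP        string
	Namespace string
}

// getLoadBalancers keeps only services of type LoadBalancer. A service whose
// external IP has not been assigned yet is kept with an empty IP.
func getLoadBalancers(services []Service) []LoadBalancer {
	var lbs []LoadBalancer
	for _, svc := range services {
		if svc.Type != "LoadBalancer" {
			continue
		}
		ip := ""
		if len(svc.IngressIPs) > 0 {
			ip = svc.IngressIPs[0]
		}
		lbs = append(lbs, LoadBalancer{Name: svc.Name, IP: ip, Namespace: svc.Namespace})
	}
	return lbs
}

func main() {
	services := []Service{
		{Name: "api", Namespace: "default", Type: "ClusterIP"},
		{Name: "ingress", Namespace: "default", Type: "LoadBalancer", IngressIPs: []string{"203.0.113.10"}},
		{Name: "pending", Namespace: "default", Type: "LoadBalancer"},
	}
	for _, lb := range getLoadBalancers(services) {
		fmt.Printf("%s/%s ip=%q\n", lb.Namespace, lb.Name, lb.IP)
	}
}
```

Keeping services with a pending IP is a design choice for this sketch; dropping them until an address is assigned would be equally reasonable.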
Metrics
WatchDog generates the following custom metrics for each LoadBalancer:
- load_balancer: This metric represents the LoadBalancer resource and includes the following labels:
  - service_name: The service name. In this case, it is set to "watchdog".
  - load_balancer_name: The name of the LoadBalancer service.
  - load_balancer_ip: The IP address of the LoadBalancer.
  - namespace: The namespace in which the LoadBalancer is deployed.
  - value: The value of the metric. In this example, it is set to 1, but it can be customized to represent different load balancing states.
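To give a feel for what a scrape of this metric might return, the sketch below renders one sample in the Prometheus text format. It is an illustration only: `formatSample` is a hypothetical helper, the example values are made up, and the exact series name and any additional labels depend on how the OpenTelemetry exporter is configured.

```go
package main

import (
	"fmt"
	"strings"
)

// labelEscaper escapes the three characters that the Prometheus text format
// requires to be escaped inside label values: backslash, double quote, newline.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// formatSample renders one load_balancer sample as a text-format line.
// It is an illustration only and not part of WatchDog.
func formatSample(name, ip, namespace string, value float64) string {
	return fmt.Sprintf(
		`load_balancer{service_name="watchdog",load_balancer_name="%s",load_balancer_ip="%s",namespace="%s"} %g`,
		labelEscaper.Replace(name), labelEscaper.Replace(ip), labelEscaper.Replace(namespace), value,
	)
}

func main() {
	fmt.Println(formatSample("ingress", "203.0.113.10", "default", 1))
}
```

Because the value is a constant 1, the useful information lives in the labels: the presence of a series tells you that a given LoadBalancer exists in the namespace.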
Installation and Configuration
- Clone this repository and navigate to the root folder.
- Deploy the Service WatchDog to the Kubernetes cluster:
  kubectl apply -k deployment/overlays/local_dev
Access the Prometheus and Grafana dashboards to view and analyze the metrics exposed by the Service WatchDog.
Monitoring and Visualization
WatchDog exposes the custom metrics through the Prometheus endpoint. You can use Grafana to connect to Prometheus and create custom dashboards to visualize the LoadBalancer metrics.
To access the Prometheus and Grafana dashboards and view the metrics, follow these steps:
- Access the Prometheus dashboard:
  - Open a web browser and navigate to the Prometheus server's URL (e.g., http://prometheus-server:9090).
  - In the Prometheus web interface, you can explore and query the metrics collected by the Service WatchDog.
- Access the Grafana dashboard:
  - Open a web browser and navigate to the Grafana server's URL (e.g., http://grafana-server:3000).
  - Log in to Grafana using your credentials.
  - Create a new dashboard or import an existing one to visualize the LoadBalancer metrics from Prometheus.
  - Use the load_balancer metric and its labels to filter and display the relevant information.
Customizing dashboards and setting up alerts in Grafana will help you monitor the performance and health of your LoadBalancer resources effectively.
Customization
You can customize the Service WatchDog to suit your specific needs. For example, you can change the namespace where the service looks for LoadBalancer resources, adjust the metrics' names and labels, or modify the metrics' values based on your load balancing states.
Jose Ramon Mañes