- Overview
- Metrics naming
- Configuration
- Using Redis cache
- Logging
- Prometheus configuration
- Installing
- Running
- System requirements
- License
# Overview

The monitor-exporter uses the API of ITRS Monitor (formerly OP5 Monitor) to fetch host and service performance data and publish it in a way that lets Prometheus scrape the performance data and state as metrics.
Benefits:

- Enable advanced queries and aggregation on time series
- Prometheus-based alerting rules
- Grafana graphing
- Take advantage of metrics already collected by Monitor, without rerunning checks
- Collect host and service performance data and state and translate it to Prometheus metrics

This solution is a perfect gateway for any Monitor user that would like to start using Prometheus and Grafana.
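As an example of the kind of query this enables, the ping round-trip time collected by Monitor (see the metric naming below) can be aggregated with ordinary PromQL. A minimal sketch, assuming the label values shown later in this document:

```
avg_over_time(monitor_check_ping_rta_seconds{environment="production"}[5m])
```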
# Metrics naming

Metrics that are scraped with the monitor-exporter will have the following naming structure:

```
monitor_<check_command>_<perfname>_<unit>
```
The unit is only added if it exists for the performance data.

For example, the check command check_ping will result in two metrics:

```
monitor_check_ping_rta_seconds
monitor_check_ping_pl_ratio
```
In Monitor, each host also has a check that verifies the state of the host. This metric is always named `monitor_check_host_alive`. If the check has multiple performance values, they are reported as individual metrics, e.g.:

```
monitor_check_host_alive_pkt{hostname="foo.com", environment="production", service="isalive"} 1
monitor_check_host_alive_rta{hostname="foo.com", environment="production", service="isalive"} 2.547
monitor_check_host_alive_pl_ratio{hostname="foo.com", environment="production", service="isalive"} 0.0
```

The service label will always be `isalive`.
State metrics are reported for both hosts and services, with the value 0 (okay), 1 (warning), 2 (critical) or 4 (unknown).

For hosts the metric name is `monitor_host_state`, and for services it is `monitor_service_state`.
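For example, a host that is up and one of its services that is in critical state could be exposed like this (label values are illustrative, and the exact label set follows the labeling described below):

```
monitor_host_state{hostname="foo.com"} 0
monitor_service_state{hostname="foo.com", service="Disk usage /"} 2
```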
The monitor-exporter adds a number of labels to each metric:

- `hostname` - the `host_name` in Monitor
- `service` - the `service_description` in Monitor
- `downtime` - whether the host or service is currently in a downtime period, true/false. If the host is in downtime, its services are also in downtime.
- `address` - the host's real address
- `acknowledged` - applicable if a host or service is in warning or critical state and has been acknowledged by operations, 0/1 where 1 means acknowledged.
Optionally, the monitor-exporter can be configured to pass all or specific custom variables configured in Monitor as Prometheus labels. Any host-based custom variable that is used as a label is also set for the host's services. Labels created from custom variables are all transformed to lowercase. For example, with the `host_custom_vars` mapping shown in the Configuration section below, a Monitor custom variable `env` with the value `production` becomes the label `environment="production"`.
As described above, the default naming of a Prometheus metric is:

```
monitor_<check_command>_<perfname>_<unit>
```

For some check commands this does not work well, e.g. for the self_check_by_snmp_disk_usage_v3 check command, where the perfnames are the unique mount paths. For checks where the perfname depends on what is checked, you can instead turn the perfname into a label. This is defined in the configuration like:
```
perfnametolabel:
  # The command name
  self_check_by_snmp_disk_usage_v3:
    # The label name to be used
    label_name: disk
  check_disk_local_mb:
    label_name: local_disk
```
So if the check command is self_check_by_snmp_disk_usage_v3, the Prometheus metrics will have a format like:

```
monitor_self_check_by_snmp_disk_usage_v3_bytes{hostname="monitor", service="Disk usage /", disk="/_used"} 48356130816.0
```

Without this transformation, we would instead get the following:

```
monitor_self_check_by_snmp_disk_usage_v3_slash_used_bytes{hostname="monitor", service="Disk usage /"} 48356130816.0
```

This is bad, since every unique perfname creates a separate metric name. Please be aware of the naming of perfnames and services, especially when they include a name that depends on what is checked, like a mount point or disk name.
# Configuration

All configuration is made in the config.yml file.

Example:
```
# Port can be overridden by using -p if running the development server
# This is the default port assigned at https://github.com/prometheus/prometheus/wiki/Default-port-allocations
#port: 9631

op5monitor:
  # The url to the Monitor server
  url: https://monitor.example.com
  user: monitor
  passwd: monitor
  metric_prefix: monitor
  # Example of custom vars that should be added as labels and how they are translated
  host_custom_vars:
    # Specify which custom_vars to extract from Monitor
    - env:
        # Name of the label in Prometheus
        label_name: environment
    - site:
        label_name: dc

# This section enables, for specific check commands, moving the perfdata name out of the
# Prometheus metric name and into a label instead.
# E.g. for the self_check_by_snmp_disk_usage_v3 command the perfdata name is set to the label disk, like:
# monitor_self_check_by_snmp_disk_usage_v3_bytes{hostname="monitor", service="Disk usage /", disk="/_used"}
perfnametolabel:
  # The command name
  self_check_by_snmp_disk_usage_v3:
    label_name: disk

logger:
  # Path and name for the log file. If not set, logs are sent to stdout
  logfile: /var/tmp/monitor-exporter.log
  # Log level
  level: INFO
```
When running with gunicorn, the port is defined by gunicorn (see Running below).
# Using Redis cache

If you have a large Monitor configuration, the load on the Monitor server can get high when host and service data is collected over the API at a high rate. We strongly recommend that you instead collect host and service data in a batch and store it in a Redis cache. The interval of the batch collection is configurable, but considering that most service checks in Monitor run at a 5 minute interval, collecting every minute should be more than enough.

To use caching, just add this to your config.yml:
```
cache:
  # Use redis for cache - the future may support others
  # The values below are the defaults
  redis:
    # redis host
    host: localhost
    # redis port
    port: 6379
    # the auth string used in redis
    #auth: secretstuff
    # the redis db to use
    db: 0
  # The interval to collect data from Monitor, in seconds
  interval: 60
  # The time to live for the stored Monitor objects in the redis cache
  ttl: 300
```
Redis must be installed on a host on the network and be accessible from the server running monitor-exporter.
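If you do not already have a Redis instance available, one quick way to get a local one for testing is the official Docker image (the image name and default port below are standard, but not specific to this project):

```
docker run -d --name redis -p 6379:6379 redis
```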
# Logging

The log stream is configured in the above config. If `logfile` is not set, the logs go to stdout. Logs are formatted as JSON, so it is easy to store them in log servers like Loki and Elasticsearch.
# Prometheus configuration

Prometheus can be used with static configuration or with dynamic file discovery using the project monitor-promdiscovery.

Please add the job to the scrape_configs in prometheus.yml. The target is the `host_name` configured in Monitor.

Static configuration:
```
scrape_configs:
  - job_name: 'op5monitor'
    metrics_path: /metrics
    static_configs:
      - targets:
          - monitor
          - google.se
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9631
```
File discovery, e.g. together with monitor-promdiscovery:

```
scrape_configs:
  - job_name: 'op5monitor'
    scrape_interval: 1m
    metrics_path: /metrics
    file_sd_configs:
      - files:
          - 'sd/monitor_sd.yml'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9631
```
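The file referenced in file_sd_configs uses Prometheus's standard file-based service discovery format. A minimal sketch of what sd/monitor_sd.yml could contain, using the same illustrative host names as above:

```
- targets:
    - monitor
    - google.se
```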
# Installing

- Clone the git repo.
- Install dependencies: `pip install -r requirements.txt`
- Build a distribution: `python setup.py sdist`
- Install locally: `pip install dist/monitor-exporter-X.Y.Z.tar.gz`
# Running

```
python -m monitor_exporter -f config.yml
```

The -p switch enables setting the port.
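For example, to run on the default Prometheus-allocated port mentioned in the configuration above:

```
python -m monitor_exporter -f config.yml -p 9631
```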
There are a number of ASGI servers that can be used to deploy monitor-exporter. The dependencies for these are not included in the distribution.

First install the gunicorn and uvicorn dependencies into the Python environment:

```
pip install gunicorn
pip install uvicorn
```
Run with the default config.yml, where the default location is the current directory:

```
gunicorn --access-logfile /dev/null -w 4 -k uvicorn.workers.UvicornWorker "wsgi:create_app()"
```
Set the path to the configuration file:

```
gunicorn --access-logfile /dev/null -w 4 -k uvicorn.workers.UvicornWorker "wsgi:create_app('/etc/monitor-exporter/config.yml')"
```
The default port for gunicorn is 8000, but it can be set with -b, e.g. `-b localhost:9631`.
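Putting it together, a complete command line that binds to the port used in the Prometheus examples above could look like this:

```
gunicorn --access-logfile /dev/null -w 4 -b localhost:9631 -k uvicorn.workers.UvicornWorker "wsgi:create_app()"
```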
Check if the exporter is working:

```
curl -s http://localhost:9631/health
```
Get metrics for a host, where `target` is a host using the same `host_name` as in Monitor:

```
curl -s http://localhost:9631/metrics?target=foo.com
```
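The response is in the Prometheus exposition format, so for the foo.com example used earlier in this document the output would include lines such as:

```
monitor_check_host_alive_pkt{hostname="foo.com", environment="production", service="isalive"} 1
monitor_check_host_alive_rta{hostname="foo.com", environment="production", service="isalive"} 2.547
monitor_check_host_alive_pl_ratio{hostname="foo.com", environment="production", service="isalive"} 0.0
```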
# System requirements

Python 3.6

For required packages, please review requirements.txt.
# License

The monitor-exporter is licensed under GPL version 3.