Example Prometheus Monitoring
Goal
Set up monitoring with Prometheus and Grafana.
Steps
- Run sample server: npm install and node server
- Run Prometheus: see below
- Visit your running Prometheus and run queries
- Run Grafana: see below
- Add Prometheus data source (Url: http://localhost:9090, Access: direct)
- Import grafana-dashboard.json dashboard
- Create your own dashboard from the Prometheus queries
Requirements
- Docker
Run
Modify /prometheus-data/prometheus.yml: replace 192.168.0.10 with your own host machine's IP.
Host machine IP address: ifconfig | grep 'inet 192'| awk '{ print $2}'
docker run -p 9090:9090 -v "$(pwd)/prometheus-data":/prometheus-data prom/prometheus --config.file=/prometheus-data/prometheus.yml
Open Prometheus: http://localhost:9090
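Besides the web UI, queries can also be sent to the Prometheus HTTP API. The following is a minimal sketch, assuming Prometheus is reachable on localhost:9090 as in the run command above; buildQueryUrl is a hypothetical helper name, and /api/v1/query is the instant-query endpoint.

```javascript
// Hypothetical helper: builds the URL for an instant query against
// the Prometheus HTTP API (GET /api/v1/query?query=...).
function buildQueryUrl(baseUrl, promql) {
  const url = new URL('/api/v1/query', baseUrl);
  url.searchParams.set('query', promql); // URL-encodes the PromQL expression
  return url.toString();
}

const promql = 'sum(rate(http_request_duration_ms_count[1m]))';
const queryUrl = buildQueryUrl('http://localhost:9090', promql);
console.log(queryUrl);

// With Node 18+ (global fetch), something like this should run the query:
// fetch(queryUrl).then((res) => res.json()).then((body) => console.log(body.data.result));
```

The fetch call is left commented out so the snippet also runs without a live Prometheus instance.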
Example Queries
Throughput
Error rate
Range [0, 1]: number of 5xx requests / total number of requests
sum(increase(http_request_duration_ms_count{code=~"^5..$"}[1m])) / sum(increase(http_request_duration_ms_count[1m]))
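To make the ratio concrete, here is a small sketch of the same arithmetic in plain JavaScript. The per-code counts are made-up numbers standing in for one-minute increases, and errorRatio is a hypothetical helper, not part of the sample server.

```javascript
// Hypothetical per-status-code request increases over the last minute.
const increaseByCode = { '200': 940, '404': 40, '500': 15, '503': 5 };

// Mirrors the PromQL matcher code=~"^5..$": a "5" followed by exactly two characters.
const is5xx = (code) => /^5..$/.test(code);

function errorRatio(counts) {
  let errors = 0;
  let total = 0;
  for (const [code, n] of Object.entries(counts)) {
    total += n;
    if (is5xx(code)) errors += n;
  }
  // PromQL yields NaN for 0 / 0; this sketch reports 0 when there is no traffic.
  return total === 0 ? 0 : errors / total;
}

console.log(errorRatio(increaseByCode)); // 20 / 1000 = 0.02
```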
Requests Per Minute
sum(rate(http_request_duration_ms_count[1m])) by (service, route, method, code) * 60
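The multiplication by 60 converts the per-second value returned by rate() into a per-minute value. A rough sketch of that conversion, using two made-up counter samples and ignoring the counter-reset handling and extrapolation that Prometheus performs:

```javascript
// Hypothetical samples of a request counter, taken 30 seconds apart.
const earlier = { timestamp: 0, value: 400 };
const later = { timestamp: 30, value: 460 };

// Simplified per-second rate between two samples of a monotonic counter.
function perSecondRate(a, b) {
  return (b.value - a.value) / (b.timestamp - a.timestamp);
}

const requestsPerSecond = perSecondRate(earlier, later);
const requestsPerMinute = requestsPerSecond * 60;
console.log(requestsPerSecond, requestsPerMinute); // 2 120
```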
Response Time
Apdex
Apdex score approximation: 100ms target and 300ms tolerated response time
(
sum(rate(http_request_duration_ms_bucket{le="100"}[1m])) by (service)
+
sum(rate(http_request_duration_ms_bucket{le="300"}[1m])) by (service)
) / 2 / sum(rate(http_request_duration_ms_count[1m])) by (service)
Note that we divide the sum of both buckets. The reason is that the histogram buckets are cumulative. The le="100" bucket is also contained in the le="300" bucket; dividing it by 2 corrects for that. - Prometheus docs
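The effect of the cumulative buckets can be checked with a few numbers. In this sketch the rates are invented for illustration, and apdex is a hypothetical helper that repeats the arithmetic of the query; it is compared against the usual Apdex definition (satisfied plus half of tolerating, divided by total).

```javascript
// Hypothetical per-second rates read from the cumulative histogram buckets.
const rates = { le100: 80, le300: 95, count: 100 };

// Same arithmetic as the query: (le100 + le300) / 2 / count.
function apdex({ le100, le300, count }) {
  return (le100 + le300) / 2 / count;
}

// Usual Apdex definition, using non-cumulative counts.
const satisfied = rates.le100;                // t <= 100 ms
const tolerating = rates.le300 - rates.le100; // 100 ms < t <= 300 ms
const classic = (satisfied + tolerating / 2) / rates.count;

console.log(apdex(rates), classic); // 0.875 0.875
```

Because le300 already contains le100, adding the two buckets counts satisfied requests twice and tolerating requests once, so halving the sum gives the same score as the classic formula.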
95th Percentile Response Time
histogram_quantile(0.95, sum(rate(http_request_duration_ms_bucket[1m])) by (le, service, route, method))
Median Response Time
histogram_quantile(0.5, sum(rate(http_request_duration_ms_bucket[1m])) by (le, service, route, method))
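histogram_quantile estimates a quantile by interpolating linearly inside the bucket that contains the requested rank. The sketch below is a simplified re-implementation for illustration only, not the Prometheus source; the bucket bounds and counts are made up, and edge cases such as q = 0 or negative bounds are not handled.

```javascript
// Simplified sketch of histogram_quantile.
// buckets: [{ le, count }] sorted by le, cumulative, last le = Infinity.
function histogramQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  const i = buckets.findIndex((b) => b.count >= rank);
  // Rank falls into the +Inf bucket: report the highest finite bound.
  if (i === buckets.length - 1) return buckets[i - 1].le;
  const lower = i === 0 ? 0 : buckets[i - 1].le;
  const below = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - below;
  // Linear interpolation between the bucket's lower and upper bound.
  return lower + (buckets[i].le - lower) * ((rank - below) / inBucket);
}

// Hypothetical cumulative bucket counts (upper bounds in milliseconds).
const sampleBuckets = [
  { le: 100, count: 60 },
  { le: 300, count: 90 },
  { le: 1000, count: 98 },
  { le: Infinity, count: 100 },
];

console.log(histogramQuantile(0.95, sampleBuckets)); // about 737.5
console.log(histogramQuantile(0.5, sampleBuckets));  // about 83.3
```

The estimate is only as precise as the bucket layout: the 95th percentile here lands somewhere in the wide 300 to 1000 ms bucket, so narrower buckets around the latencies of interest give better results.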
Average Response Time
avg(rate(http_request_duration_ms_sum[1m]) / rate(http_request_duration_ms_count[1m])) by (service, route, method, code)
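Dividing the rate of the _sum series by the rate of the _count series gives the mean duration per request over the window. A minimal sketch with invented counter samples (it ignores the extrapolation that Prometheus applies in rate()):

```javascript
// Hypothetical samples of the _sum and _count series, taken 60 seconds apart.
const before = { sum: 50000, count: 400 }; // sum is total milliseconds observed
const after = { sum: 62000, count: 500 };
const windowSeconds = 60;

// Simplified per-second rate of a counter over the window.
const rate = (from, to) => (to - from) / windowSeconds;

// ms per second divided by requests per second gives ms per request.
const avgMs = rate(before.sum, after.sum) / rate(before.count, after.count);
console.log(avgMs); // about 120 (12000 ms spread over 100 requests)
```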
Memory Usage
Average Memory Usage
In megabytes.
avg(nodejs_external_memory_bytes / 1024 / 1024) by (service)
Reload config
Necessary after you modify the files in prometheus-data.
curl -X POST http://localhost:9090/-/reload
Prometheus Data
Prometheus Alerts
States of active alerts: pending, firing
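As a rough illustration of how the two states relate: an alert whose condition holds is pending until it has held for the rule's for duration, and firing afterwards. The sketch below assumes a rule configured with such a duration; alertState and its parameters are hypothetical names, and the third state, inactive, covers alerts whose condition does not currently hold.

```javascript
// Hypothetical helper: derives the alert state from how long the alert
// condition has been continuously true (null means it is not true right now).
function alertState(trueForSeconds, forSeconds) {
  if (trueForSeconds === null) return 'inactive';
  return trueForSeconds >= forSeconds ? 'firing' : 'pending';
}

// Assuming a rule with a "for" duration of 60 seconds:
console.log(alertState(null, 60)); // inactive
console.log(alertState(20, 60));   // pending
console.log(alertState(75, 60));   // firing
```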
Grafana
Run
docker run -i -p 3000:3000 grafana/grafana
Open Grafana: http://localhost:3000
Username: admin
Password: admin
Grafana Dashboard to import: /grafana-dashboard.json
Grafana Dashboard
Acknowledgements
This example is sponsored by Trace by RisingStack.