Example Prometheus Monitoring

Goal

Setup monitoring with Prometheus and Grafana.

Steps

Run sample server: npm install and node server
Run Prometheus: see below
Visit your running Prometheus and run queries
Run Grafana: see below
Add Prometheus data source (Url: http://localhost:9090, Access: direct)
Import grafana-dashboard.json dashboard
Create your own dashboard from the Prometheus queries

Requirements

Docker

Run

Modify: /prometheus-data/prometheus.yml, replace 192.168.0.10 with your own host machine's IP.
Host machine IP address: ifconfig | grep 'inet 192'| awk '{ print $2}'

docker run -p 9090:9090 -v "$(pwd)/prometheus-data":/prometheus-data prom/prometheus -config.file=/prometheus-data/prometheus.yml

Open Prometheus: http://http://localhost:9090

Example Queries

Throughput

Error rate

Range[0,1]: number of 5xx requests / total number of requests

sum(increase(http_request_duration_ms_count{code=~"^5..$"}[1m])) /  sum(increase(http_request_duration_ms_count[1m]))

Request Per Minute

sum(rate(http_request_duration_ms_count[1m])) by (service, route, method, code)  * 60

Response Time

Apdex

Apdex score approximation:
100ms target and 300ms tolerated response time

(
  sum(rate(http_request_duration_ms_bucket{le="100"}[1m])) by (service)
+
  sum(rate(http_request_duration_ms_bucket{le="300"}[1m])) by (service)
) / 2 / sum(rate(http_request_duration_ms_count[1m])) by (service)

Note that we divide the sum of both buckets. The reason is that the histogram buckets are cumulative. The le="100" bucket is also contained in the le="300" bucket; dividing it by 2 corrects for that. - Prometheus docs

95th Response Time

histogram_quantile(0.95, sum(rate(http_request_duration_ms_bucket[1m])) by (le, service, route, method))

Median Response Time:

histogram_quantile(0.5, sum(rate(http_request_duration_ms_bucket[1m])) by (le, service, route, method))

Average Response Time

avg(rate(http_request_duration_ms_sum[1m]) / rate(http_request_duration_ms_count[1m])) by (service, route, method, code)

Memory Usage

Average Memory Usage

In Megabyte.

avg(nodejs_external_memory_bytes / 1024) by (service)

Reload config

Necessary when you modified prometheus-data.

curl -X POST http://localhost:9090/-/reload

Prometheus Data

avg(rate(http_request_duration_ms_sum[1m]) / rate(http_request_duration_ms_count[1m])) by (service, route, method, code)

Prometheus Alerts

States of active alerts: pending, firing

Grafana

Run

docker run -i -p 3000:3000 grafana/grafana

Open Grafana: http://http://localhost:3000

Username: admin
Password: admin

Grafana Dashboard to import: /grafana-dashboard.json

Grafana Dashboard

Acknowledgements

This example is sponsored by Trace by RisingStack.

vudknguyen/example-prometheus-nodejs