1 CheatSheet: Prometheus

PRs Welcome

File an issue or star this repo.

1.1 Prometheus Commands

| Name | Command |
|---|---|
| Run Prometheus server with Docker | `docker run -p 9090:9090 prom/prometheus`, then open http://localhost:9090/graph or http://localhost:9090/metrics |
| Run cAdvisor to get local containers' metrics | `docker run -v /var/run:/var/run -v /sys:/sys -p 8080:8080 google/cadvisor`, then open http://localhost:8080/metrics |
| Query metrics through the API instead of the web console | `curl 'http://localhost:9090/api/v1/query?query=container_memory_usage_bytes'` |
| List all alerts in Alertmanager | `curl http://localhost:9093/api/v1/alerts` (newer Alertmanager releases dropped API v1; use `/api/v2/alerts` there) |
| Prometheus tech stack footprint | prometheus (350MB RAM), node-exporter (10MB), kube-state-metrics (20MB), alertmanager (15MB), grafana (30MB) |
| Example of client libraries | Link: prometheus-python-example.py |
| Prometheus online demo | Live demo from CloudAlchemy |
| Prometheus config file | `/etc/prometheus/prometheus.yml`. Sections in the config: `global`, `rule_files`, `scrape_configs` |
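
The API query in the table can also be issued from code. The following is a minimal sketch using only the Python standard library: `build_query_url` and `parse_instant_vector` are hypothetical helper names, the `name="web"` label is made up, and `SAMPLE_BODY` is a hand-written response in the shape the `/api/v1/query` endpoint returns, not real data.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, promql):
    """Return the instant-query URL for a PromQL expression, URL-encoded."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body):
    """Turn a JSON response body into a list of (labels, float value) pairs."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected an instant vector, got {data['resultType']}")
    # Each sample carries its labels and a [timestamp, "value-as-string"] pair.
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]


# Hand-written example response (assumption: illustrative values only).
SAMPLE_BODY = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "container_memory_usage_bytes", "name": "web"},
                "value": [1700000000.0, "1048576"],
            }
        ],
    },
})

url = build_query_url("http://localhost:9090", 'container_memory_usage_bytes{name="web"}')
samples = parse_instant_vector(SAMPLE_BODY)
```

Against a running server, the body would come from `urllib.request.urlopen(url).read()` instead of `SAMPLE_BODY`. Encoding the query with `urlencode` matters because PromQL selectors contain braces, quotes, and equals signs.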

1.2 Prometheus Components

| Name | Summary |
|---|---|
| Prometheus server | Scrapes and stores time series data. It mainly uses a pull model rather than push. |
| Special-purpose exporters | Expose metrics for all kinds of services, e.g. Node Exporter, Blackbox Exporter, SNMP Exporter, JMX Exporter. |
| Client libraries | Instrument application code. |
| Alertmanager | Handles alerts. |
| Push gateway | Supports short-lived jobs. Persists the most recent push of metrics from batch jobs. |
| Reference | Link: Exporters And Integrations, Link: Default port allocations |
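
To illustrate the pull model, here is a rough sketch of what an instrumented process exposes for the server to scrape. It is not the official client library: the `Counter` class, the `demo_jobs_completed_total` metric, and port 8000 are all assumptions made for the example, and a real application would normally use a Prometheus client library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class Counter:
    """A tiny stand-in for a client-library counter (illustrative only)."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount

    def render(self):
        """Render the counter in the Prometheus text exposition format."""
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} counter\n"
            f"{self.name} {self.value}\n"
        )


# Hypothetical metric name, chosen for this example.
JOBS_DONE = Counter("demo_jobs_completed_total", "Jobs completed by this process.")


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus pulls by issuing a plain HTTP GET on the metrics path.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = JOBS_DONE.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving /metrics on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


JOBS_DONE.inc()
JOBS_DONE.inc(2)
```

Calling `serve()` starts the endpoint; the application only updates in-memory values, and the Prometheus server decides when to collect them.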

https://raw.githubusercontent.com/dennyzhang/cheatsheet-prometheus-A4/master/prometheus-overview.png

1.3 Prometheus Metric Types

| Name | Summary |
|---|---|
| Counter | Only goes up (and resets to zero on restart); counts something, e.g. the number of requests served, tasks completed, or errors. |
| Gauge | Goes up and down; a snapshot of state, e.g. temperature or current memory usage. |
| Summary | Samples observations, exposes their `_sum` and `_count`, and calculates quantiles over a sliding time window. e.g. `rate(http_request_duration_seconds_sum[5m])` |
| Histogram | Samples observations and counts them in configurable buckets. |
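
The bucket idea behind histograms can be sketched in a few lines. The code below is an illustration under assumptions, not the client library's implementation: the `Histogram` class, the bucket bounds, and the observations are invented, and `estimate_quantile` uses linear interpolation inside a bucket, which is similar in spirit to what PromQL's `histogram_quantile` does.

```python
import bisect
import math


class Histogram:
    """Counts observations into buckets defined by upper bounds ("le")."""

    def __init__(self, upper_bounds):
        self.bounds = sorted(upper_bounds) + [math.inf]  # +Inf catches everything
        self.counts = [0] * len(self.bounds)             # per-bucket counts
        self.sum = 0.0

    def observe(self, value):
        # Index of the first bound that is >= value, i.e. value <= le.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value

    def cumulative(self):
        """Return (upper bound, cumulative count) pairs, as exposed to Prometheus."""
        running, out = 0, []
        for bound, count in zip(self.bounds, self.counts):
            running += count
            out.append((bound, running))
        return out


def estimate_quantile(q, buckets):
    """Estimate a quantile from cumulative buckets by linear interpolation."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # fall back to the highest finite bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Made-up request durations in seconds.
latency = Histogram([0.1, 0.5, 1.0])
for seconds in [0.05, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.5, 0.7, 0.9]:
    latency.observe(seconds)

median = estimate_quantile(0.5, latency.cumulative())
p95 = estimate_quantile(0.95, latency.cumulative())
```

The estimate is only as precise as the bucket layout: every observation inside a bucket is assumed to be spread evenly across it.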

1.4 Prometheus Concepts

| Name | Summary |
|---|---|
| Target | The definition of an object to scrape. |
| Job | A collection of targets with the same purpose. |
| Instance | A label that uniquely identifies a target in a job. |
| Exporter | Exposes metrics by translating a non-Prometheus format into a format Prometheus supports. |
| Collector | A part of an exporter that represents a set of metrics. |
| Handler | |
| Rule | |
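
How targets, jobs, and instances relate can be shown with a small sketch. It assumes a Python dict that mirrors the `scrape_configs` section of `prometheus.yml` with static targets; `label_targets` is a hypothetical helper, the addresses are invented, and the real label-assignment logic in Prometheus (relabeling, service discovery) is considerably richer.

```python
def label_targets(scrape_configs):
    """Return the label set each static target would carry after scraping."""
    label_sets = []
    for config in scrape_configs:
        for group in config.get("static_configs", []):
            for address in group["targets"]:
                # job comes from the job name, instance defaults to host:port.
                labels = {"job": config["job_name"], "instance": address}
                labels.update(group.get("labels", {}))
                label_sets.append(labels)
    return label_sets


# Invented example: one job made of two node-exporter targets.
SCRAPE_CONFIGS = [
    {
        "job_name": "node",
        "static_configs": [
            {
                "targets": ["10.0.0.1:9100", "10.0.0.2:9100"],
                "labels": {"group": "canary"},
            }
        ],
    }
]

label_sets = label_targets(SCRAPE_CONFIGS)
```

Both targets share `job="node"`, while `instance` tells them apart, which is what makes the pair unique within the job.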

1.5 Kubernetes Metrics Targets & Samples

| Name | Metrics endpoint | Sample metrics |
|---|---|---|
| cadvisor | http://$node_ip:10255/metrics/cadvisor | Link: cadvisor-sample.txt |
| node-exporter | http://$node_ip:9100/metrics | Link: node-exporter-sample.txt |
| kubelet | http://$kubelet_ip:10255/metrics | Link: kubelet-sample.txt |
| kube-dns | http://$kube_dns_addon_ip:10054/metrics | Link: kube-dns-sample.txt |
| kube-state-metrics http-metric | http://$kube_state_metric_svc:8080/metrics | Link: kube-state-metrics-http-sample.txt |
| kube-state-metrics telemetry | http://$kube_state_metric_svc:8081/metrics | Link: kube-state-metrics-telemetry-sample.txt |
| apiserver | https://$api_server:443/metrics | |
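
Every endpoint above returns plain text in the Prometheus exposition format, so a sample file can be inspected with a short parser. This is a simplified sketch: `parse_metrics` is a hypothetical helper, `SAMPLE_TEXT` is hand-written rather than copied from the linked sample files, and escaped quotes inside label values are not handled.

```python
import re

# metric_name{optional="labels"} value [optional timestamp]
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+(-?\d+))?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Parse exposition text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            raise ValueError(f"cannot parse line: {line!r}")
        name, raw_labels, value = match.group(1), match.group(2), match.group(3)
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hand-written sample (assumption: values and labels are illustrative).
SAMPLE_TEXT = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
node_cpu_seconds_total{cpu="0",mode="user"} 78.9
container_memory_usage_bytes{name="web"} 1048576
"""

parsed = parse_metrics(SAMPLE_TEXT)
```

For anything beyond quick inspection, a maintained parser from a Prometheus client library is the safer choice.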

https://raw.githubusercontent.com/dennyzhang/cheatsheet-prometheus-A4/master/prometheus-deployment.png

1.6 Prometheus PromQL Query

| Name | Command |
|---|---|
| Reference | Link: query |
| Find metric by name+job+group | `somemetric{job="prometheus",group="canary"}` |
| | `rate(apiserver_request_count{verb="GET", code="200"}[1m])` |
| The average network traffic received per second, over the last minute | `rate(node_network_receive_bytes_total[1m])` |
| topk query | Link: query-topk.txt |
| join | |
| cut | |
| slice | |
| count | |
| predict | |
| sum | |
| min | |
| max | |
| avg | |
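
The `rate()` queries above turn a counter into a per-second value. The sketch below shows the core idea, including how a counter reset is absorbed. It is a deliberate simplification: `simple_rate` is a hypothetical name, the samples are invented, and the real PromQL function also extrapolates to the edges of the range window, which is omitted here.

```python
def simple_rate(samples):
    """Per-second increase of a counter from (timestamp_seconds, value) samples.

    Samples must be sorted by time. Returns None when fewer than two
    samples are available or no time has elapsed.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # The value dropped, so the counter restarted from zero:
            # everything it shows now is new increase.
            increase += current
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Invented samples scraped every 15 s; the counter resets before t=45.
SAMPLES = [(0, 100.0), (15, 130.0), (30, 160.0), (45, 10.0), (60, 40.0)]

per_second = simple_rate(SAMPLES)
```

Here the increases are 30, 30, 10, and 30, so the result is 100 over 60 seconds. Without the reset handling, the drop from 160 to 10 would show up as a large negative rate.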

1.7 Prometheus Alerts

| Name | Command |
|---|---|
| How full will the disks be in 4 hours? | |
| Which services are the top 5 users of CPU? | |
| What's the 95th percentile latency in EU datacenter? | |
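
The disk question is a forecasting problem, which PromQL addresses with linear regression over recent samples (the `predict_linear` function). The sketch below shows that idea with ordinary least squares. The name `predict_linear_sketch` and the free-space figures are assumptions, and it treats the last sample's timestamp as "now", which is a simplification of how a query is evaluated.

```python
def predict_linear_sketch(samples, seconds_ahead):
    """Fit a line through (timestamp_seconds, value) samples and extrapolate."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("samples need distinct timestamps")
    slope = covariance / variance          # change in value per second
    intercept = mean_v - slope * mean_t
    target_time = samples[-1][0] + seconds_ahead
    return intercept + slope * target_time


# Invented gauge: free disk space in GB, sampled every 15 minutes for an hour.
FREE_GB = [(0, 50.0), (900, 49.75), (1800, 49.5), (2700, 49.25), (3600, 49.0)]

free_in_4h = predict_linear_sketch(FREE_GB, 4 * 3600)
```

With space shrinking by 1 GB per hour, the forecast four hours out is about 45 GB; an alert would fire when such a forecast drops below a chosen threshold.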

1.8 More Resources

License: Code is licensed under MIT License.

https://prometheus.io/

https://povilasv.me/prometheus-tracking-request-duration/