heidsoft/cloud-bigdata-book

Prometheus+Grafana


Usage

How to Write Rules for Prometheus
How To Monitor Linux Servers Using Prometheus Node Exporter
Monitoring your Linux Servers with Prometheus and Grafana in 7 Minutes
How to Monitor Linux Server Performance with Prometheus and Grafana in 5 minutes
Install Prometheus Server on CentOS 7 and Ubuntu 18.04
Monitoring your own Linux machine with Prometheus and Grafana

Reference links

https://yunlzheng.gitbook.io/prometheus-book/parti-prometheus-ji-chu/alert/prometheus-alert-rule
https://awesome-prometheus-alerts.grep.to/rules.html
https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
https://alex.dzyoba.com/blog/prometheus-alerts/
https://gist.github.com/devops-school/98d7eed1a9df6c372c45452730791f7a
https://www.metricfire.com/blog/top-5-prometheus-alertmanager-gotchas/
https://www.weave.works/blog/labels-in-prometheus-alerts-think-twice-before-using-them
https://softwareadept.xyz/2018/01/how-to-write-rules-for-prometheus/
https://blog.networktocode.com/post/prometheus_alerting/
https://www.devopsschool.com/blog/recording-rules-and-alerting-rules-exmplained-in-prometheus/
https://blog.csdn.net/shida_csdn/article/details/81980021
https://gitlab.cern.ch/paas-tools/monitoring/prometheus-webhook-receiver/-/tree/master
https://superuser.com/questions/443406/how-can-i-produce-high-cpu-load-on-a-linux-server
https://yunlzheng.gitbook.io/prometheus-book/parti-prometheus-ji-chu/alert/alert-manager-config
https://www.jianshu.com/p/fd0b018539cd
https://help.aliyun.com/document_detail/123117.html?utm_content=g_1000230851&spm=5176.20966629.toubu.3.f2991ddcpxxvD1#h2-alertmanagers88
https://github.com/prometheus/alertmanager/blob/master/api/v2/openapi.yaml
https://github.com/gin-gonic/gin#quick-start
https://www.programmersought.com/article/50413971111/
https://songjiayang.gitbooks.io/prometheus/content/configuration/rule_files.html

Reloading the configuration

[root@localhost prometheus-2.19.1.linux-amd64]# curl -v -X POST http://172.16.59.100:9090/-/reload
* About to connect() to 172.16.59.100 port 9090 (#0)
*   Trying 172.16.59.100...
* Connected to 172.16.59.100 (172.16.59.100) port 9090 (#0)
> POST /-/reload HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 172.16.59.100:9090
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 25 Mar 2021 12:43:34 GMT
< Content-Length: 0
<
* Connection #0 to host 172.16.59.100 left intact
[root@localhost prometheus-2.19.1.linux-amd64]#
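
Alert expression for CPU usage above 80% per instance (the idle share is subtracted from 100, so the selector must be restricted to mode="idle"):

100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80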
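
A minimal sketch of how an expression like the one above might be wrapped in an alerting rule file. The file path, group name, alert name, `for` duration, and `severity` label are illustrative assumptions and do not come from the original notes. The `mode="idle"` selector assumes the intent is overall CPU utilisation.

```yaml
# rules/node_alerts.yml (hypothetical path; it must be listed under rule_files in prometheus.yml)
groups:
  - name: node-cpu                # hypothetical group name
    rules:
      - alert: HostHighCpuUsage   # hypothetical alert name
        # 100 minus the average idle percentage per instance over the last 2 minutes
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
        for: 5m                   # assumed: the condition must hold for 5 minutes before firing
        labels:
          severity: warning       # assumed label, e.g. for Alertmanager routing
        annotations:
          summary: "CPU usage above 80% on {{ $labels.instance }}"
```

After the rule file is edited, a POST to /-/reload like the one shown earlier applies it. That endpoint is only enabled when Prometheus is started with --web.enable-lifecycle. The file can be checked first with promtool check rules.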

Understanding Prometheus relabel_configs

Understanding Prometheus relabel_configs
Service discovery under Kubernetes
The Prometheus service discovery mechanism

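By default, once Prometheus has finished loading the Target instances, every Target carries a set of default labels:

__address__: the address of the current Target instance, <host>:<port>

__scheme__: the HTTP scheme used to scrape the target, HTTP or HTTPS

__metrics_path__: the path used to scrape the target

__param_<name>: a request parameter sent when scraping the target

These labels tell Prometheus how to fetch monitoring data from the Target instance. In general, labels prefixed with __ are for internal use and are not written to the sample data. There are exceptions, though: every sample collected by Prometheus carries a label named instance, whose value corresponds to the Target's __address__. What happens here is in fact a label rewrite.

This mechanism of rewriting a Target's labels before samples are scraped is called Relabeling in Prometheus.

Figure: when Relabeling takes effect

Prometheus lets users add custom Relabeling steps to a scrape job's settings through relabel_configs.

Managing labels with replace/labelmap/labelkeep/labeldrop

The complete relabel_config configuration is as follows: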

# The source labels select values from existing labels. Their content is concatenated
# using the configured separator and matched against the configured regular expression
# for the replace, keep, and drop actions.
[ source_labels: '[' <labelname> [, ...] ']' ]
 
# Separator placed between concatenated source label values.
[ separator: <string> | default = ; ]
 
# Label to which the resulting value is written in a replace action.
# It is mandatory for replace actions. Regex capture groups are available.
[ target_label: <labelname> ]
 
# Regular expression against which the extracted value is matched.
[ regex: <regex> | default = (.*) ]
 
# Modulus to take of the hash of the source label values.
[ modulus: <uint64> ]
 
# Replacement value against which a regex replace is performed if the
# regular expression matches. Regex capture groups are available.
[ replacement: <string> | default = $1 ]
 
# Action to perform based on regex matching.
[ action: <relabel_action> | default = replace ]
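Here, action defines how the current relabel_config handles the metadata labels. The default action is replace.

replace matches regex against the values of the source_labels (the values of multiple source labels are joined with separator) and writes the result to target_label. If the regex has several capture groups, ${1}, ${2} can be used in replacement to choose what is written. If nothing matches, target_label is left unchanged. For example: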

- job_name: 'kubernetes-kubelet'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    - target_label: __address__
      replacement: kubernetes.default.svc:443
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics

The target label __metrics_path__ is set to /api/v1/nodes/${1}/proxy/metrics, where ${1} is what the regex (.+) captures from the value of __meta_kubernetes_node_name.
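
The case with several source labels and several capture groups can be sketched with a commonly used pattern that rewrites the scrape port. It assumes pods carry a prometheus.io/port annotation (a convention, not something the notes above define), and the sample values in the comments are made up for illustration.

```yaml
relabel_configs:
  # The two source values are joined with the default separator ";",
  # e.g. "10.0.0.5:8080" and "9102" become "10.0.0.5:8080;9102".
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    # Group 1 captures the host, the optional existing port is skipped,
    # group 2 captures the port taken from the annotation.
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: ${1}:${2}
    target_label: __address__
    # With the sample values above, __address__ becomes "10.0.0.5:9102".
    # If the annotation is missing, the joined value ends in ";" with no digits,
    # the regex does not match, and __address__ is left unchanged.
```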

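labelmap, by contrast, matches regex against the names of all of the Target's labels (names, not values), uses the captured content as the new label name, and uses the value of the matched label as the new label's value. For example: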

- job_name: 'kubernetes-nodes'
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)

Original label: __meta_kubernetes_node_label_test=tttt

Resulting label: test=tttt
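
For labelmap, replacement determines the new label name, so the captured part can also be given a prefix. A small sketch follows; the node_label_ prefix is a hypothetical choice.

```yaml
relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
    # replacement defaults to $1; here a hypothetical prefix is added to the new name
    replacement: node_label_${1}
    # __meta_kubernetes_node_label_test=tttt  ->  node_label_test=tttt
```

A prefix of this kind can help avoid collisions between discovered labels and labels exposed by the target itself.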

labelkeep and labeldrop filter a Target's labels so that only the labels satisfying the filter condition remain. For example:

relabel_configs:
  - regex: label_should_drop_(.+)
    action: labeldrop
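
This configuration matches regex against all label names of the current Target and removes the labels that match. labelkeep does the opposite: it removes every label whose name does not match regex.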
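
A labelkeep counterpart might look like the sketch below. The env label is hypothetical. Because labelkeep removes every label whose name does not match, the pattern here also keeps the internal __-prefixed labels (such as __address__) that Prometheus still needs to scrape the Target.

```yaml
relabel_configs:
  - action: labelkeep
    # The regex is anchored, so each alternative must match the whole label name.
    # __.*  keeps internal labels (__address__, __scheme__, __metrics_path__, ...)
    # job   keeps the job label
    # env   keeps a hypothetical custom label
    regex: __.*|job|env
```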

Filtering Target instances with keep/drop
 

scrape_configs:
  - job_name: node_exporter
    consul_sd_configs:
      - server: localhost:8500
        services:
          - node_exporter
    relabel_configs:
    - source_labels:  ["__meta_consul_dc"]
      regex: "dc1"
      action: keep
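
The configuration above keeps a Target only if the value of its __meta_consul_dc label matches dc1. The regex is anchored at both ends, so the whole value must match rather than merely contain dc1.

With action set to keep, Prometheus discards the Target instances whose source_labels values do not match regex. With action set to drop, it discards the Target instances whose source_labels values do match regex.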
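
The drop case can be sketched by mirroring the keep example. The datacenter names dc2 and dc3 are hypothetical.

```yaml
scrape_configs:
  - job_name: node_exporter
    consul_sd_configs:
      - server: localhost:8500
        services:
          - node_exporter
    relabel_configs:
      # Discard Targets whose datacenter is exactly dc2 or dc3 (hypothetical names);
      # every other Target is kept.
      - source_labels: ["__meta_consul_dc"]
        regex: "dc2|dc3"
        action: drop
```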