Design of the collection plugins
UlricQin opened this issue · 0 comments
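Preface
cprobe needs a plugin mechanism to integrate many collection capabilities, for example folding mysqld_exporter, redis_exporter, and categraf's http_response into cprobe. For each plugin, the core concerns are:
- How the list of targets to scrape is obtained. Following Prometheus scrape configuration, the initial version supports static_configs, http_sd_configs, and file_sd_configs.
- The configuration parameters used when scraping. The configuration must be splittable across multiple files so that different targets can use different collection rules.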
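Layout
All cprobe configuration lives under the conf.d directory, and each subdirectory of conf.d represents one plugin. For example, conf.d contains a mysql directory holding the configuration for the mysql collection plugin and a redis directory holding the configuration for the redis collection plugin.
Inside a plugin directory, such as mysql, main.yaml serves as the entry configuration file. There can be more than one entry file: cprobe matches the glob pattern main*.yaml, and every matching yaml file is an entry configuration file. An entry configuration file specifies the targets to scrape and the rule files to scrape them with.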
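The layout described above can be sketched as a small discovery routine. This is only an illustration written in Python; the function name discover_entry_files and the choice to skip plugin directories without any entry file are assumptions, not something the design specifies.

```python
from pathlib import Path


def discover_entry_files(conf_dir):
    """Map each plugin name (a subdirectory of conf_dir) to its main*.yaml files.

    Hypothetical helper: plugin directories without an entry file are skipped.
    """
    entries = {}
    for plugin_dir in sorted(Path(conf_dir).iterdir()):
        if not plugin_dir.is_dir():
            continue  # only subdirectories of conf.d represent plugins
        # Glob only the plugin directory itself, not rules.d or other subfolders.
        mains = sorted(p.name for p in plugin_dir.glob("main*.yaml") if p.is_file())
        if mains:
            entries[plugin_dir.name] = mains
    return entries
```

Calling discover_entry_files("conf.d") on a tree with conf.d/mysql/main.yaml and conf.d/redis/main.yaml would return one entry per plugin, each listing its entry files.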
Configuration example
Suppose the mysql plugin directory contains a main.yaml; its content would look roughly like this:
global:
  scrape_interval: 10s
scrape_configs:
- job_name: 'mysql_prod'
  scrape_interval: 5s
  scrape_rule_files:
  - 'rules.d/common.toml'
  - 'rules.d/schema.toml'
  static_configs:
  - targets:
    - 'a.com:3306'
    - 'b.com:3306'
    labels:
      name: 'ulricqin'
      city: 'beijing'
  - targets:
    - 'c.com:3306'
    - 'd.com:3306'
    labels:
      name: 'ulricqin2'
      city: 'beijing2'
      lang: '%{LANG}'
- job_name: 'mysql_test'
  http_sd_configs:
  - url: http://localhost:8080/get-targets
  scrape_rule_files:
  - 'rules.d/common.toml'
- job_name: 'mysql_abcd'
  file_sd_configs:
  - files:
    - 'inst.yaml'
  scrape_rule_files:
  - 'rule_head.toml'
  - 'rule_coll.toml'
  - 'rule_cust.toml'
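The configuration above is almost identical to a Prometheus scrape configuration; the only addition is scrape_rule_files. For each scrape job, the various sd configs yield the targets to scrape, while how each target is actually scraped is specified by the scrape_rule_files.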
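To make the target side concrete, here is a rough Python sketch of how the static_configs groups of one job could be flattened into individual targets, each carrying the labels of its group. The function name expand_static_configs is hypothetical, and the input is assumed to be the already-parsed YAML (a list of dicts).

```python
def expand_static_configs(static_configs):
    """Flatten static_configs groups into one (target, labels) pair per target."""
    pairs = []
    for group in static_configs:
        labels = group.get("labels") or {}
        for target in group.get("targets") or []:
            # Copy the labels so that targets do not share one mutable dict.
            pairs.append((target, dict(labels)))
    return pairs
```

For the mysql_prod job in the example, this would produce four pairs: a.com:3306 and b.com:3306 with the first label set, c.com:3306 and d.com:3306 with the second.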
Configuration files should support environment variables, written as %{ENV_VAR}. In the example above, %{LANG} refers to the value of the LANG environment variable.
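A minimal Python sketch of the %{ENV_VAR} substitution follows. The function name expand_env is hypothetical, and the handling of unset variables (the placeholder is left untouched) is an assumption, since the design does not say what should happen in that case.

```python
import os
import re

# Matches %{NAME}, where NAME looks like a conventional environment variable name.
_ENV_REF = re.compile(r"%\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(text, env=None):
    """Replace every %{NAME} in text with the value of NAME from env.

    Assumption: references to unset variables are left as-is.
    """
    env = os.environ if env is None else env
    return _ENV_REF.sub(lambda m: env.get(m.group(1), m.group(0)), text)
```

Running the substitution on the raw file text before parsing keeps it independent of the file format, so the same helper could serve both the yaml entry files and the toml rule files.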
For file_sd_configs, the initial version only supports referencing yaml files, which should be sufficient.
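scrape_rule_files references a set of toml files. When processing a job, cprobe reads these toml files in order and concatenates their contents into one large configuration. Only the toml format lends itself to this kind of concatenation; yaml and json do not, which is why rules are written in toml.
Example rule_head.toml: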
[global]
user = 'root'
password = 'cProbePa55'
# ssl_ca = '/etc/mysql/ssl/ca.pem'
# ssl_cert = '/etc/mysql/ssl/client-cert.pem'
# ssl_key = '/etc/mysql/ssl/client-key.pem'
# ssl_skip_verfication = true
# tls = 'skip-verify'
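Generally every plugin needs authentication settings. For mysql these are user and password, plus certificate settings if ssl is enabled. The configuration above stays as close as possible to mysqld_exporter so that it is easy to understand.
Example rule_coll.toml: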
[collect_global_status]
enabled = true
[collect_global_variables]
enabled = true
[collect_slave_status]
enabled = true
[collect_info_schema_innodb_cmp]
enabled = true
[collect_info_schema_innodb_cmpmem]
enabled = true
[collect_info_schema_query_response_time]
enabled = true
[collect_info_schema_processlist]
enabled = true
# Minimum time a thread must be in each state to be counted
min_time = 0
# Enable collecting the number of processes by user
processes_by_user = true
# Enable collecting the number of processes by host
processes_by_host = true
[collect_info_schema_tables]
enabled = false
# The list of databases to collect table stats for, or '*' for all
databases = "*"
[collect_info_schema_innodb_tablespaces]
enabled = false
[collect_info_schema_innodb_metrics]
enabled = false
[collect_info_schema_userstats]
enabled = false
[collect_info_schema_clientstats]
enabled = false
[collect_info_schema_tablestats]
enabled = false
[collect_info_schema_schemastats]
enabled = false
[collect_info_schema_replica_host]
enabled = false
[collect_mysql_user]
enabled = false
# Enable collecting user privileges from mysql.user
collect_user_privileges = false
[collect_auto_increment_columns]
enabled = false
[collect_binlog_size]
enabled = false
[collect_perf_schema_tableiowaits]
enabled = false
[collect_perf_schema_indexiowaits]
enabled = false
[collect_perf_schema_tablelocks]
enabled = false
[collect_perf_schema_eventsstatements]
enabled = false
# Limit the number of events statements digests by response time
limit = 250
# Limit how old the 'last_seen' events statements can be, in seconds
timelimit = 86400
# Maximum length of the normalized statement text
digest_text_limit = 120
[collect_perf_schema_eventsstatementssum]
enabled = false
[collect_perf_schema_eventswaits]
enabled = false
[collect_perf_schema_file_events]
enabled = false
[collect_perf_schema_file_instances]
enabled = false
# RegEx file_name filter for performance_schema.file_summary_by_instance
filter = ".*"
# Remove path prefix in performance_schema.file_summary_by_instance
remove_prefix = "/var/lib/mysql/"
[collect_perf_schema_memory_events]
enabled = false
# Remove instrument prefix in performance_schema.memory_summary_global_by_event_name
remove_prefix = "memory/"
[collect_perf_schema_replication_group_members]
enabled = false
[collect_perf_schema_replication_group_member_stats]
enabled = false
[collect_perf_schema_replication_applier_status_by_worker]
enabled = false
[collect_sys_user_summary]
enabled = false
[collect_engine_tokudb_status]
enabled = false
[collect_engine_innodb_status]
enabled = false
[collect_heartbeat]
enabled = false
# Database from where to collect heartbeat data
database = "heartbeat"
# Table from where to collect heartbeat data
table = "heartbeat"
# Use UTC for timestamps of the current server
utc = true
[collect_slave_hosts]
enabled = false
These collect settings are in fact all of mysqld_exporter's command-line flags; in cprobe they need to become configuration file entries.
Example rule_cust.toml:
[[queries]]
mesurement = "biz_users"
metric_fields = [ "total" ]
label_fields = [ "service" ]
field_to_append = "x"
timeout = "3s"
request = '''
select 'n9e' as service, 'test' as x, count(*) as total from n9e_v6.users
'''
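This is cprobe's extension to mysql collection: it lets users define their own SQL queries to collect.
rule_head.toml, rule_coll.toml, and rule_cust.toml are concatenated into the final configuration. You can reorganize these three files or split them differently, and different scrape jobs can use different combinations of rule files, which makes the scheme very flexible.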
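As a rough illustration of how one [[queries]] entry might turn result rows into metrics, consider the Python sketch below. Everything about the naming is an assumption made for the example and is not stated in this design: the metric name is built as mesurement, then the value of the field_to_append column (if set), then the metric field, joined with underscores. The function name rows_to_samples is hypothetical, and rows are assumed to arrive as dicts keyed by column name.

```python
def rows_to_samples(query, rows):
    """Convert query result rows into (metric_name, labels, value) samples.

    Assumed naming scheme: <mesurement>[_<value of field_to_append>]_<metric field>.
    """
    samples = []
    suffix_column = query.get("field_to_append")
    for row in rows:
        # Columns listed in label_fields become labels on every sample of the row.
        labels = {name: str(row[name]) for name in query.get("label_fields", [])}
        for field in query.get("metric_fields", []):
            parts = [query["mesurement"]]  # key spelled as in the rule file
            if suffix_column:
                parts.append(str(row[suffix_column]))
            parts.append(field)
            samples.append(("_".join(parts), dict(labels), float(row[field])))
    return samples
```

Under these assumptions, the example query above with a row where total is 42 would yield a single sample named biz_users_test_total with the label service="n9e".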