linux-system-roles/metrics

[PCP] Add collection from an OpenMetrics endpoint

sradco opened this issue · 5 comments

We need the PCP role to be able to collect metrics from a Prometheus endpoint, or from collectd with the write_prometheus output plugin.

@lberk

I guess we will need to install pcp-pmda-prometheus. Or does that package expose PCP metrics to a Prometheus server instead?

On Fedora, it pulls in the following packages:
python3-chardet
python3-idna
python3-pcp
python3-pysocks
python3-requests
python3-urllib3

Do I need to add them to the list in https://bugzilla.redhat.com/show_bug.cgi?id=1627753, or are they already available for RHEL 7?

Can you help with configuring it?

lberk commented

Hi Shirly, those should all already be in EL7 (pcp-pmda-elasticsearch already ships there as well).
Configuring the prometheus PMDA is simple. Assuming, as you mentioned to me earlier, that the write_prometheus plugin is listening on port 9103 on the same host:

Place a file called 'collectd.url' containing "http://localhost:9103" in /var/lib/pcp/pmdas/prometheus/config.d/, then enable the PMDA by running the 'Install' script as root.
Concretely:

# cd /var/lib/pcp/pmdas/prometheus/
# echo "http://localhost:9103" > config.d/collectd.url
# ./Install

That's all you need. Our PMDA (the agent that reads Prometheus-formatted metrics) will autodiscover the available metrics.
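To make "autodiscover" a bit more concrete: the endpoint serves the Prometheus text exposition format, one sample per line (`metric_name{label="value",...} value [timestamp]`), and each labelled sample of a metric becomes one instance. The following is a minimal Python sketch of that idea, not the PMDA's actual code. `parse_exposition` is a hypothetical name, it assumes label values contain no escaped quotes or commas, and it joins sorted labels as `key:value` only to resemble the instance names that pminfo prints in this thread.

```python
import re
from collections import defaultdict

# One sample line of the Prometheus text exposition format:
#   metric_name{label="value",...} 123.4 [optional timestamp]
# Simplified: assumes label values contain no escaped quotes or commas.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_exposition(text):
    """Return {metric_name: {instance_label_string: value}}."""
    metrics = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m is None:
            continue
        labels = LABEL_RE.findall(m.group("labels") or "")
        # Sorted key:value pairs, similar in shape to the pminfo instance names.
        inst = " ".join(f"{k}:{v}" for k, v in sorted(labels))
        # float() also accepts "NaN", "+Inf" and "-Inf".
        metrics[m.group("name")][inst] = float(m.group("value"))
    return dict(metrics)
```

For example, a reconstructed (not captured) line `collectd_cpu_total{cpu="0",type="idle",instance="toium"} 263107` yields the instance `cpu:0 instance:toium type:idle` under `collectd_cpu_total`.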

On my local system, with the collectd write_prometheus plugin running, I installed that config file as described and ran:

$ pminfo -f prometheus.collectd
prometheus.collectd.collectd_cpu_total
inst [0 or "0 cpu:0 instance:toium type:idle"] value 263107
inst [1 or "1 cpu:0 instance:toium type:interrupt"] value 362
inst [2 or "2 cpu:0 instance:toium type:nice"] value 26
inst [3 or "3 cpu:0 instance:toium type:softirq"] value 503
inst [4 or "4 cpu:0 instance:toium type:steal"] value 0
inst [5 or "5 cpu:0 instance:toium type:system"] value 10060
inst [6 or "6 cpu:0 instance:toium type:user"] value 17061
inst [7 or "7 cpu:0 instance:toium type:wait"] value 1741
inst [8 or "8 cpu:1 instance:toium type:idle"] value 267345
inst [9 or "9 cpu:1 instance:toium type:interrupt"] value 808
inst [10 or "10 cpu:1 instance:toium type:nice"] value 54
inst [11 or "11 cpu:1 instance:toium type:softirq"] value 329
inst [12 or "12 cpu:1 instance:toium type:steal"] value 0
inst [13 or "13 cpu:1 instance:toium type:system"] value 8217
inst [14 or "14 cpu:1 instance:toium type:user"] value 14847
inst [15 or "15 cpu:1 instance:toium type:wait"] value 1322
inst [16 or "16 cpu:2 instance:toium type:idle"] value 252282
inst [17 or "17 cpu:2 instance:toium type:interrupt"] value 330
inst [18 or "18 cpu:2 instance:toium type:nice"] value 20
inst [19 or "19 cpu:2 instance:toium type:softirq"] value 298
inst [20 or "20 cpu:2 instance:toium type:steal"] value 0
inst [21 or "21 cpu:2 instance:toium type:system"] value 16809
inst [22 or "22 cpu:2 instance:toium type:user"] value 22510
inst [23 or "23 cpu:2 instance:toium type:wait"] value 708
inst [24 or "24 cpu:3 instance:toium type:idle"] value 259763
inst [25 or "25 cpu:3 instance:toium type:interrupt"] value 322
inst [26 or "26 cpu:3 instance:toium type:nice"] value 20
inst [27 or "27 cpu:3 instance:toium type:softirq"] value 277
inst [28 or "28 cpu:3 instance:toium type:steal"] value 0
inst [29 or "29 cpu:3 instance:toium type:system"] value 12477
inst [30 or "30 cpu:3 instance:toium type:user"] value 18598
inst [31 or "31 cpu:3 instance:toium type:wait"] value 1416
inst [32 or "32 cpu:4 instance:toium type:idle"] value 265664
inst [33 or "33 cpu:4 instance:toium type:interrupt"] value 391
inst [34 or "34 cpu:4 instance:toium type:nice"] value 17
inst [35 or "35 cpu:4 instance:toium type:softirq"] value 264
inst [36 or "36 cpu:4 instance:toium type:steal"] value 0
inst [37 or "37 cpu:4 instance:toium type:system"] value 9177
inst [38 or "38 cpu:4 instance:toium type:user"] value 16083
inst [39 or "39 cpu:4 instance:toium type:wait"] value 1223
inst [40 or "40 cpu:5 instance:toium type:idle"] value 268442
inst [41 or "41 cpu:5 instance:toium type:interrupt"] value 307
inst [42 or "42 cpu:5 instance:toium type:nice"] value 53
inst [43 or "43 cpu:5 instance:toium type:softirq"] value 208
inst [44 or "44 cpu:5 instance:toium type:steal"] value 0
inst [45 or "45 cpu:5 instance:toium type:system"] value 7740
inst [46 or "46 cpu:5 instance:toium type:user"] value 15277
inst [47 or "47 cpu:5 instance:toium type:wait"] value 897
inst [48 or "48 cpu:6 instance:toium type:idle"] value 271492
inst [49 or "49 cpu:6 instance:toium type:interrupt"] value 292
inst [50 or "50 cpu:6 instance:toium type:nice"] value 36
inst [51 or "51 cpu:6 instance:toium type:softirq"] value 154
inst [52 or "52 cpu:6 instance:toium type:steal"] value 0
inst [53 or "53 cpu:6 instance:toium type:system"] value 6250
inst [54 or "54 cpu:6 instance:toium type:user"] value 13461
inst [55 or "55 cpu:6 instance:toium type:wait"] value 1190
inst [56 or "56 cpu:7 instance:toium type:idle"] value 266549
inst [57 or "57 cpu:7 instance:toium type:interrupt"] value 368
inst [58 or "58 cpu:7 instance:toium type:nice"] value 61
inst [59 or "59 cpu:7 instance:toium type:softirq"] value 217
inst [60 or "60 cpu:7 instance:toium type:steal"] value 0
inst [61 or "61 cpu:7 instance:toium type:system"] value 8521
inst [62 or "62 cpu:7 instance:toium type:user"] value 15800
inst [63 or "63 cpu:7 instance:toium type:wait"] value 1370

prometheus.collectd.collectd_memory
inst [0 or "0 instance:toium memory:buffered"] value 2445312
inst [1 or "1 instance:toium memory:cached"] value 6148284416
inst [2 or "2 instance:toium memory:free"] value 7208927232
inst [3 or "3 instance:toium memory:slab_recl"] value 284467200
inst [4 or "4 instance:toium memory:slab_unrecl"] value 209063936
inst [5 or "5 instance:toium memory:used"] value 2597339136

prometheus.collectd.collectd_load_shortterm
inst [0 or "0 instance:toium"] value 2.74

prometheus.collectd.collectd_load_midterm
inst [0 or "0 instance:toium"] value 3.28

prometheus.collectd.collectd_load_longterm
inst [0 or "0 instance:toium"] value 2.16

prometheus.collectd.collectd_interface_if_packets_tx_total
inst [0 or "0 instance:toium interface:enp0s31f6"] value 0
inst [1 or "1 instance:toium interface:lo"] value 1552
inst [2 or "2 instance:toium interface:tun0"] value 1879
inst [3 or "3 instance:toium interface:virbr0"] value 0
inst [4 or "4 instance:toium interface:virbr0-nic"] value 0
inst [5 or "5 instance:toium interface:wlp61s0"] value 14003

prometheus.collectd.collectd_interface_if_packets_rx_total
inst [0 or "0 instance:toium interface:enp0s31f6"] value 0
inst [1 or "1 instance:toium interface:lo"] value 1552
inst [2 or "2 instance:toium interface:tun0"] value 1362
inst [3 or "3 instance:toium interface:virbr0"] value 0
inst [4 or "4 instance:toium interface:virbr0-nic"] value 0
inst [5 or "5 instance:toium interface:wlp61s0"] value 36093

prometheus.collectd.collectd_interface_if_octets_tx_total
inst [0 or "0 instance:toium interface:enp0s31f6"] value 0
inst [1 or "1 instance:toium interface:lo"] value 4034597
inst [2 or "2 instance:toium interface:tun0"] value 156516
inst [3 or "3 instance:toium interface:virbr0"] value 0
inst [4 or "4 instance:toium interface:virbr0-nic"] value 0
inst [5 or "5 instance:toium interface:wlp61s0"] value 2764311

prometheus.collectd.collectd_interface_if_octets_rx_total
inst [0 or "0 instance:toium interface:enp0s31f6"] value 0
inst [1 or "1 instance:toium interface:lo"] value 4034597
inst [2 or "2 instance:toium interface:tun0"] value 252707
inst [3 or "3 instance:toium interface:virbr0"] value 0
inst [4 or "4 instance:toium interface:virbr0-nic"] value 0
inst [5 or "5 instance:toium interface:wlp61s0"] value 33517212

prometheus.collectd.collectd_interface_if_errors_tx_total
inst [0 or "0 instance:toium interface:enp0s31f6"] value 0
inst [1 or "1 instance:toium interface:lo"] value 0
inst [2 or "2 instance:toium interface:tun0"] value 0
inst [3 or "3 instance:toium interface:virbr0"] value 0
inst [4 or "4 instance:toium interface:virbr0-nic"] value 0
inst [5 or "5 instance:toium interface:wlp61s0"] value 0

prometheus.collectd.collectd_interface_if_errors_rx_total
inst [0 or "0 instance:toium interface:enp0s31f6"] value 0
inst [1 or "1 instance:toium interface:lo"] value 0
inst [2 or "2 instance:toium interface:tun0"] value 0
inst [3 or "3 instance:toium interface:virbr0"] value 0
inst [4 or "4 instance:toium interface:virbr0-nic"] value 0
inst [5 or "5 instance:toium interface:wlp61s0"] value 0

prometheus.collectd.collectd_interface_if_dropped_tx_total
inst [0 or "0 instance:toium interface:enp0s31f6"] value 0
inst [1 or "1 instance:toium interface:lo"] value 0
inst [2 or "2 instance:toium interface:tun0"] value 0
inst [3 or "3 instance:toium interface:virbr0"] value 0
inst [4 or "4 instance:toium interface:virbr0-nic"] value 0
inst [5 or "5 instance:toium interface:wlp61s0"] value 0

prometheus.collectd.collectd_interface_if_dropped_rx_total
inst [0 or "0 instance:toium interface:enp0s31f6"] value 0
inst [1 or "1 instance:toium interface:lo"] value 0
inst [2 or "2 instance:toium interface:tun0"] value 0
inst [3 or "3 instance:toium interface:virbr0"] value 0
inst [4 or "4 instance:toium interface:virbr0-nic"] value 0
inst [5 or "5 instance:toium interface:wlp61s0"] value 0

Thank you.

What owner, group, and mode should we give the configuration files?

@lberk

lberk commented

Assuming the same file permission layout as on Fedora/RHEL:

% ls -l /var/lib/pcp/pmdas/prometheus/config.d 
total 4
-rw-r--r--. 1 root root 22 Nov 22 20:07 collectd.url

Owner root, group root, and mode 644 should suffice.
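A minimal sketch of creating the URL file with that ownership and mode, in Python for illustration only. `write_url_file` is a hypothetical helper; it sets the mode explicitly so the result does not depend on the caller's umask, and only changes ownership to root:root when actually running as root. Writing the URL plus a trailing newline (as `echo` does) gives 22 bytes, consistent with the listing above.

```python
import os
from pathlib import Path


def write_url_file(config_dir, name, url, mode=0o644):
    """Write <config_dir>/<name>.url containing url, like `echo url > name.url`."""
    path = Path(config_dir) / f"{name}.url"
    # echo appends a newline, so do the same.
    path.write_text(url + "\n")
    # Set the mode explicitly rather than relying on the umask.
    os.chmod(path, mode)
    # root:root ownership is only possible (and only needed) when running as root.
    if os.geteuid() == 0:
        os.chown(path, 0, 0)
    return path
```

Usage, as root: `write_url_file("/var/lib/pcp/pmdas/prometheus/config.d", "collectd", "http://localhost:9103")`. In the role itself this would more naturally be an Ansible `copy` or `template` task setting `owner`, `group`, and `mode`.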

A general note for when we come back to this: the PCP agent for this is now pcp-pmda-openmetrics.