/lxd-telegraf-stats

LXD containers metrics

Primary LanguagePython

Python script for gathering metrics of containers on lxd host.

Script is periodically triggered by telegraf and gathered metrics are sent to influxdb (and/or another metricsdb, based on telegraf configuration).

grafana_dashboards contains example dashbords which using gathered metrics.

Tested on Ubuntu LTS 16.04 with LXD version 2.20-0ubuntu4~16.04.1~ppa1 (python 2.7, kernel 4.10.0-40-generic) Newly created python3 branch have python3 version of script which was tested on Ubuntu LTS 20.04 with snap LXD version 4.0.3 (python 3.8.2, kernel 5.4.0-47-generic)

Howto make it work (ubuntu server, for other distro use appropriate tools):

  1. install lxd server https://linuxcontainers.org/lxd/introduction/ (i have spotted that this script not working with snap version of LXD - use PPA or Backports)
  2. install telegraf https://docs.influxdata.com/telegraf/v1.4/introduction/installation/
  3. install influxdb https://docs.influxdata.com/influxdb/v1.3/introduction/installation/
  4. install grafana http://docs.grafana.org/installation/debian/
  5. configure telegraf to use influxdb (configure the running period for telegraf - default is 10s which should be too often (and may broke things))
  6. copy lxd.conf to /etc/telegraf/telegraf.d/
  7. copy sudoers telegraf to /etc/sudoers.d/
  8. copy lxd-telegraf-stats.py to /usr/local/sbin/ and chmod +x it.
  9. install additional python modules: apt-get install python-ws4py python-pylxd or via pip(3) pip(3) install ws4py pylxd
  10. try to run script - output should be like:
/usr/local/sbin/lxd-telegraf-stats.py
lxd,type=container,hostname=master-666,name=master,instance=666,metric=status running=1,processes=27,cpuprio=1024,hddprio=500
lxd,type=container,hostname=master-666,name=master,instance=666,metric=mem usage=64884736,usage_pct=3,limit=2147483648,peak=67051520
lxd,type=container,hostname=master-666,name=master,instance=666,metric=swap usage=0,peak=0
lxd,type=container,hostname=master-666,name=master,instance=666,metric=cpu usage=552144742062,limit=2,usage_percpu=276072371031
lxd,type=container,hostname=master-666,name=master,instance=666,metric=blkio bytes_total=31486976,iops_total=779,bytes_write=0,iops_write=0,bytes_read=31486976,iops_read=779
lxd,type=container,hostname=master-666,name=master,instance=666,metric=net,dev=lo pkts_out=788470,bytes_in=68317360,bytes_out=68317360,pkts_in=788470
lxd,type=container,hostname=master-666,name=master,instance=666,metric=net,dev=eth1 pkts_out=182,bytes_in=15541517,bytes_out=12836,pkts_in=182031
lxd,type=container,hostname=master-666,name=master,instance=666,metric=net,dev=vxlan pkts_out=2095146,bytes_in=40770984,bytes_out=85283645,pkts_in=575666
lxd,type=container,hostname=master-666,name=master,instance=666,metric=net,dev=eth0 pkts_out=2118158,bytes_in=97459691,bytes_out=220339657,pkts_in=677739
lxd,type=container,hostname=master-666,name=master,instance=666,metric=disk,dev=tmp usage=0,usage_pct=0,limit=536870912
lxd,type=container,hostname=master-666,name=master,instance=666,metric=disk,dev=root usage=138584064,usage_pct=1,limit=10737418240
lxd,type=container,hostname=logger-666,name=logger,instance=666,metric=status running=1,processes=16,cpuprio=1024,hddprio=500
lxd,type=container,hostname=logger-666,name=logger,instance=666,metric=mem usage=47546368,usage_pct=2,limit=2147483648,peak=50176000
lxd,type=container,hostname=logger-666,name=logger,instance=666,metric=swap usage=0,peak=0
lxd,type=container,hostname=logger-666,name=logger,instance=666,metric=cpu usage=307662859002,limit=2,usage_percpu=153831429501
lxd,type=container,hostname=logger-666,name=logger,instance=666,metric=blkio bytes_total=29099008,iops_total=840,bytes_write=0,iops_write=0,bytes_read=29099008,iops_read=840
lxd,type=container,hostname=logger-666,name=logger,instance=666,metric=net,dev=lo pkts_out=0,bytes_in=0,bytes_out=0,pkts_in=0
lxd,type=container,hostname=logger-666,name=logger,instance=666,metric=net,dev=vxlan pkts_out=10998,bytes_in=32260729,bytes_out=308352,pkts_in=1084500
lxd,type=container,hostname=logger-666,name=logger,instance=666,metric=net,dev=eth0 pkts_out=32390,bytes_in=121446118,bytes_out=1910852,pkts_in=1184950
lxd,type=container,hostname=logger-666,name=logger,instance=666,metric=disk,dev=tmp usage=0,usage_pct=0,limit=536870912
lxd,type=container,hostname=logger-666,name=logger,instance=666,metric=disk,dev=root usage=18042880,usage_pct=0,limit=10737418240
lxd,type=master,metric=mem total=16812589056,given=25402523648,used=691441664,given_pct=151,used_pct=4
lxd,type=master,metric=other live=1
lxd,type=master,metric=hdd total=1990116046274,given=2033065719234,used=316979200,given_pct=102,used_pct=0
lxd,type=master,metric=cpu given=24,total=16
lxd,type=master,metric=containers running=5,total=6,stopped=1,notrunning=1
  1. restart telegraf: systemctl restart telegraf.service (you should test the gathering with telegraf --test command)
  2. login to your grafana and import attached dashboards from grafana_dashboards
  3. edit/tweak settings to make it work
  4. do not let your eyeballs pop out!
  5. maybe i'm missed something, so look at your logs if something went wrong.

ScreenShot:

alt text