Build monitor for Lab

Use

  • Grafana
  • Prometheus
  • node-exporter
  • nvidia_exporter
  • apcupsd
  • snmp-exporter
  • Grafana alerting via Telegram
  • blackbox-exporter
  • kubernetes-node (k8s)
  • kubernetes-metrics (k8s)
  • kubernetes-services (k8s)
  • nut-ups

to monitor

  • server
  • traefik
  • gpu
  • apc(ups)
  • nas
  • switch
  • router
  • hp printer
  • ping, dns

Download this project

git clone https://github.com/NTU-ToolmenLab/LabServer_monitor
cd LabServer_monitor

Run with docker-compose

The docker-compose setup is no longer maintained.

Run with kubernetes

1. Setting

Secret data (e.g. IP addresses, paths, OAuth credentials) is stored here.

Rename config.example.yaml to config.yaml and edit the values in it.
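A minimal sketch of this step. The helper name init_config is hypothetical, and it copies the example instead of renaming it, which is an assumption made here so the template stays available for reference. It refuses to overwrite a config.yaml that already exists.

```shell
# Hypothetical helper (not part of this repository): create config.yaml
# from the example without overwriting an existing file.
init_config() {
    _src="${1:-config.example.yaml}"
    _dst="${2:-config.yaml}"
    if [ -e "$_dst" ]; then
        echo "$_dst already exists, leaving it untouched" >&2
        return 0
    fi
    cp "$_src" "$_dst"
}

# Usage, from the repository root:
#   init_config
#   "${EDITOR:-vi}" config.yaml
```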

2. Setup

./setup.sh will

  • Build the image from the Dockerfile
  • Push the image to the local registry

3. Run

./run.sh

Note: I use j2 to render the templates.

Helm

Helm (a k8s package manager) is used to install Prometheus and Grafana (this is included in run.sh).

nvidia_exporter and apcupsd_exporter are deployed as DaemonSets across all nodes.

Setup hints

node-exporter

node-exporter runs directly on the host. Points to watch:

  • The firewall should not block port 9100.
  • Use crontab to start it when the server boots. Crontab entry: @reboot /opt/node_exporter
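One way to add the crontab entry without opening an editor is sketched below. The helper name ensure_reboot_entry is hypothetical. It reads an existing crontab on stdin and prints it back with the @reboot line appended only if that exact line is not already there, so running it twice does not duplicate the entry.

```shell
# Hypothetical helper: read a crontab on stdin and print it with the
# @reboot entry appended, unless that exact line is already present.
ensure_reboot_entry() {
    _entry='@reboot /opt/node_exporter'
    _current=$(cat)
    if [ -n "$_current" ]; then
        printf '%s\n' "$_current"
    fi
    if ! printf '%s\n' "$_current" | grep -qxF "$_entry"; then
        printf '%s\n' "$_entry"
    fi
}

# Usage (crontab -l fails when no crontab exists yet, hence the fallback):
#   { crontab -l 2>/dev/null || true; } | ensure_reboot_entry | crontab -
# After a reboot, the exporter should answer on port 9100:
#   curl -s http://localhost:9100/metrics | head
```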

UPS setup

The UPS in our lab is an APC unit, so I run apcupsd on the host to read the data it sends over USB.

  • Set NISIP 0.0.0.0 in /etc/apcupsd/apcupsd.conf so that apcupsd_exporter (running inside Docker) can reach apcupsd.
  • Do not block port 3551.
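The config change can be scripted roughly as follows. The helper name set_conf_value is hypothetical. In the stock apcupsd.conf the listen-address directive is named NISIP (next to NETSERVER and NISPORT), so the usage line assumes that name; the helper takes the directive as an argument in case your file differs.

```shell
# Hypothetical helper: set "KEY VALUE" in a whitespace-separated config
# file, replacing an existing KEY line or appending one if it is missing.
set_conf_value() {
    _conf="$1"
    _key="$2"
    _value="$3"
    _tmp=$(mktemp)
    if grep -q "^${_key}[[:space:]]" "$_conf"; then
        sed "s|^${_key}[[:space:]].*|${_key} ${_value}|" "$_conf" > "$_tmp"
    else
        cat "$_conf" > "$_tmp"
        printf '%s %s\n' "$_key" "$_value" >> "$_tmp"
    fi
    # Write back in place so the file keeps its owner and permissions.
    cat "$_tmp" > "$_conf"
    rm -f "$_tmp"
}

# Usage (needs root for the real file; restart apcupsd afterwards):
#   set_conf_value /etc/apcupsd/apcupsd.conf NISIP 0.0.0.0
```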

Monitor Synology NAS

Set up SNMP on the NAS: https://www.synology.com/en-uk/knowledgebase/DSM/help/DSM/AdminCenter/system_snmp

Grafana

Set the configuration in Grafana web:

Grafana Oauth Login

The settings added on the OAuth server:

client_id = ""
client_secret = ""
client_name = "grafana"
client_uri = "{{url}}/monitor/"
grant_types = ["authorization_code"]
redirect_uris = ["{{url}}/monitor/login/generic_oauth"]
response_types = ["code"]
scope = "profile"
token_endpoint_auth_method = "client_secret_basic"

Monitor router

Turn on SNMP on the ASUS router by following http://jamyy.us.to/blog/2014/11/6863.html

If you encounter an error, try: ipkg install openssl -force-reinstall

Monitor traefik

Follow https://docs.traefik.io/configuration/metrics/.

I created my own dashboard: board/traefik.json.

HP printer

Custom SNMP metrics are used to get the data from the printer.

The custom metrics are generated with the SNMP generator provided at https://github.com/prometheus/snmp_exporter/tree/master/generator.

Download mibs

Go to https://spp.itcs.hp.com/spp/

Download its MIBs via SDC > public > LaserJet and Digital Sender > Printer Management > MIBS > Phoenix Device MIBs > lj425.

Also get these dependencies: IF-MIB, RFC1155-SMI.txt, RFC1158-MIB, RFC-1212-MIB.txt, RFC1213-MIB.txt, SNMPv2-SMI, SNMPv2-TC.

Put them all into the mibs directory.
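Before running the generator, it can help to confirm that every dependency is in place. The sketch below uses a hypothetical helper, check_mibs, and takes the file names verbatim from the list above; if your downloads use different names or extensions, adjust the list.

```shell
# Hypothetical helper: report which of the dependency MIB files listed
# above are missing from the mibs directory. Returns 0 if all are present.
check_mibs() {
    _dir="${1:-mibs}"
    _missing=0
    for _name in IF-MIB RFC1155-SMI.txt RFC1158-MIB RFC-1212-MIB.txt \
                 RFC1213-MIB.txt SNMPv2-SMI SNMPv2-TC; do
        if [ ! -f "$_dir/$_name" ]; then
            echo "missing: $_name" >&2
            _missing=1
        fi
    done
    return "$_missing"
}

# Usage, from the directory that contains mibs/:
#   check_mibs && echo "all dependency MIBs found"
```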

Generate

Execute:

docker run -it --rm -v $PWD/mibs:/opt/ prom/snmp-generator

Manually remove scan_calibration_download and device_redial from snmp.yml (the generated output file).
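To locate the two entries before editing, something like the following may help. The helper name find_unwanted_metrics is hypothetical. It only prints matching line numbers and does not modify the file, since each metric spans several lines and the removal is still done by hand.

```shell
# Hypothetical helper: print the line numbers in the generated file that
# mention the two metrics to be removed. Exits non-zero if none are found.
find_unwanted_metrics() {
    _file="${1:-snmp.yml}"
    grep -n -E 'scan_calibration_download|device_redial' "$_file"
}

# Usage:
#   find_unwanted_metrics snmp.yml
```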

Test it:

docker run -it --rm -p 9116:9116 -v $PWD/snmp.yml:/etc/snmp_exporter/snmp.yml prom/snmp-exporter
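With the exporter container running, a scrape can be triggered from another terminal through the exporter's /snmp endpoint, which takes target and module query parameters. The helper below is a hypothetical convenience that only builds the URL. The printer address 192.0.2.10 and the module name hp_printer are placeholders; use your printer's IP and whatever module name appears in your generated snmp.yml.

```shell
# Hypothetical helper: build the snmp_exporter probe URL for a target
# device and module. The third argument defaults to the port mapped above.
snmp_probe_url() {
    _target="$1"
    _module="$2"
    _exporter="${3:-localhost:9116}"
    printf 'http://%s/snmp?target=%s&module=%s\n' "$_exporter" "$_target" "$_module"
}

# Usage (placeholder address and module name):
#   curl -s "$(snmp_probe_url 192.0.2.10 hp_printer)"
```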

Bonus

Share the UPS status with the Synology NAS.

Edit ups/ups.conf and k8s/nut.yml to set the IP of the server that the UPS is connected to over USB.

Run kubectl create -f k8s/nut.yml.

LICENSE

MIT