/host_monitor

This repository is a simple telegraf/influxdb/grafana monitoring utility.

Primary language: Dockerfile. License: Apache-2.0.

System Monitoring Tools

WARNING: This has not been tested on a computer without a GPU. Telegraf may fail to launch on such a machine because of its configuration.

This project provides a monitoring utility that records host information.
Not every feature is available on every device, so you can customize the stack accordingly.
The stack is intended to run on a single host and be completely self-contained.

Grafana, InfluxDB, and Telegraf are fully encapsulated in Docker containers.

You must either set up the NVIDIA runtime environment or disable the nvidia-smi plugin in configs/telegraf.conf.
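As a rough aid for choosing between those two options, the sketch below reads Docker's registered runtimes and prints a recommendation. The function name gpu_plugin_recommendation is hypothetical and not part of this repository, and the check is a simple text match on "nvidia", so treat the result as a hint rather than a guarantee.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# Reads the output of `docker info --format '{{json .Runtimes}}'` on stdin
# and prints which of the two options from the README applies.
gpu_plugin_recommendation() {
    # Assumption: a registered NVIDIA runtime shows up with "nvidia" in its name.
    if grep -q 'nvidia'; then
        echo "keep nvidia-smi plugin enabled"
    else
        echo "disable nvidia-smi plugin in configs/telegraf.conf"
    fi
}

# Usage (requires a running Docker daemon):
#   docker info --format '{{json .Runtimes}}' | gpu_plugin_recommendation
```

If the recommendation is to disable the plugin, edit configs/telegraf.conf by hand before starting the stack.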

InfluxDB uses the collectd typesdb file to know how to read the incoming data, and logs it into the telegraf database inside InfluxDB.

Grafana provides visualization tools for monitoring data

Prerequisites

  • Linux: Tested on Ubuntu 22.04
  • Docker and docker-compose installed and set up for your distribution

Setup Nvidia Runtime Environment

Install the nvidia-container-runtime

CentOS

sudo yum install nvidia-container-runtime

Ubuntu

sudo apt install nvidia-container-runtime

Running

Starting

docker-compose up -d

Stopping

docker-compose down

To view the dashboards in Grafana, open the URL below, replacing localhost with the machine's hostname or IP address. The default username and password are admin/admin; you can change the password after logging in or keep it as-is on your installation.

http://localhost:8086

Debugging

Checking the logs

docker container logs grafana-sysstats -f
docker container logs influxdb-sysstats -f
docker container logs telegraf-gpu-sysstats -f
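Before reading the logs, it can help to see which of the three containers is not running at all. The sketch below is a minimal example: the container names are taken from the log commands above, while the function name missing_containers is hypothetical and not part of this repository.

```shell
#!/bin/sh
# Container names, taken from the log commands above.
EXPECTED="grafana-sysstats influxdb-sysstats telegraf-gpu-sysstats"

# Hypothetical helper: reads running container names (one per line) on stdin
# and prints every expected name that does not appear in that list.
missing_containers() {
    running=$(cat)
    for name in $EXPECTED; do
        # -x matches the whole line, -F treats the name as a literal string.
        printf '%s\n' "$running" | grep -qxF "$name" || echo "$name"
    done
}

# Usage (requires a running Docker daemon):
#   docker ps --format '{{.Names}}' | missing_containers
```

An empty result means all three containers are up; any name that is printed is a good candidate for the docker container logs commands above.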

If you want to "start over":

sudo rm -rf /srv/docker/grafana
sudo rm -rf /srv/docker/influxdb
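These commands permanently delete the stored Grafana and InfluxDB data. The sketch below wraps them in a dry-run helper so the sequence can be reviewed first. Stopping the stack before deleting and starting it again afterwards is an assumption about a sensible order, not something this README prescribes, and the names run and reset_stack are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints each command unless DRY_RUN is set to 0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Assumed order: stop the stack, delete the data directories, start again.
reset_stack() {
    run docker-compose down
    run sudo rm -rf /srv/docker/grafana
    run sudo rm -rf /srv/docker/influxdb
    run docker-compose up -d
}

# Usage:
#   reset_stack              # dry run: only prints the commands
#   DRY_RUN=0 reset_stack    # actually executes them
```

The dry run is the default so that nothing is deleted until DRY_RUN=0 is set explicitly.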

Dashboards

[Images: dashboard screenshots]