
Granular Data Gatherer (gdg)

Collects Granular OS Metrics for Troubleshooting
Report Bug · Request Feature

Table of Contents

  1. About The Project
  2. Getting Started
  3. Technical Details
  4. Usage
  5. Build It Yourself
  6. Validated Distributions
  7. Roadmap
  8. Contributing
  9. License
  10. Reference

About The Project

gdg, or Granular Data Gatherer, was developed in Go to fill the gap left by the lack of an easy, open, all-in-one tool for collecting OS metrics for troubleshooting. OSWatcher and nmon cannot be the only viable options.

Getting Started

To get a local copy up and running, follow these steps.

Prerequisites

  • A server, instance, or VM running a systemd-enabled Linux distribution

Installation

Download the binary from Releases (https://github.com/rfparedes/gdg/releases/latest/download/gdg) to /usr/local/sbin on the server and run:

sudo chmod +x /usr/local/sbin/gdg

Start it

sudo /usr/local/sbin/gdg --start

Check Status Anytime

/usr/local/sbin/gdg --status

Technical Details

  • gdg has three components, each of which can be started or stopped separately:

    1. granular data collection using standard utilities
    2. rtmon collection of network state information
    3. process D-state detection and automated sysrq-t
  • gdg uses standard Linux utilities to perform its work, including:

    • iostat
    • top
    • mpstat
    • vmstat
    • ss
    • nstat
    • ps
    • nfsiostat
    • ethtool
    • ip
    • pidstat
    • numastat
    • sar
    • rtmon
  • gdg detects which utilities are available and uses only those that are installed. You can install any of the utilities above at any time, before or after setup. The utilities are spread across six packages. On most distributions, the sysstat package contains iostat, mpstat, pidstat, and sar; the nfs-common or nfs-client package contains nfsiostat; the procps package contains top, vmstat, and ps; the iproute2 package contains ss, nstat, ip, and rtmon; the ethtool package contains ethtool; and the numactl package contains numastat.

  • By default, gdg keeps seven days of logs; the --logdays option changes this. In addition, every hour gdg gzips all log files that are not yet compressed, except the one currently being written. gdg --status reports the current gdg disk usage.

  • gdg creates a configuration file at /etc/gdg.cfg and a data directory at /var/log/gdg-data.

  • gdg uses a systemd timer so there is no running daemon.

  • gdg installs two systemd services and two systemd timers on --start. One service and timer pair triggers the data collection. The other pair tidies the logs every hour.

  • gdg removes the systemd services and timers on --stop. All other files are left untouched.

  • gdg collects data in the /var/log/gdg-data directory. Its subdirectories are named after the utility (e.g. iostat) that collected the data. Each subdirectory holds .dat files named in the format utility_YY.MM.DD.HH00.dat (e.g. meminfo_21.03.07.2300.dat). Each .dat file contains at most one hour's worth of data.

  • To step chronologically through the data collected in a .dat file, search for the string zzz.

  • rtmon logging must be enabled explicitly. Once enabled, it collects network state information directly from the kernel on an ongoing basis, using a systemd service that runs for as long as rtmon is enabled. This can be used to show that service issues started after an external network failure. [1]

  • If d-state is enabled, each interval run counts the processes in D state. If the count is greater than or equal to a user-defined threshold (number of processes in D state), echo t > /proc/sysrq-trigger is executed to capture a task trace of all processes. This is a one-time action: once a task trace has been triggered, it will not be triggered again until the user explicitly enables d-state again.
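
The D-state check described in the last bullet can be illustrated with a minimal Go sketch. This is not gdg's actual source: the function names, the threshold value, and the choice to treat a threshold of 0 as "disabled" are all assumptions made for illustration. The sketch only counts processes and reports whether the threshold is met; it does not write to /proc/sysrq-trigger.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// procState extracts the state field from the contents of /proc/<pid>/stat.
// The command name is wrapped in parentheses and may itself contain spaces
// or parentheses, so the state is read two bytes after the last ')'.
func procState(stat string) (byte, bool) {
	i := strings.LastIndexByte(stat, ')')
	if i < 0 || i+2 >= len(stat) {
		return 0, false
	}
	return stat[i+2], true
}

// countDState walks /proc and counts processes in uninterruptible sleep (D).
func countDState() int {
	matches, _ := filepath.Glob("/proc/[0-9]*/stat")
	n := 0
	for _, m := range matches {
		data, err := os.ReadFile(m)
		if err != nil {
			continue // the process may have exited between glob and read
		}
		if s, ok := procState(string(data)); ok && s == 'D' {
			n++
		}
	}
	return n
}

// shouldTrigger applies the "greater than or equal to" rule.
// Treating a threshold of 0 as disabled is an assumption of this sketch.
func shouldTrigger(dCount, threshold int) bool {
	return threshold > 0 && dCount >= threshold
}

func main() {
	threshold := 5 // hypothetical NUMPROCS value
	n := countDState()
	fmt.Printf("D-state processes: %d, threshold met: %v\n", n, shouldTrigger(n, threshold))
}
```

Locating the state after the last closing parenthesis matters because a process name such as "my (odd) proc" would otherwise shift the field positions.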

Usage

To start collection in 30s intervals and keep logs for 7 days, run

sudo /usr/local/sbin/gdg --interval 30 --logdays 7 --start

To stop collection, run

sudo /usr/local/sbin/gdg --stop

To see the data collected

cd /var/log/gdg-data

To see the current status of gdg including start/stop status, version, interval, data location, and current size of collected data, run

/usr/local/sbin/gdg --status

e.g.

~~~~~~~~~~~~~~~
  gdg status
~~~~~~~~~~~~~~~
VERSION: gdg-0.9.1
STATUS: started
RTMON: started
INTERVAL: 15s
LOG DAYS TO KEEP: 14d
DATA LOCATION: /var/log/gdg-data/
CONFIG LOCATION: /etc/gdg.cfg
CURRENT DATA SIZE: 318MB
~~~~~~~~~~~~~~~
DSTATE: stopped
NUMPROCS: 0
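
If you want to consume this output from a monitoring script, the KEY: value lines are easy to parse. The following Go sketch is one possible approach, assuming the output keeps the shape shown above; parseStatus is a hypothetical helper and not part of gdg.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseStatus turns "KEY: value" lines into a map and skips everything
// else, such as the banner and the ~~~ separator lines.
func parseStatus(out string) map[string]string {
	fields := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ": ")
		if !ok {
			continue
		}
		fields[strings.TrimSpace(key)] = strings.TrimSpace(val)
	}
	return fields
}

func main() {
	// Sample text copied from the example status output.
	sample := `~~~~~~~~~~~~~~~
  gdg status
~~~~~~~~~~~~~~~
VERSION: gdg-0.9.1
STATUS: started
INTERVAL: 15s
DATA LOCATION: /var/log/gdg-data/
`
	fields := parseStatus(sample)
	fmt.Println(fields["STATUS"], fields["INTERVAL"], fields["DATA LOCATION"])
}
```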

To change the interval (-t) or logdays (-l), or to pick up additional supported utilities after installing them, run

sudo /usr/local/sbin/gdg --reload --interval 60 --logdays 14

To toggle rtmon logging on or off, run

sudo /usr/local/sbin/gdg --rtmon

To enable d-state functionality to trigger sysrq-t

sudo /usr/local/sbin/gdg --dst <NUMPROCS>

For help

/usr/local/sbin/gdg --help

Build It Yourself

  • You'll need a Go compiler installed

Clone it

git clone https://github.com/rfparedes/gdg.git

Build it

cd gdg
go build -o gdg

Move it

sudo mv gdg /usr/local/sbin
sudo chmod +x /usr/local/sbin/gdg

Start it

sudo /usr/local/sbin/gdg --start

Validated Distributions

gdg has been validated on:

  • SLE-12 (SLES or SLES-SAP 12 all SPs)
  • SLE-15 (SLES or SLES-SAP 15 all SPs)
  • openSUSE Leap 12/15
  • Debian 9
  • Debian 10
  • RHEL7
  • RHEL8
  • Ubuntu 18.04
  • Ubuntu 20.04

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the GPL-3.0 License. See LICENSE for more information.

Reference

[1] https://www.suse.com/support/kb/doc/?id=000019863