Collects Granular OS Metrics for Troubleshooting
gdg, or Granular Data Gatherer, was developed in Go to fill the gap left by the lack of an easy, open, all-in-one tool for collecting OS metrics for troubleshooting. OSWatcher and nmon cannot be the only viable options.
To get a local copy up and running, follow these simple steps.
- A server, instance, or VM running a systemd-enabled Linux distribution
Download the binary from Releases (https://github.com/rfparedes/gdg/releases/latest/download/gdg) to /usr/local/sbin on the server and run:
sudo chmod +x /usr/local/sbin/gdg
Start it
sudo /usr/local/sbin/gdg --start
Check Status Anytime
/usr/local/sbin/gdg --status
There are three components to gdg, each of which can be started or stopped separately:
- granular data collection using standard utilities
- rtmon collection of network state information
- detection of processes in D state, with an automated sysrq-t
gdg uses standard Linux utilities to perform its work, including:
- iostat
- top
- mpstat
- vmstat
- ss
- nstat
- ps
- nfsiostat
- ethtool
- ip
- pidstat
- numastat
- sar
- rtmon
gdg detects which utilities are available and uses only those that are installed. You can install any of the utilities above at any time, before or after setup. On most distributions they come from six packages: sysstat (iostat, mpstat, pidstat, sar), nfs-common or nfs-client (nfsiostat), procps (top, vmstat, ps), iproute2 (ss, nstat, ip, rtmon), ethtool (ethtool), and numactl (numastat).
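To see in advance which of the listed utilities gdg would not find, you can ask the shell directly. The following is a minimal sketch, not gdg's own detection code: the helper name `missing_utilities` is hypothetical, and it only checks whether each name resolves on the current PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of gdg): print each utility from the
# argument list that cannot be found on the current PATH.
missing_utilities() {
    for util in "$@"; do
        # command -v exits non-zero when the name is not found
        command -v "$util" >/dev/null 2>&1 || printf '%s\n' "$util"
    done
}

# The utilities listed in this document.
GDG_UTILS="iostat top mpstat vmstat ss nstat ps nfsiostat ethtool ip pidstat numastat sar rtmon"

# Word splitting of $GDG_UTILS is intentional here.
# shellcheck disable=SC2086
missing_utilities $GDG_UTILS
```

Anything printed can then be installed from the package named above. Utilities that live under /sbin or /usr/sbin may be reported as missing when the check runs with a non-root PATH, so running it as root gives a closer picture.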
By default, gdg keeps seven days of logs; you can change this with the --logdays option. In addition, every hour gdg gzips all log files that are not yet compressed, except the one currently being written.
gdg --status reports the current space used by gdg.
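The retention and compression rules can be pictured as a small hourly pass over the data directory. This is only a sketch of the idea under stated assumptions, not how gdg implements it: the function name `tidy_logs` and its three arguments are hypothetical, and it relies on `find` and `gzip` rather than on anything gdg ships.

```shell
#!/bin/sh
# Hypothetical sketch of an hourly tidy pass (not gdg's own code).
# $1 = data directory
# $2 = number of days of logs to keep
# $3 = name (or glob) of the .dat file currently being written
tidy_logs() {
    dir=$1
    days=$2
    current=$3
    # Compress every uncompressed .dat file except the active one.
    find "$dir" -type f -name '*.dat' ! -name "$current" -exec gzip -f {} +
    # Remove files older than $days days (find counts whole 24-hour periods).
    find "$dir" -type f -mtime +"$days" -exec rm -f {} +
}

# Example call, best tried on a copy of the data rather than the live directory:
#   tidy_logs /tmp/gdg-data-copy 7 "*_$(date +%y.%m.%d.%H00).dat"
```

Passing a glob as the third argument skips the current-hour file of every utility at once, because `find -name` treats it as a pattern.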
gdg creates a configuration file at /etc/gdg.cfg and a data directory at /var/log/gdg-data.
gdg uses a systemd timer, so there is no running daemon.
gdg installs two systemd services and two systemd timers on --start. One service and timer pair is responsible for calling the data collection. The other pair is responsible for tidying the logs every hour.
gdg removes its systemd service and timer files on --stop. All other files are left untouched.
gdg collects data in the /var/log/gdg-data directory. Each subdirectory of this directory is named after the utility (e.g. iostat) that collected the data. Each subdirectory holds .dat files named in the format utility_YY.MM.DD.HH00.dat (e.g. meminfo_21.03.07.2300.dat). A .dat file contains at most one hour of data.
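When you know roughly when a problem occurred, the naming scheme lets you work out which file to open. The sketch below is an illustration only: `dat_path` is a hypothetical helper, not a gdg command, and it assumes GNU `date` (for the `-d` option) and the directory layout described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of gdg): print the .dat path that the naming
# scheme implies for a utility at a given time.
# $1 = utility name
# $2 = optional time string understood by GNU date -d (default: now)
dat_path() {
    stamp=$(date -d "${2:-now}" +%y.%m.%d.%H00) || return 1
    printf '/var/log/gdg-data/%s/%s_%s.dat\n' "$1" "$1" "$stamp"
}

# Path of the iostat file for the current hour.
dat_path iostat
```

For example, `dat_path meminfo '2021-03-07 23:15'` yields the meminfo_21.03.07.2300.dat name used as the example in this document. Files from earlier hours have been gzipped by the hourly tidy, so they will presumably carry a compressed-file suffix.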
To move chronologically through the data collected in a .dat file, search for the string zzz.
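Searching for zzz works in any pager (in `less`, type `/zzz` and press `n` to jump to the next match). To get an overview of all markers at once, something like the following sketch may help. The helper name `list_markers` is hypothetical, and the exact layout of a marker line is not documented here, so the function simply matches any line containing zzz.

```shell
#!/bin/sh
# Hypothetical helper (not part of gdg): list every line containing the zzz
# marker, with its line number, in a plain or gzipped .dat file.
list_markers() {
    case $1 in
        *.gz) gzip -dc -- "$1" | grep -n 'zzz' ;;
        *)    grep -n 'zzz' -- "$1" ;;
    esac
}

# Example call:
#   list_markers /var/log/gdg-data/iostat/iostat_21.03.07.2300.dat
```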
rtmon logging must be enabled explicitly. Once enabled, it collects network state information directly from the kernel on an ongoing basis, using a systemd service that runs for as long as rtmon is enabled. This can be used to prove that service issues started after an external network failure. [1]
If d-state is enabled, gdg counts the processes in D state during each interval run. If the count is greater than or equal to a user-defined threshold, it executes echo t > /proc/sysrq-trigger to get a task trace of all processes. This is a one-time action: once a task trace has been triggered, it will not be triggered again until the user explicitly enables d-state again.
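The d-state check amounts to counting processes in state D and comparing the count with a threshold. The sketch below shows that logic only; it is not gdg's code. The function names and the example threshold are hypothetical, it assumes a procps-style `ps` that accepts `-eo state=`, and it reports instead of writing to /proc/sysrq-trigger.

```shell
#!/bin/sh
# Hypothetical sketch of the D-state check (not gdg's own code).

# Count lines on stdin that start with D, i.e. one-letter process states
# as printed by `ps -eo state=`. grep exits non-zero when nothing matches,
# so `|| true` keeps the function's status at 0 while still printing 0.
count_dstate() {
    grep -c '^ *D' || true
}

# Succeed when the count ($1) has reached the user-defined threshold ($2).
threshold_reached() {
    [ "$1" -ge "$2" ]
}

numprocs=5   # example threshold, chosen arbitrarily
n=$(ps -eo state= | count_dstate)
if threshold_reached "$n" "$numprocs"; then
    # gdg would request a task trace at this point; this sketch only reports.
    echo "$n processes in D state: threshold $numprocs reached"
else
    echo "$n processes in D state: below threshold $numprocs"
fi
```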
sudo /usr/local/sbin/gdg --interval 30 --logdays 7 --start
sudo /usr/local/sbin/gdg --stop
cd /var/log/gdg-data
To see the current status of gdg, including start/stop status, version, interval, data location, and current size of collected data, run
/usr/local/sbin/gdg --status
e.g.
~~~~~~~~~~~~~~~
gdg status
~~~~~~~~~~~~~~~
VERSION: gdg-0.9.1
STATUS: started
RTMON: started
INTERVAL: 15s
LOG DAYS TO KEEP: 14d
DATA LOCATION: /var/log/gdg-data/
CONFIG LOCATION: /etc/gdg.cfg
CURRENT DATA SIZE: 318MB
~~~~~~~~~~~~~~~
DSTATE: stopped
NUMPROCS: 0
If you want to change the interval (-t) or logdays (-l), or after installing additional supported utilities, run
sudo /usr/local/sbin/gdg --reload --interval 60 --logdays 14
sudo /usr/local/sbin/gdg --rtmon
sudo /usr/local/sbin/gdg --dst <NUMPROCS>
/usr/local/sbin/gdg --help
- You'll need a Go compiler installed
Clone it
git clone https://github.com/rfparedes/gdg.git
Build it
cd gdg
go build -o gdg
Move it
sudo mv gdg /usr/local/sbin
sudo chmod +x /usr/local/sbin/gdg
Start it
sudo /usr/local/sbin/gdg --start
gdg has been validated on:
- SLE-12 (SLES or SLES-SAP 12 all SPs)
- SLE-15 (SLES or SLES-SAP 15 all SPs)
- openSUSE Leap 12/15
- Debian 9
- Debian 10
- RHEL7
- RHEL8
- Ubuntu 18.04
- Ubuntu 20.04
See the open issues for a list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the GPL-3.0 License. See LICENSE for more information.