A small script for writing out GPU usage data using nvitop.
This requires nvitop, which itself requires:
- Python 3.7+
- NVIDIA Management Library (NVML)
- nvidia-ml-py
- psutil
- cachetools
- termcolor
- curses (with libncursesw)
git clone https://github.com/flaviusb/nvitop-thing.git
cd nvitop-thing
python -m venv foo # or you can use: virtualenv foo
. foo/bin/activate
pip3 install --upgrade nvitop
Later, to run the logger:
cd nvitop-thing
. foo/bin/activate
python3 logging-script-thing.py
The logging script takes an optional path for the log file. For example:
cd nvitop-thing
. foo/bin/activate
python3 logging-script-thing.py /var/log/gpuusage/gpu-usage.log
The script writes three kinds of CSV records, interleaved, into its log files. The log files are rotated weekly. The default base filename is usage.csv.
The three formats are:
"PhysicalGPU",unix time,device.physical_index,device.memory_used,device.memory_free,device.memory_total,"device.performance_state",device.power_usage,device.gpu_utilization
and
"Process",unix time,process username,process name,process elapsed_time_in_seconds,process gpu_sm_utilization,process gpu_memory_utilization, process commandline
and
"User",unix time,process username,total gpu_memory_utilization,total gpu_sm_utilization
And so you can filter based on whether the first cell is PhysicalGPU, Process, or User.
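Filtering on the first cell can be sketched in Python with the standard csv module. This is a minimal sketch, not part of the repository: the function name split_records, the short field names, and the sample rows are made up for illustration, and the column order is taken from the three formats listed above. It also assumes that a command line containing unquoted commas should be rejoined into a single field.

```python
import csv
import io
from collections import defaultdict

# Field names (hypothetical, shortened) following the three formats above.
# The first cell of each row selects which list applies.
FIELDS = {
    "PhysicalGPU": ["unix_time", "physical_index", "memory_used", "memory_free",
                    "memory_total", "performance_state", "power_usage",
                    "gpu_utilization"],
    "Process": ["unix_time", "username", "name", "elapsed_time_in_seconds",
                "gpu_sm_utilization", "gpu_memory_utilization", "commandline"],
    "User": ["unix_time", "username", "total_gpu_memory_utilization",
             "total_gpu_sm_utilization"],
}


def split_records(lines):
    """Group interleaved log rows by record type, as dicts keyed by field name."""
    grouped = defaultdict(list)
    # skipinitialspace tolerates a space after a comma, as before the command line.
    for row in csv.reader(lines, skipinitialspace=True):
        if not row or row[0] not in FIELDS:
            continue  # skip blank or unrecognised rows
        names = FIELDS[row[0]]
        values = row[1:]
        if len(values) > len(names):
            # Assumption: extra cells come from unquoted commas in the last field.
            values = values[:len(names) - 1] + [",".join(values[len(names) - 1:])]
        grouped[row[0]].append(dict(zip(names, values)))
    return grouped


# Made-up sample rows, for illustration only.
SAMPLE = '''"PhysicalGPU",1700000000,0,1024,7168,8192,"P0",75,42
"Process",1700000000,alice,python,120,40,12, python train.py
"User",1700000000,alice,12,40
'''

records = split_records(io.StringIO(SAMPLE))
print(records["User"])
```

To read a real log, pass an open file instead of the sample, for example open("usage.csv", newline=""). All values are left as strings, since the units are not stated here.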
The kicker maker generates a script that you can run from a crontab; that script restarts the logger if it has died.
The kicker maker can be run like so:
./mk-kicker.sh --base /program/location/nvitop-thing/ --venv nameOfVirtualenv --log /var/log/gpuusage/gpu-usage.log --python 3.9
This will try to run the logger from within /program/location/nvitop-thing/, using the venv inside that folder called nameOfVirtualenv, logging to /var/log/gpuusage/gpu-usage.log, and using Python 3.9 and pip 3.9.
If your log location is somewhere only root can write to, then you will need to run the kicker script from the superuser's crontab.
The kicker is intended to be run a few times every hour from cron or equivalent; how often is up to you though. It causes minimal load when it runs, testing whether the logger is already running and exiting quickly if it is.
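The "check, then exit quickly" pattern described above can be illustrated in Python. This is only a sketch of the general idea and is not what mk-kicker.sh generates: the PID file, the function names is_running and kick, and the stand-in sleeping process are all assumptions made for the example. It relies on POSIX behaviour of os.kill with signal 0, so it is not suitable for Windows.

```python
import os
import subprocess
import sys
import tempfile


def is_running(pid_file):
    """Return True if pid_file names a process that is still alive."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except (FileNotFoundError, ValueError, ProcessLookupError):
        return False  # no PID file, unreadable contents, or process gone
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def kick(pid_file, command):
    """Start command and record its PID, unless it is already running."""
    if is_running(pid_file):
        return None  # nothing to do, so return quickly
    proc = subprocess.Popen(command)
    with open(pid_file, "w") as f:
        f.write(str(proc.pid))
    return proc


# Stand-in for the logger: a process that just sleeps (illustration only).
stand_in = [sys.executable, "-c", "import time; time.sleep(30)"]
pid_file = os.path.join(tempfile.mkdtemp(), "logger.pid")

first = kick(pid_file, stand_in)   # starts the stand-in process
second = kick(pid_file, stand_in)  # finds it alive and does nothing
first.terminate()
first.wait()
alive_after_stop = is_running(pid_file)
print(first is not None, second is None, alive_after_stop)
```

Because the second call does nothing more than read a small file and probe one PID, running a check like this a few times an hour costs very little.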