Simple utility to cap Nvidia GPU power limits on Linux based on fan speed and/or GPU temperature

Primary language: C++. License: GNU General Public License v3.0 (GPL-3.0).

nv-pwr-ctrl

Simple utility to cap Nvidia GPU power limits on Linux based on max fan speed and/or max GPU temperature

Purpose

The main reason behind this small program is simple: the fans on my RTX 2080 Ti are very loud, and during summer they spin too fast because of the high temperatures.
I could have used other utilities to control the fan speed itself, but overriding the fan speed directly is a bad idea: if the fan is spinning, there is heat to dissipate. A better way to relieve the fans is to produce less heat in the first place, i.e. to consume less power.

Not being sure whether such a simple utility already existed, I decided to roll my own and to experiment with the NVML library along the way.

How to run

Usage: ./nv-pwr-ctrl [options]
Executes nv-pwr-ctrl 0.1.0

Controls the power limit of a given Nvidia GPU based on max fan speed

-f, --max-fan f     Specifies the target max fan speed, default is 80%
-t, --max-temp t    Specifies the target max gpu temperature, default is 80C
    --gpu-id i      Specifies a specific gpu id to control, default is 0
    --do-not-limit  Don't limit power - useful to print stats for testing
    --fan-ctrl f    Set the fan control algorithm to 'f'. Valid values are currently:
                    'simple'   - Reactive based on current fan speed
                    'wavg'     - Weights averages and smooths transitions
                    'gpu_temp' - Reactive based on GPU temperature alone
                    Default is 'gpu_temp'
-w, --max-mwatt     Specifies a maximum power limit (in mW) without dynamically adjust it
    --report-max    On exit prints how many seconds the fan speed has been
                    above max speed
-m, --min-limit     Sets minimum percentage limit as a low threshold of how much the
                    power can be decreased (i.e. 90 would imply power to never go
                    lower than 90% of current max power limit - default 0)
-l, --log-csv       Prints CSV log-like information to std out
    --verbose       Prints additional log every iteration (4 times a second)
-c, --current       Prints current power, limit and GPU temperature on std::err
    --help          Prints this help and exit

Run with root/admin privileges to be able to change the power limits

To run the utility, use sudo ./nv-pwr-ctrl and press Ctrl+C to quit.
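To make the interaction between -t/--max-temp and -m/--min-limit more concrete, here is a minimal sketch of one reactive control step. It is not taken from the nv-pwr-ctrl sources: the LimitConfig struct, the function names and the fixed step_mw adjustment are all assumptions made for illustration, and the real 'gpu_temp' algorithm may compute its adjustment differently.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical configuration, not the utility's actual data structure.
// All power values are in milliwatts, the unit used by -w/--max-mwatt.
struct LimitConfig {
    unsigned max_limit_mw;  // current max power limit of the card
    unsigned min_percent;   // value passed to -m/--min-limit (0..100)
    unsigned max_temp_c;    // value passed to -t/--max-temp
    unsigned step_mw;       // assumed adjustment applied per iteration
};

// Lowest limit the controller may set: min_percent of the max limit.
inline unsigned floor_limit_mw(const LimitConfig& cfg) {
    assert(cfg.min_percent <= 100);
    return static_cast<unsigned>(
        static_cast<unsigned long long>(cfg.max_limit_mw) * cfg.min_percent / 100);
}

// One reactive step: lower the limit when the GPU is hotter than the
// target, raise it when it is cooler, and keep the result inside
// [floor_limit_mw(cfg), cfg.max_limit_mw].
inline unsigned next_limit_mw(const LimitConfig& cfg,
                              unsigned current_limit_mw,
                              unsigned gpu_temp_c) {
    const long long lo = floor_limit_mw(cfg);
    const long long hi = cfg.max_limit_mw;
    long long next = current_limit_mw;
    if (gpu_temp_c > cfg.max_temp_c) {
        next -= cfg.step_mw;
    } else if (gpu_temp_c < cfg.max_temp_c) {
        next += cfg.step_mw;
    }
    return static_cast<unsigned>(std::min(hi, std::max(lo, next)));
}
```

With an assumed max limit of 250000 mW, -m 90 and -t 80, the floor is 225000 mW: a reading of 85C lowers the limit step by step, but never below that floor, and a reading below 80C raises it back towards 250000 mW.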

Sample Charts

These charts were produced in multiple ~5 minute sessions of Monster Hunter: World. The game was playable the whole time, at 3440x1440 with all graphical options/details set to max (apart from AA) and G-Sync on.
I could not tell that I was playing with a variable cap on the power limit.

simple fan control option: MH:W Chart simple

wavg fan control option: MH:W Chart wavg

gpu_temp fan control option: MH:W Chart gpu temp

Reference when no power limit is set: MH:W Chart no limit
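The help text describes the 'wavg' option as one that "weights averages and smooths transitions". One common way to get that kind of behaviour is an exponentially weighted moving average over the raw readings. The sketch below shows that technique only. The SmoothedReading class and its alpha parameter are hypothetical, and whether nv-pwr-ctrl uses this exact formula is not stated in this document.

```cpp
#include <cassert>

// Hypothetical smoother: exponentially weighted moving average.
// alpha close to 1 follows the raw samples closely, alpha close to 0
// reacts slowly and smooths more.
class SmoothedReading {
public:
    explicit SmoothedReading(double alpha) : alpha_(alpha) {
        assert(alpha > 0.0 && alpha <= 1.0);
    }

    // Feed one raw sample (e.g. fan speed in percent) and return the
    // smoothed value. The first sample initialises the average.
    double update(double sample) {
        if (!initialised_) {
            value_ = sample;
            initialised_ = true;
        } else {
            value_ = alpha_ * sample + (1.0 - alpha_) * value_;
        }
        return value_;
    }

    double value() const { return value_; }

private:
    double alpha_;
    double value_ = 0.0;
    bool initialised_ = false;
};
```

With alpha set to 0.5, a fan speed jumping from 60% to 100% is seen by the controller as 80% on the next sample and 90% on the one after, so the power limit would be adjusted gradually instead of in one jump.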

sudo requirements

Unfortunately this utility needs sudo access because it invokes a function (nvmlDeviceSetPowerManagementLimit) which requires such elevated privileges.

How to build

Simply run make clean && make release and you're done.

Dependencies

This executable depends on NVML (i.e. libnvidia-ml.so), but it loads the library dynamically at run time, so no Nvidia dependencies are required to build it. The proprietary drivers must be correctly installed to run it.
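As an illustration of run-time loading, the sketch below opens the library with dlopen, resolves a few NVML entry points with dlsym and reads the current power limit. It is a sketch under stated assumptions, not the utility's actual loader: the function name read_power_limit_mw and the stand-in type aliases are made up for this example, and the NVML types are approximated (nvml.h declares nvmlReturn_t as an enum and nvmlDevice_t as an opaque pointer) so that no Nvidia header is needed. Reading the limit does not change anything on the card.

```cpp
#include <dlfcn.h>

// Stand-ins for the nvml.h types (assumption: ABI-compatible with the
// real enum and opaque pointer on common Linux platforms).
using nvml_return_t = int;
using nvml_device_t = void*;

// Hypothetical helper: loads `lib_name`, initialises NVML and stores the
// power limit of GPU `gpu_index` (in mW) in `limit_mw`.
// Returns false if the library, a symbol, or an NVML call is unavailable.
bool read_power_limit_mw(const char* lib_name, unsigned gpu_index,
                         unsigned& limit_mw) {
    void* lib = dlopen(lib_name, RTLD_LAZY | RTLD_LOCAL);
    if (!lib) {
        return false;  // e.g. proprietary driver not installed
    }

    using init_fn = nvml_return_t (*)();
    using shutdown_fn = nvml_return_t (*)();
    using get_handle_fn = nvml_return_t (*)(unsigned, nvml_device_t*);
    using get_limit_fn = nvml_return_t (*)(nvml_device_t, unsigned*);

    auto init = reinterpret_cast<init_fn>(dlsym(lib, "nvmlInit_v2"));
    auto shutdown = reinterpret_cast<shutdown_fn>(dlsym(lib, "nvmlShutdown"));
    auto get_handle = reinterpret_cast<get_handle_fn>(
        dlsym(lib, "nvmlDeviceGetHandleByIndex_v2"));
    auto get_limit = reinterpret_cast<get_limit_fn>(
        dlsym(lib, "nvmlDeviceGetPowerManagementLimit"));

    bool ok = false;
    if (init && shutdown && get_handle && get_limit && init() == 0) {
        nvml_device_t dev = nullptr;
        unsigned value = 0;
        // 0 is NVML_SUCCESS.
        if (get_handle(gpu_index, &dev) == 0 && get_limit(dev, &value) == 0) {
            limit_mw = value;
            ok = true;
        }
        shutdown();
    }
    dlclose(lib);
    return ok;
}
```

A caller would pass "libnvidia-ml.so" (or "libnvidia-ml.so.1", depending on how the driver package installs it) and a GPU index such as 0. On older glibc versions the program may need to be linked with -ldl.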

Known Issues

List of known issues:

  • Sometimes an NVML API call may fail (e.g. Exception: nvml::nvmlDeviceSetPowerManagementLimit(dev, tgt_gpu_pwr_limit) failed, error: 2), potentially leaving the power limit at a low setting (if the utility was running with a low fan speed or GPU temperature setting).
    In such cases, start the application with sudo again and then stop it; this should restore the limit. In the worst case, restarting the machine will do.
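For reading such messages, the numeric code can be mapped to the names declared in nvml.h, where 2 is NVML_ERROR_INVALID_ARGUMENT. The NVML documentation associates that code with an invalid device or an out-of-range limit, but whether either applies to the failure described above is not established here. The sketch below shows the mapping for a few codes and a defensive clamp of a target limit into the range that nvmlDeviceGetPowerManagementLimitConstraints reports. Both helper names are hypothetical, and the utility may or may not do something similar.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Names of a few NVML return codes as declared in nvml.h.
// Hypothetical helper; only the first five codes are listed.
inline std::string nvml_error_name(int code) {
    switch (code) {
        case 0: return "NVML_SUCCESS";
        case 1: return "NVML_ERROR_UNINITIALIZED";
        case 2: return "NVML_ERROR_INVALID_ARGUMENT";
        case 3: return "NVML_ERROR_NOT_SUPPORTED";
        case 4: return "NVML_ERROR_NO_PERMISSION";
        default: return "other NVML error";
    }
}

// Clamp a target power limit (mW) into [min_mw, max_mw], where the two
// bounds are assumed to come from
// nvmlDeviceGetPowerManagementLimitConstraints.
inline unsigned clamp_to_constraints(unsigned target_mw,
                                     unsigned min_mw,
                                     unsigned max_mw) {
    assert(min_mw <= max_mw);
    return std::min(max_mw, std::max(min_mw, target_mw));
}
```

For example, with assumed constraints of 100000 mW and 300000 mW, a computed target of 50000 mW would be raised to 100000 mW before being passed to nvmlDeviceSetPowerManagementLimit.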

F.A.Q

  1. Can I run this on AMD or Intel GPUs?
    No... this is for Nvidia only.
  2. Can you support open-source Nvidia drivers?
    No, this uses the proprietary NVML library.
  3. The lower the fan speed (i.e. -f), the lower the FPS... is this expected?
    Yes: to keep the fan spinning at just x%, the power has to be limited. Less power keeps the GPU cooler, but also makes it less capable of coping with the demands of the CPU and the game.
  4. Can you add feature X please?
    Open a ticket and let's discuss...
  5. Why didn't you use application X to achieve something similar?
    I didn't want to have to install/compile other packages and didn't need fancy UI. I needed a simple app I can spin with a script and stop with Ctrl+C. That's it.
  6. Does this work only with a specific graphics API?
    No, it is in fact agnostic of any API and would also be able to power limit even compute-only workloads on the GPU.
  7. Which fan control algorithm would you recommend?
    As of now I would recommend driving the control from GPU temperature alone, i.e. gpu_temp.

Task list

  • ???
  • Added minimum power limit barrier to avoid constraining the GPU too much
  • Report current power, limit and GPU temperature on std::err
  • Drive the power limit based on GPU temperature
  • Rename verbose option to log
  • std::cout writes CSV useful for graphs
  • Basic functionality