Simple utility to cap Nvidia GPU power limits on Linux based on max fan speed and/or max GPU temperature
The main reason behind this small program is simple: the fans on my RTX 2080 Ti are very loud, and during summer they spin too fast because of the high temperatures.
I could have used other utilities to control the fan speed itself, but controlling the fan speed directly is a bad idea: if the fan is spinning, it is because it needs to dissipate heat. Another way to relieve the fans is therefore to produce less heat in the first place, i.e. to consume less power.
Not being sure whether such a simple utility already existed, I decided to roll my own and use it to experiment with the NVML library.
Usage: ./nv-pwr-ctrl [options]
Executes nv-pwr-ctrl 0.1.0
Controls the power limit of a given Nvidia GPU based on max fan speed
-f, --max-fan f Specifies the target max fan speed, default is 80%
-t, --max-temp t Specifies the target max gpu temperature, default is 80C
--gpu-id i Specifies a specific gpu id to control, default is 0
--do-not-limit Don't limit power - useful to print stats for testing
--fan-ctrl f Set the fan control algorithm to 'f'. Valid values are currently:
'simple' - Reactive based on current fan speed
'wavg' - Weights averages and smooths transitions
'gpu_temp' - Reactive based on GPU temperature alone
Default is 'gpu_temp'
-w, --max-mwatt Specifies a maximum power limit (in mW) without dynamically adjust it
--report-max On exit prints how many seconds the fan speed has been
above max speed
-m, --min-limit Sets minimum percentage limit as a low threshold of how much the
power can be decreased (i.e. 90 would imply power to never go
lower than 90% of current max power limit - default 0)
-l, --log-csv Prints CSV log-like information to std out
--verbose Prints additional log every iteration (4 times a second)
-c, --current Prints current power, limit and GPU temperature on std::err
--help Prints this help and exit
Run with root/admin privileges to be able to change the power limits
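To illustrate how the gpu_temp mode, the wavg smoothing and --min-limit could fit together, here is a minimal sketch. It is not taken from the nv-pwr-ctrl sources: the names power_range, floor_mw, next_limit_mw and fan_average, the 5000 mW step and the 0.25 smoothing factor are all assumptions made for the example, and the real algorithms may differ.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical types and names, for illustration only.
struct power_range {
    std::uint32_t max_mw;      // current max power limit, in milliwatts
    std::uint32_t min_percent; // value passed to -m/--min-limit (0..100)
};

// Lowest limit the controller may set: min_percent of the max limit.
inline std::uint32_t floor_mw(const power_range& r) {
    return static_cast<std::uint32_t>(std::uint64_t{r.max_mw} * r.min_percent / 100);
}

// One reactive step: lower the limit while the GPU is hotter than the target,
// raise it otherwise, and always stay within [floor, max].
inline std::uint32_t next_limit_mw(std::uint32_t cur_mw, int temp_c, int max_temp_c,
                                   const power_range& r, std::uint32_t step_mw = 5000) {
    const std::uint32_t lo = floor_mw(r);
    const std::uint32_t wanted = (temp_c > max_temp_c)
        ? (cur_mw > step_mw ? cur_mw - step_mw : 0u)
        : cur_mw + step_mw;
    return std::max(lo, std::min(wanted, r.max_mw));
}

// Exponentially weighted average, one possible way to smooth fan readings
// (an assumption; the 'wavg' mode may weight its samples differently).
struct fan_average {
    double value = 0.0;
    bool primed = false;
    double update(double sample, double alpha = 0.25) {
        value = primed ? alpha * sample + (1.0 - alpha) * value : sample;
        primed = true;
        return value;
    }
};
```

With a 250 W max limit and -m 90, this sketch never goes below 225 W, no matter how hot the GPU gets.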
One can simply run the utility with sudo ./nv-pwr-ctrl and then press Ctrl+C to quit.
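A run-until-Ctrl+C loop of this kind can be sketched as follows. This is only an illustration under assumptions: run_until_sigint, on_sigint and g_stop are hypothetical names, and the 250 ms default period simply mirrors the "4 times a second" mentioned in the help text.

```cpp
#include <chrono>
#include <csignal>
#include <thread>

// Flag set from the signal handler; sig_atomic_t is safe to write there.
volatile std::sig_atomic_t g_stop = 0;

void on_sigint(int) { g_stop = 1; }

// Calls `iteration` repeatedly, sleeping `period` between calls, until
// SIGINT (Ctrl+C) is received. Returns the number of iterations executed.
template <typename F>
unsigned long run_until_sigint(F&& iteration,
                               std::chrono::milliseconds period = std::chrono::milliseconds(250)) {
    g_stop = 0;
    std::signal(SIGINT, on_sigint);
    unsigned long count = 0;
    while (!g_stop) {
        iteration(); // e.g. read fan speed/temperature and adjust the limit
        ++count;
        std::this_thread::sleep_for(period);
    }
    return count;
}
```

Because the handler only sets a flag, any clean-up (printing a report, for instance) can happen in normal code after the loop returns.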
These charts were produced in multiple ~5 minute sessions of Monster Hunter: World. The game was playable all the time, at 3440x1440 with all graphical options/details set to max (apart from AA) and G-Sync on.
I could not tell I was playing with a variable cap on the power limit.
Reference when no power limit is set:
Unfortunately this utility needs sudo access because it invokes a function (nvmlDeviceSetPowerManagementLimit) which requires such elevated privileges.
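A program in this situation may want to warn early when it is started without those privileges. The following is a small sketch of such a check, not code from nv-pwr-ctrl; running_as_root and check_privileges are hypothetical names, and treating "effective user is root" as the required privilege is an assumption.

```cpp
#include <unistd.h>
#include <cstdio>

// Hypothetical helper: true when the effective user id is root.
inline bool running_as_root() { return geteuid() == 0; }

// Prints a hint on stderr and returns false when not running as root.
inline bool check_privileges() {
    if (running_as_root()) return true;
    std::fprintf(stderr,
                 "not running as root: changing the power limit will likely fail, try sudo\n");
    return false;
}
```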
Simply run make clean && make release and you're done.
This executable depends on NVML (i.e. libnvidia-ml.so), but it loads it dynamically at run time, which means that no Nvidia dependencies are required to build it. It does require the proprietary drivers to be correctly installed in order to run.
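Loading a library at run time on Linux is typically done with dlopen and dlsym. The sketch below shows the general technique, not the actual loader used here: dyn_lib, nvml_status_fn and nvml_available are hypothetical names, and calling the NVML entry points through a plain int-returning function pointer is an assumption that holds on common Linux ABIs. nvmlInit_v2 and nvmlShutdown are real NVML entry points. Depending on the toolchain, linking may need -ldl.

```cpp
#include <dlfcn.h>

// Hypothetical RAII wrapper around a dynamically loaded library.
struct dyn_lib {
    void* handle = nullptr;

    explicit dyn_lib(const char* name) : handle(dlopen(name, RTLD_LAZY | RTLD_LOCAL)) {}
    ~dyn_lib() { if (handle) dlclose(handle); }
    dyn_lib(const dyn_lib&) = delete;
    dyn_lib& operator=(const dyn_lib&) = delete;

    bool ok() const { return handle != nullptr; }

    // Looks up a symbol and casts it to the requested function pointer type;
    // returns nullptr when the library or the symbol is missing.
    template <typename Fn>
    Fn sym(const char* name) const {
        return handle ? reinterpret_cast<Fn>(dlsym(handle, name)) : nullptr;
    }
};

// Assumption: NVML status codes are int-compatible, 0 meaning success.
using nvml_status_fn = int (*)();

// Returns true when NVML can be loaded and initialised on this machine.
inline bool nvml_available() {
    dyn_lib lib("libnvidia-ml.so");
    if (!lib.ok()) return false;
    auto init = lib.sym<nvml_status_fn>("nvmlInit_v2");
    auto shutdown = lib.sym<nvml_status_fn>("nvmlShutdown");
    if (!init || !shutdown) return false;
    if (init() != 0) return false;
    shutdown();
    return true;
}
```

Since nothing from NVML is referenced at link time, a binary built this way can be compiled on a machine without the Nvidia driver and will only report the missing library when it runs.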
List of known issues:
- Sometimes an NVML API call may fail (e.g.
Exception: nvml::nvmlDeviceSetPowerManagementLimit(dev, tgt_gpu_pwr_limit) failed, error: 2
), potentially leaving the power limit at a low setting (if the utility was running with a low fan speed or GPU temperature target).
In such cases, simply start the application with sudo again and stop it; that should fix it. Worst case scenario, a restart of the machine will do.
- Can I run this on AMD or Intel GPUs?
No, this is for Nvidia only.
- Can you support open-source Nvidia drivers?
No, this uses the proprietary NVML library.
- The lower the fan speed (i.e. -f), the lower the FPS... is this expected?
Yes: to keep the fan spinning at just x%, the power has to be limited. Less power keeps the GPU cooler, but also makes it less capable of coping with the demands of the CPU and the game.
- Can you add feature X please?
Open a ticket and let's discuss...
- Why didn't you use application X to achieve something similar?
I didn't want to have to install/compile other packages and didn't need a fancy UI. I needed a simple app I can start from a script and stop with Ctrl+C. That's it.
- Does this work only with a specific graphics API?
No, it is in fact agnostic of any API and can also power limit compute-only workloads on the GPU.
- Which fan control algorithm would you recommend?
As of now I would recommend setting the fan control based on GPU temperature alone, gpu_temp.
- ???
- Added minimum power limit barrier to avoid constraining the GPU too much
- Report current power, limit and GPU temperature on std::err
- Drive the power limit based on GPU temperature
- Rename verbose option to log
- std::cout writes CSV useful for graphs
- Basic functionality