kevinlekiller/amdctl

How can I force my AMD Ryzen 5 3500U to stay cool?


I'm very ignorant about CPUs and their voltage, wattage, multipliers, divisors etc.
I read the doc, #10, amdctl -x, the ArchWiki, something on Wikipedia and other online information, but I'm still very unsure, especially since wrong values might fry my chip.

At times I wish to keep my fan off, and thus to force my laptop to stay cool. That's how I ran into amdctl.

AMD Ryzen 5 3500U

I have an AMD Ryzen 5 3500U, family 17h:

Voltage ID encodings: SVI (serial)
Detected CPU model 18h, from family 17h with 8 CPU cores.

Northbridge:
No P-States on AMD17H Northbridge.

My cores look like this:

Core 0 | P-State Limits (non-turbo): Highest: 1 ; Lowest 3 | Current P-State: 3
 Pstate Status CpuFid CpuDid CpuVid CpuMult CpuFreq CpuVolt IddVal IddDiv CpuCurr CpuPower
      0      1    105     10     53  21.00x 2000MHz  1219mV     21     10  31.00A   37.78W
      1      1    102     12     96  17.00x 1600MHz   950mV     17     10  27.00A   25.65W
      2      1     98     14    102  14.00x 1400MHz   912mV     14     10  24.00A   21.90W
      3      0      0      0     88   -nanx    0MHz  1000mV      0     10  10.00A   10.00W
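
For readers trying to interpret the CpuMult column above, here is a minimal sketch that recomputes it from CpuFid and CpuDid. It assumes the multiplier is 2 * CpuFid / CpuDid relative to a 100 MHz reference; that relation is an assumption that happens to match every row shown in this thread, not something taken from amdctl's source. The function name cpu_mult is made up for illustration.

```shell
# Sketch: recompute the CpuMult column from CpuFid and CpuDid.
# Assumption: multiplier = 2 * CpuFid / CpuDid (matches the rows in this thread).
cpu_mult() {
    fid=$1
    did=$2
    if [ "$did" -eq 0 ]; then
        # A zero divisor has no defined multiplier; amdctl shows -nanx for it.
        echo "undefined"
        return 1
    fi
    awk -v fid="$fid" -v did="$did" 'BEGIN { printf "%.2f\n", 2 * fid / did }'
}

cpu_mult 105 10   # P-state 0 above
cpu_mult 102 12   # P-state 1 above
cpu_mult 98 14    # P-state 2 above
```

Under this assumption, the all-zero row for P-state 3 has no defined multiplier, which would explain the -nanx entry. Note also that 21.00x is listed next to 2000MHz in the output above, so the CpuFreq column is not always exactly CpuMult times 100 MHz.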

Disabling P-States doesn't work

I tried to disable P-States 0 and 1:

$ sudo amdctl -p0 -a0
$ sudo amdctl -p1 -a0

After this, amdctl shows 0 as their Status for every core.
Nevertheless, my cores are still able to enter those P-States when they're under load (amdctl will still show "Current P-State: 1") and the cpu MHz displayed in /proc/cpuinfo can still get close to 3700.

Undervolting and some questions

Should I increase CpuVid in order to undervolt my P-States?
Which P-States? And any hint on what kind of values I should use?

My lowest P-State is 3, but that one shows -nan CPU multiplier (and Status 0, 0 as all ids etc). Does it simply mean that I cannot tweak the lowest P-State, or is something broken?

I'm also confused about a few things related to frequency...
According to the specs, my CPU's base frequency is 2.1GHz and the Max Boost Clock is up to 3.7GHz. Yet /proc/cpuinfo reports a cpu MHz close to 1200 when the load is minimal. Shouldn't the minimum be 2100?
The P-States displayed by amdctl show a clock speed of 2GHz, 1.6GHz and 1.4GHz. What does it mean? Is that the minimum core frequency while the core is in the given P-State, or what else?

When and how should one tweak the other settings (e.g. CPU voltage id and divisor id)? Is it better to never touch those?

I apologize in advance for the many questions!
If anybody has any links that explains how all of this works, I'm looking forward to giving them a read.

Thanks for this useful piece of open source software!

From what I can understand, AMD has not provided any information to open source developers on how to change the boost states on Zen; all I could find is the non-boosted states (page 129 and up of https://developer.amd.com/wp-content/resources/56255_3_03.PDF). Someone has to reverse engineer how AMD Ryzen Master does it on Windows, or how it's done at the UEFI level (maybe someone already did, I'm not sure).

When I added Zen support to the utility, I didn't test it (I didn't have a Zen-based CPU); I just did it based on their manual. I think the problem is that the boost algorithm used by Zen makes the P-state system described in their manual useless: it probably overrides those states and has its own, which is why nothing changes when you disable P-states. For Zen users, this utility might therefore only be useful to people who disable AMD's boost algorithm.

Searching for "boost" in https://developer.amd.com/wp-content/resources/56255_3_03.PDF

On page 124: "CpbDis: core performance boost disable. Read-write. Reset: 0. 0=CPB is requested to be enabled. 1=CPB is disabled. Specifies whether core performance boost is requested to be enabled or disabled. If core performance boost is disabled while a core is in a boosted P-state, the core automatically transitions to the highest performance non-boosted P-state."

So setting "CpbDis" to 1 should make the CPU use only non-boosted P-states?

Thank you for your response (and for this tool).

It'll take me quite some time to understand the content of that PDF.
In the meantime I'm really happy to help test amdctl on my Ryzen (as long as it's safe enough; I wouldn't want to risk permanently bricking my new laptop).

The description of "CpbDis" leads me to the same conclusion as you.
How can I try setting it to 1? amdctl doesn't seem to support that field. I can try to read and write some data directly into /dev/cpu/*/msr, but couldn't figure out what's the MSR number for "CpbDis".

OK, I learned something on how MSR works.

"CpbDis" is the 25th bit in the register number 0xC0010015, and it's currently unset:

$ sudo rdmsr 0xC0010015
9000011
$ sudo rdmsr 0xC0010015 --bitfield 25:25
0

In order to set it, I should set 0xC0010015 to:

0x9000011 | (1 << 25) = 0xb000011

If everything so far is correct, then in order to set "CpbDis" to 1 I should run:

$ sudo wrmsr 0xC0010015 0xb000011

And I should do the same thing for all the other processor cores.

Is this correct?
Sorry that I'm not more independent. The fear of setting the wrong bit somewhere scares me a lot.

I finally ran that command after finding confirmation in various pieces (like this one: https://github.com/mpollice/AmdMsrTweaker/blob/master/Info.cpp#L98).

I can confirm that my CPU is not entering P-State 0 and is not going above 2000 MHz anymore.
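
The read-modify-write described above can be sketched as a small script, so the value is derived from whatever each core currently reports instead of being typed by hand. This is only a sketch: the names MSR_ADDR, with_cpbdis and disable_cpb_all_cores are made up for illustration, it assumes rdmsr and wrmsr from msr-tools with their -p (processor) option, and the hardware function is defined but deliberately not called.

```shell
MSR_ADDR=0xC0010015   # register discussed above
CPBDIS_BIT=25         # bit position of CpbDis, as discussed above

# Pure helper: take the hex string printed by rdmsr (no 0x prefix)
# and print the same value with the CpbDis bit set.
with_cpbdis() {
    printf '0x%x\n' $(( 0x$1 | (1 << CPBDIS_BIT) ))
}

# Hardware part (not called here, needs root and the msr module):
# read each core's current value and write it back with CpbDis set.
disable_cpb_all_cores() {
    for dev in /dev/cpu/[0-9]*; do
        [ -e "$dev/msr" ] || continue
        cpu=${dev##*/}
        cur=$(rdmsr -p "$cpu" "$MSR_ADDR") || return 1
        wrmsr -p "$cpu" "$MSR_ADDR" "$(with_cpbdis "$cur")" || return 1
    done
}

with_cpbdis 9000011   # prints 0xb000011, the value computed above
```

Reading each core first avoids overwriting other bits in case the cores do not all hold the same value.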

Great! Maybe this can be added to amdctl; I just haven't had time to do anything with it lately.

I don't know whether it's relevant here, but a couple of days ago I realized that I can enable/disable boost using /sys/devices/system/cpu/cpufreq/boost.

If I disable boost using wrmsr, I find it back on after resuming from suspend. I'm assuming it's whatever manages /sys/devices/system/cpu/cpufreq/boost that turns it on.

If this tool is meant to run on systems with a recent Linux kernel, then it's probably unnecessary to add this feature. It would however be useful for systems that don't support it already, like older kernels or non-Linux ones.
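
A minimal sketch of using that sysfs file from a script, with a read-back so it is obvious whether the setting took effect (for example when re-running it after a resume). The function name disable_boost and its optional path argument are made up for illustration; the argument exists mainly so the function can be tried against a scratch file first.

```shell
BOOST_FILE=/sys/devices/system/cpu/cpufreq/boost

# Write 0 to the boost file and print the value read back.
# Needs root when used on the real sysfs path.
disable_boost() {
    file=${1:-$BOOST_FILE}
    if [ ! -w "$file" ]; then
        echo "cannot write $file (not root, or no boost control here?)" >&2
        return 1
    fi
    echo 0 > "$file"
    cat "$file"
}
```

Calling disable_boost with no argument as root targets the real file; a printed 0 means boost is reported as disabled.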

@Ryzener Did you have any success with undervolting? I own a Thinkpad E495 with the same CPU and, like you, I would like the machine to be completely silent instead of turning the fan on from time to time. Also, my temperature is always around 50°C just browsing the web. It would be nice to find a solution. I used https://github.com/amanusk/s-tui to monitor the CPU. Good news: even when running stress tests with this tool, the CPU does not go beyond ~75-80°C without throttling, but the fan gets noisy.

I tried undervolting with no success. I'm not 100% sure of what I was doing or how to test it though, so I might have done something wrong.

I'm currently disabling boost states and limiting the max frequency using cpupower. The fan still starts spinning when the system is under stress, but at least that no longer happens with just a few browser windows idling.

If you find a better solution and manage to undervolt your CPU, please let me know! :)

@Ryzener you are a genius. I wanted thermal headroom, and echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost is how you get that. Thanks a lot!

FWIW, I think undervolting does work. I got an extra 10% frequency when setting P0 to 756mV instead of 1200ish - the limit was thermal.

EDIT: FYI:

Voltage ID encodings: SVI (serial)
Detected CPU model 60h, from family 17h with 16 CPU cores.

Core 0 | P-State Limits (non-turbo): Highest: 1 ; Lowest 3 | Current P-State: 2
 Pstate Status CpuFid CpuDid CpuVid CpuMult CpuFreq CpuVolt IddVal IddDiv CpuCurr CpuPower
      0      1    104      8    127  26.00x 2600MHz   756mV     29     10  39.00A   29.49W
      1      1    102     12    127  17.00x 1600MHz   756mV     17     10  27.00A   20.42W
      2      1     98     14    127  14.00x 1400MHz   756mV     14     10  24.00A   18.15W

... (etc)

The short explanation is that this behavior is expected. Starting with first-generation Ryzen, this processor family scales its frequency much more aggressively than earlier ones. XFR mode is responsible for clocking the chip as high as possible while temperature and power consumption stay below their thresholds.
So if you make the processor cooler, it will be able to reach higher clocks.
https://www.gamingpcbuilder.com/ryzen-5-3600-wraith-stealth-vs-deepcool-tower-cooler/

Cheaterman, that's exactly what I want.
Do you remember the exact series of commands that enabled you to accomplish this?
My CPU is idling at 50 degrees and the fan is already starting to show premature signs of physical wear (noise, vibration, etc.).
I want to deliberately limit my CPU as much as possible.

Also, do you know anything about the kernel parameter amdgpu.dpm=1 ?
Does it work? Does it make a difference?
Should I enable or disable it? Thoughts?

These are my readings for sensors.
Notice how the SoC (is that Socket?) voltage stays locked at 1.09V regardless of load or idle, and how P_SoC stays around 20 Watts even at idle. On Windows, the SoC voltage was able to go as low as 0.696V and P_SoC as low as 2 Watts (temperature 38 C instead of 52 C).

$ sensors   # on MX Linux (Debian), kernel 5.10.0-9
BAT0-acpi-0
Adapter: ACPI interface
in0:           8.14 V  

amdgpu-pci-0200
Adapter: PCI adapter
vddgfx:           N/A  
vddnb:            N/A  
edge:         +46.0°C  

zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:   732.00 mV 
SVI2_SoC:      1.09 V  
Tdie:         +46.2°C  (high = +95.0°C)
Tctl:         +46.2°C  
SVI2_P_Core:   6.85 W  
SVI2_P_SoC:   22.50 W  
SVI2_C_Core:  11.43 A  
SVI2_C_SoC:   20.56 A

I also find it a bit strange that when I boot ISOs of older distros with kernel 5.8 (October 2020), my SoC voltage is able to fluctuate again and can go down to 0.800V at idle, but as soon as I boot distros with a newer kernel (5.10 or higher), the SoC voltage is locked at 1.09V again (a waste of power at idle). Is that a regression in the kernel? Or were the older distros reporting outdated values? (Unlikely, since older distros with kernel 5.8 also reported idle temps of 40 C, closer to what I get on Windows, instead of 50 C.)

I use these settings on my 2500U; it runs at around 38-42 °C:

 Pstate Status CpuFid CpuDid CpuVid CpuMult CpuFreq CpuVolt IddVal IddDiv CpuCurr CpuPower
      0      1    100     10     80  20.00x 2000MHz  1050mV     20     10  30.00A   31.50W
      1      1     86     12    104  14.33x 1400MHz   900mV     17     10  27.00A   24.30W
      2      1     50     12    124   8.33x  800MHz   775mV     16     10  26.00A   20.15W
      3      0      0      0     88   -nanx    0MHz  1000mV      0     10  10.00A   10.00W

Can you share the command line you use?
I don't wanna risk typing anything wrong with these kinds of tools...

When it comes to UNDERclocking/undervolting, there's no risk of permanent damage, is there?
I just want to run as cool as possible, I don't mind performance loss.

Just out of curiosity, I went from kernel 5.10 to booting an ISO with kernel 5.14, and found that the CPU runs EVEN HOTTER there. Each version seems to bring regressions, at least for the 3500U (Zen+, Picasso generation). Using the kernel boot parameter amdgpu.dpm=1 or =0 doesn't seem to make any difference, unless some other module needs to be activated along with it. (Again, why Linux makes this non-automatic I can never understand, nor why everything takes so many layers of manual labor and cryptic information; unless someone posts the commands online, you would never be able to find all the features inside your own kernel.)

Yes, here you go:

sudo amdctl -p 2 -v 124 -f 50
sudo amdctl -p 1 -v 104 -f 86
sudo amdctl -p 0 -v 80
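
To sanity-check a -v value before writing it, the CpuVid to CpuVolt relation can be recomputed by hand. The sketch below assumes voltage = 1550 mV - 6.25 mV * CpuVid; this is an assumption that matches every CpuVid/CpuVolt pair shown in this thread (for example 127 next to 756mV and 96 next to 950mV), and the function name vid_to_mv is made up for illustration.

```shell
# Sketch: convert a CpuVid value to millivolts.
# Assumption: voltage = 1550 mV - 6.25 mV * CpuVid (matches the tables in this thread).
vid_to_mv() {
    awk -v vid="$1" 'BEGIN { printf "%.2f\n", 1550 - 6.25 * vid }'
}

vid_to_mv 80    # prints 1050.00 (P-state 0 in the commands above)
vid_to_mv 104   # prints 900.00  (P-state 1)
vid_to_mv 124   # prints 775.00  (P-state 2)
```

Under this relation a higher CpuVid means a lower voltage, so undervolting means increasing the -v value.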

Sorry for the unrelated hijack. It's not relevant to amdctl itself, but I need to ask where I can report this. Basically, none of the things mentioned above work for me:

  1. /sys/devices/system/cpu/cpufreq/boost and /sys/devices/system/cpu/cpu0/cpufreq/cpb are both 0 by default, yet boost is enabled and working
  2. sudo rdmsr 0xc0010015 reports either 0x1b001011 or 0x19001011, differently on each boot. They differ (you guessed it!) by the 25th bit, and switching between them doesn't change anything, boost is always enabled
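
The claim that the two readings differ only in that bit can be checked with a small sketch that XORs the values and lists the differing bit positions. The function name diff_bits is made up for illustration.

```shell
# Print the positions of the bits that differ between two values.
diff_bits() {
    x=$(( $1 ^ $2 ))
    bit=0
    out=""
    while [ "$x" -ne 0 ]; do
        if [ $(( x & 1 )) -eq 1 ]; then
            # Append the position, space-separated.
            out="$out${out:+ }$bit"
        fi
        x=$(( x >> 1 ))
        bit=$(( bit + 1 ))
    done
    echo "$out"
}

diff_bits 0x1b001011 0x19001011   # prints 25
```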

I know boost is enabled because

  1. the cpu reaches those frequencies
  2. cpupower frequency-info says it is (and show correct states)
  3. in cpuid, under Advanced Power Management Features (0x80000007/edx) there is CPB: core performance boost = true

My platform is an AMD A10-9600P (k15 family, Excavator architecture, 60h-6Fh or 70h-7Fh), and the manual at https://www.amd.com/system/files/TechDocs/50742_15h_Models_60h-6Fh_BKDG.pdf clearly says that boost activity is decided by MSRC001_0015. I'm on a relatively fresh install of Debian testing (bookworm) with kernel 5.18. If you need additional info, don't hesitate to ask.

Does anyone know if this is something worth solving, and where can I report this, if need be? Thanks in advance.

I assume that is forced by the BIOS. It might be worth checking with turionpowercontrol; IIRC that has a P-State monitor function.

@vinibali That's probably it; it's a locked HP BIOS. Thanks!

I would still check whether the MSR values can be changed at all.