travisdowns/uarch-bench

clock detected is not right for hybrid CPU.

Opened this issue · 6 comments

OS: ubuntu 22.04.1 kernel 5.19.1
CPU: Intel 12900KF.

Situation 1
BIOS: all E-cores disabled, CPU frequency fixed at 4 GHz; idle=poll added to the GRUB kernel command line (to disable C-states).
cpupower shows all cores (P-cores) running at 4 GHz.
Then:
export UARCH_BENCH_CLOCK_MHZ=4000
sudo ./uarch-bench.sh

Using timer: clock
Welcome to uarch-bench (e4f54d5)
Supported CPU features: SSE3 PCLMULQDQ VMX EST TM2 SSSE3 FMA CX16 SSE4_1 SSE4_2 MOVBE POPCNT AES AVX RDRND TSC_ADJ BMI1 AVX2 BMI2 ERMS RDSEED ADX CLFLUSHOPT CLWB INTEL_PT SHA
Pinned to CPU 0
Source pages allocated with transparent hugepages: 100.0
Median CPU speed: 3.201 GHz
Running benchmarks groups using timer clock

Median CPU speed: 3.201 GHz
UARCH_BENCH_CLOCK_MHZ does not work?

Situation 2
BIOS: E-cores enabled, CPU frequency at default; idle=poll added to the GRUB kernel command line (to disable C-states).
cpupower shows the P-cores running at 5.2 GHz and the E-cores running at 3.7 GHz.

UARCH_BENCH_CLOCK_MHZ is not set.

sudo ./uarch-bench.sh

Succesfully disabled turbo boost using intel_pstate/no_turbo
Using timer: clock
Welcome to uarch-bench (e4f54d5)
Supported CPU features: SSE3 PCLMULQDQ VMX EST TM2 SSSE3 FMA CX16 SSE4_1 SSE4_2 MOVBE POPCNT AES AVX RDRND TSC_ADJ BMI1 AVX2 BMI2 ERMS RDSEED ADX CLFLUSHOPT CLWB INTEL_PT SHA
Pinned to CPU 0
Source pages allocated with transparent hugepages: 100.0
Median CPU speed: 3.101 GHz
Running benchmarks groups using timer clock

Median CPU speed: 3.101 GHz

The problem here is that the median CPU speed was detected as 3.101 GHz, which is not right for either the P-cores or the E-cores.

Hi Edison,

Thanks for this report.

export UARCH_BENCH_CLOCK_MHZ=4000
sudo ./uarch-bench.sh

Median CPU speed: 3.201 GHz
UARCH_BENCH_CLOCK_MHZ does not work?

This is because sudo, by default, does not pass through environment variables from the calling context. So you export that variable, but it won't be seen by the process(es) running inside the sudo call.

You could do it like this:

 sudo UARCH_BENCH_CLOCK_MHZ=4000 ./uarch-bench.sh

This explicitly sets the variable for the sudo'd process. Or you can use sudo -E to pass through all variables from the parent process. Finally, you could use sudo --preserve-env=UARCH_BENCH_CLOCK_MHZ ./uarch-bench.sh to pass through only the specific variable you care about.
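The difference between these invocations can be tried out without root. This is only a rough sketch: `env -i` is used here as a stand-in for sudo's default environment reset (an assumption for illustration; what sudo actually keeps depends on the sudoers policy), and `show_var` is a hypothetical helper that prints what a child process sees.

```shell
#!/bin/sh
# Hypothetical helper: run a child shell behind the given command prefix
# and print the value it sees for the variable, or "unset".
show_var() {
    "$@" /bin/sh -c 'echo "${UARCH_BENCH_CLOCK_MHZ:-unset}"'
}

export UARCH_BENCH_CLOCK_MHZ=4000

# Stand-in for plain `sudo cmd`: the environment is scrubbed, so the
# exported variable never reaches the child process.
scrubbed=$(show_var env -i)

# Stand-in for `sudo VAR=value cmd`: the variable is set explicitly
# for the child process, so it survives the scrub.
explicit=$(show_var env -i UARCH_BENCH_CLOCK_MHZ=4000)

echo "scrubbed: $scrubbed"
echo "explicit: $explicit"
```

With a real sudo, replacing `env -i` by `sudo` in the two calls should show the same pattern on a default configuration.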

Median CPU speed: 3.201 GHz

@edisonchan that is because ./uarch-bench.sh disables turbo frequencies by default for the duration of the run, and the 12900KF has a non-turbo frequency of 3.2 GHz. This gives a much more stable measurement in "cycles", which is usually what I'm interested in, since you don't see as many frequency transitions as you would when running at max turbo speed.

However, you can run the test with turbo enabled if you'd like: just run the binary ./uarch-bench directly, rather than the ./uarch-bench.sh wrapper script. This wrapper just calls into the binary after disabling turbo and setting the performance governor, but you can do this latter step by hand.

You can confirm this by doing a ./uarch-bench.sh run and then checking cpupower or another frequency-reporting tool while the test is running. It should show 3.2 GHz.
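If cpupower isn't handy, the same check can be sketched by reading sysfs directly from a second terminal while the benchmark runs. This assumes the intel_pstate driver is in use (the wrapper's output above mentions intel_pstate/no_turbo) and that cpufreq exposes scaling_cur_freq for CPU 0, the CPU the benchmark pins to; `read_sysfs` is a hypothetical helper, not part of uarch-bench.

```shell
#!/bin/sh
# Hypothetical helper: print the contents of a sysfs file, or
# "unavailable" if the file does not exist or is not readable.
read_sysfs() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "unavailable"
    fi
}

# 1 means turbo is currently disabled; the file only exists when the
# intel_pstate driver is active.
no_turbo=$(read_sysfs /sys/devices/system/cpu/intel_pstate/no_turbo)

# Current frequency of CPU 0 in kHz, as reported by cpufreq.
cur_khz=$(read_sysfs /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq)

echo "no_turbo: $no_turbo"
echo "cpu0 scaling_cur_freq (kHz): $cur_khz"
```

During a ./uarch-bench.sh run on this machine one would expect no_turbo to read 1 and the frequency to sit near 3200000 kHz, though the exact value reported can fluctuate slightly.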

@travisdowns thanks for the replies.
I chose to set UARCH_BENCH_CLOCK_MHZ because I can get a very stable clock on Intel (add idle=poll to the kernel cmdline and set 4 GHz in the BIOS) and on AMD (disable turbo in Linux and set 4 GHz in the BIOS).

I also have a related question. According to Intel, idle=poll means the CPU keeps running NOPs; will that make the test results incorrect?

I chose to set UARCH_BENCH_CLOCK_MHZ because I can get a very stable clock on Intel (add idle=poll to the kernel cmdline and set 4 GHz in the BIOS) and on AMD (disable turbo in Linux and set 4 GHz in the BIOS).

Makes sense. You can run it like ./uarch-bench then to avoid the turbo setting. I should add a mode to the wrapper to allow keeping turbo mode on.

I also have a related question. According to Intel, idle=poll means the CPU keeps running NOPs; will that make the test results incorrect?

It would not directly affect "steady state" test results. It just means the CPU will be "hot polling" while it has nothing to do, e.g., before and after the test. The effects are generally: (a) the CPU always runs at a high frequency, rather than ramping down during idle and then back up under load and (b) more heat generated and the CPU runs hotter for the same reason.

Effect (a) can mean that results are more stable since you don't have the frequency ramp-up period, but on the other hand uarch-bench already has lots of warmup in there to try to ensure the CPU is already running at its target frequency before the measurements start.

Effect (b) can have the opposite effect: idle=poll might produce slower and less stable results if the CPU throttles from heat because it starts the test already at a high temperature (e.g., the heat sink is already saturated), versus the case where the CPU starts cool and can take advantage of the thermal mass of the cooling system to run at a higher than steady-state frequency for a while, which may be long enough to complete the test.