ShadowBlip/PowerStation

Enabling/Disabling Cores/Threads should use the best cores in the CPU

stevenj opened this issue · 0 comments

To maximize performance, enabling/disabling cores should prioritize the highest-performing cores/threads.

CPUs like Ryzens expose metrics that rate each core.
E.g. /sys/devices/system/cpu/cpu${N}/acpi_cppc/highest_perf reports the relative best performance of each core.
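
As a rough illustration of reading that metric, here is a minimal Python sketch. The function name `read_highest_perf` and the `cpu_root` parameter are made up for this example and are not part of PowerStation. CPUs that don't expose `acpi_cppc/highest_perf` are simply skipped.

```python
from pathlib import Path


def read_highest_perf(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_number: highest_perf} for every CPU that exposes acpi_cppc."""
    ratings = {}
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not cpufreq or cpuidle.
    for entry in Path(cpu_root).glob("cpu[0-9]*"):
        perf_file = entry / "acpi_cppc" / "highest_perf"
        try:
            ratings[int(entry.name[3:])] = int(perf_file.read_text().strip())
        except (OSError, ValueError):
            # Metric missing or unreadable for this CPU: leave it unrated.
            continue
    return ratings


if __name__ == "__main__":
    # Print logical CPUs from highest rated to lowest rated.
    for cpu, perf in sorted(read_highest_perf().items(), key=lambda kv: -kv[1]):
        print(f"cpu{cpu}: {perf}")
```

An empty result would be the signal to fall back to some other rating source.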

Also, it should take into account that running two threads on a core, in general, reduces single thread perf but gives better aggregate perf (~+22%). Taken to its logical conclusion, this could mean each thread is only getting about 61% of the core's maximum performance (1.22 / 2 = 0.61), assuming work is evenly distributed across the threads.

So, in an 8 core / 16 thread CPU, the seemingly most logical algorithm would be something like:

If threading is disabled:
- Disable the lowest performing thread on each core.
- If CPUs < Max CPUs, then disable the lowest performing cores first.

If threading is enabled:
- If there are fewer threads enabled than cores, use the same logic as when threading is disabled.
- If there are more threads enabled than cores, then:
  - enable the fastest thread in each core first.
  - for each remaining thread, enable the thread from the lowest performing core up.
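
The ordering above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the `select_cpus` name and the `topology` layout (`{core_id: {cpu_id: perf}}`) are invented for the example, a core is rated by its best thread, and ties are broken by the lower CPU number.

```python
def select_cpus(topology, count):
    """Return the logical CPUs to keep enabled, in the order they get enabled.

    topology: {core_id: {cpu_id: perf_rating}} (hypothetical layout)
    count:    how many logical CPUs should stay enabled
    """
    # Sort each core's threads best-first, then sort the cores best-first
    # using the rating of their best thread.
    cores = sorted(
        (
            sorted(threads.items(), key=lambda t: (-t[1], t[0]))
            for threads in topology.values()
            if threads
        ),
        key=lambda threads: (-threads[0][1], threads[0][0]),
    )
    # Step 1: the fastest thread of every core, best cores first.
    order = [threads[0][0] for threads in cores]
    # Step 2: sibling threads, starting from the lowest performing core up,
    # so the best cores are the last ones to share with a sibling.
    depth = 1
    while any(len(threads) > depth for threads in cores):
        order += [t[depth][0] for t in reversed(cores) if len(t) > depth]
        depth += 1
    return order[: max(count, 0)]
```

Because this builds one full enable order and slices it, asking for no more CPUs than there are cores never enables a second thread on any core, which is the same result as the threading-disabled case.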

Given that the first step when threading is enabled is the same as when threading is disabled, disabling threading becomes a redundant option. If enabled threads < max cores, threading would effectively be disabled anyway.

The reason for enabling threads from low perf cores up is to improve their ability to handle aggregate workloads and preserve single core perf on the best enabled threads. Also, core perf metrics from a couple of Ryzens I have show that high perf threads usually cluster in the same core, so enabling those two threads would actually harm single threaded perf. The highest rated cores should be the last to be paired with a thread, to maintain the highest possible single core perf.

If the cores/threads had a metric for most energy efficient, and energy efficiency was the goal vs max performance, a similar algorithm could be performed using the energy efficiency rating vs the performance rating.

This algorithm would also help the scheduler if the kernel scheduler is preferred-core aware (as many current patched kernels are, and as 6.9 will be for Ryzens).

The rating would need to be CPU specific. A general fallback would use lscpu MAXMHZ; that's not accurate for Ryzen perf ratings, but it might be sufficient for Intel chips (I don't have any to test), so one would prioritize P cores over E cores, etc.
In the case where there is no CPU specific logic and lscpu reports equal MAXMHZ, a fallback would rate one logical CPU in each multithreaded core higher than its sibling, so that threading is always a last resort.
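
A minimal sketch of that fallback rating, again in Python and purely illustrative: `fallback_ratings` and both input layouts are hypothetical, and the values are assumed to have been collected already (for example from `lscpu -e=CPU,CORE,MAXMHZ`). It also assumes that clocks which differ at all differ by at least 1 MHz, and that a core has fewer than 100 threads.

```python
def fallback_ratings(max_mhz, siblings):
    """Derive a per-CPU rating from max MHz when no CPU specific metric exists.

    max_mhz:  {cpu_id: max clock in MHz}                  (hypothetical input)
    siblings: {core_id: [cpu_id, ...]} threads per core   (hypothetical input)
    """
    ratings = {}
    for cpus in siblings.values():
        for index, cpu in enumerate(sorted(cpus)):
            # Scale MHz by 100 so the small sibling penalty only breaks ties
            # inside a core and never reorders CPUs with different clocks.
            ratings[cpu] = round(max_mhz[cpu] * 100) - index
    return ratings
```

With equal MAXMHZ everywhere, the first thread of every core ties for the top rating and every sibling rates just below it, so siblings are only picked once each core already has one thread enabled.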