Parse cpu a76 core as a55 result in serious performance problem
xiaotongnii opened this issue · 5 comments
Platform:android_arm64-android13
SOC: cortex A 4a55 and 4a76
Cpuinfo:core_siblings_list:0-7
HW:design 4a55 and 4a76 as a cluster on SOC and configure dtsi cpu 8core as a cluster.
Problem:Use benchmark tools test CPU performance,find cpuinfo parser cpu a76 core as a55 result in serious performance problem on Arm CPU
The cpuinfo call the main function as the following:
https://github.com/pytorch/cpuinfo/blob/76d5e8f5b563daa65340a60fce0e9aec73a936df/src/arm/linux/init.c#L34C37-L34C37
Use processors[0].package_leader_id update processors[sibling].package_leader_id(siblings value form 0 to 7).
Result in processors[i].package_leader_id value is 0.(i form 0 to 7)
//init.c:34
static bool cluster_siblings_parser(
uint32_t processor, uint32_t siblings_start, uint32_t siblings_end,
struct cpuinfo_arm_linux_processor* processors)
{
for(int i = 0; i < 8; i++)
{cpuinfo_log_debug("init.c.39.processors[i].package_leader_id:%d, ",processors[i].package_leader_id);}
processors[processor].flags |= CPUINFO_LINUX_FLAG_PACKAGE_CLUSTER;
uint32_t package_leader_id = processors[processor].package_leader_id;
cpuinfo_log_debug("File: %s, Function: %s, Line: %d, processor:%d, package_leader_id:%d,processors[processor].package_leader_id:%d,siblings_start:%d, siblings_end:%d\n", __FILE__, __func__, __LINE__,processor, package_leader_id,processors[processor].package_leader_id,siblings_start,siblings_end);
for (uint32_t sibling = siblings_start; sibling < siblings_end; sibling++) {
if (!bitmask_all(processors[sibling].flags, CPUINFO_LINUX_FLAG_VALID)) {
cpuinfo_log_info("invalid processor %"PRIu32" reported as a sibling for processor %"PRIu32,
sibling, processor);
continue;
}
const uint32_t sibling_package_leader_id = processors[sibling].package_leader_id;
if (sibling_package_leader_id < package_leader_id) {
package_leader_id = sibling_package_leader_id;
}
// use processors[0].package_leader_id update processors[sibling].package_leader_id.
// processors[sibling].package_leader_id value is 0.
processors[sibling].package_leader_id = package_leader_id;
processors[sibling].flags |= CPUINFO_LINUX_FLAG_PACKAGE_CLUSTER;
cpuinfo_log_debug("File: %s, Function: %s, Line: %d, sibling:%d, sibling_package_leader_id:%d, package_leader_id:%d\n", __FILE__, __func__, __LINE__,sibling,sibling_package_leader_id,package_leader_id);
}
processors[processor].package_leader_id = package_leader_id;
return true;
}
Due to processors[processor].package_leader_id (processor from 0 to 7) is 0, cluster_leader value is 0 all the time.
Line 368 in 76d5e8f
/* Initialize core vendor, uarch, MIDR, and frequency for every logical processor */
for (uint32_t i = 0; i < arm_linux_processors_count; i++) {
if (bitmask_all(arm_linux_processors[i].flags, CPUINFO_LINUX_FLAG_VALID)) {
const uint32_t cluster_leader = arm_linux_processors[i].package_leader_id;
if (cluster_leader == i) {
/* Cluster leader: decode core vendor and uarch */
cpuinfo_arm_decode_vendor_uarch(
arm_linux_processors[cluster_leader].midr,
#if CPUINFO_ARCH_ARM
!!(arm_linux_processors[cluster_leader].features & CPUINFO_ARM_LINUX_FEATURE_VFPV4),
#endif
&arm_linux_processors[cluster_leader].vendor,
&arm_linux_processors[cluster_leader].uarch);
} else {
/* Cluster non-leader: copy vendor, uarch, MIDR, and frequency from cluster leader */
arm_linux_processors[i].flags |= arm_linux_processors[cluster_leader].flags &
(CPUINFO_ARM_LINUX_VALID_MIDR | CPUINFO_LINUX_FLAG_MAX_FREQUENCY);
arm_linux_processors[i].midr = arm_linux_processors[cluster_leader].midr;
arm_linux_processors[i].vendor = arm_linux_processors[cluster_leader].vendor;
arm_linux_processors[i].uarch = arm_linux_processors[cluster_leader].uarch;
arm_linux_processors[i].max_frequency = arm_linux_processors[cluster_leader].max_frequency;
}
}
}
Line 383 in 76d5e8f
arm_linux_processors[i].uarch = arm_linux_processors[cluster_leader].uarch;
Use arm_linux_processors[0].uarch update arm_linux_processors[i].uarch(i from o0 to 7),Reulst in
arm_linux_processors[i].uarch is the same as update arm_linux_processors[0].uarch which is a55(0x00300355 in the cpuinfo.h)
Line 409 in 76d5e8f
Please give me some advices.
BR.
Xiaotong
Nobody reply?OK
Now,Cpuinfo should use logical cpu cluster device node to parse cpu cluster rather than use physical cpu cluster!!!
We nedd to add a new cpuinfo device node to parse logical cpu cluster,Such as package_leader_id and package_leader_id_list or logical_core_siblings_list 。
Line 189 in 76d5e8f
We can not use a physical cpu cluster device node to parse directly ,because cpu arch is changing all the time.
BR.
Xiaotong
I think some of that is being added as part of RISC-V bringup, but I wonder if it could be landed separately...
I think some of that is being added as part of RISC-V bringup, but I wonder if it could be landed separately...
Hi,malfet, what is means about " I wonder if it could be landed separately".
For Arm CPUs,I am willing to slove the issue.And RISV-V CPUS also has a similar programe.
How to make the issue become a PR?
I suspect we're referring to PR #190 here, where we added the concept of core_leader_id
and cluster_leader_id
to the RISC-V structure definition.
In the RISC-V implementation, we tie the processor's uarch to the core_leader. This seems to be essentially the problem you're flagging on ARM - logical processors on the same physical core are guaranteed to have one uarch, but multiple physical cores tied to the same cluster don't necessarily have to be the same uarch.
I think what malfet@ meant by 'it could be landed separately' is whether #190 might somehow block you, because 2 weeks ago at the comment, it was not yet landed. However, these paths are completely independent, so you could put up a fix for this in ARM without worrying about RISC-V.
In regards to how to make this issue a PR, just create a PR separately and follow this guide to link your PR to this issue.
Application:
uint32_t uarch_index = cpuinfo_get_current_uarch_index();-> const struct cpuinfo_uarch_info* cpuinfo_uarch_info = cpuinfo_get_uarch(uarch_index); (cpuinfo_uarchs,init->fill uarchs in it)-> cpuinfo_uarch_info->uarch ;(Type of CPU microarchitecture )
Cpuinfo:
sibling_package_leader_id ->package_leader_id -> processors[sibling].package_leader_id
cpuinfo_arm_linux_init->cpuinfo_linux_detect_core_siblings->cluster_siblings_parser
static bool cluster_siblings_parser(
uint32_t processor, uint32_t siblings_start, uint32_t siblings_end,
struct cpuinfo_arm_linux_processor* processors)
{
processors[processor].flags |= CPUINFO_LINUX_FLAG_PACKAGE_CLUSTER;
uint32_t package_leader_id = processors[processor].package_leader_id;
for (uint32_t sibling = siblings_start; sibling < siblings_end; sibling++) {
if (!bitmask_all(processors[sibling].flags, CPUINFO_LINUX_FLAG_VALID)) {
continue;
}
const uint32_t sibling_package_leader_id = processors[sibling].package_leader_id;
if (sibling_package_leader_id < package_leader_id) {
// package_leader_id = sibling_package_leader_id;
}
// processors[sibling].package_leader_id = package_leader_id;
processors[sibling].flags |= CPUINFO_LINUX_FLAG_PACKAGE_CLUSTER;
}
processors[processor].package_leader_id = package_leader_id;
return true;
}