pytorch/cpuinfo

Parse cpu a76 core as a55 result in serious performance problem

xiaotongnii opened this issue · 5 comments

Platform:android_arm64-android13
SOC: cortex A 4a55 and 4a76
Cpuinfo:core_siblings_list:0-7
HW:design 4a55 and 4a76 as a cluster on SOC and configure dtsi cpu 8core as a cluster.

Problem:Use benchmark tools test CPU performance,find cpuinfo parser cpu a76 core as a55 result in serious performance problem on Arm CPU

The cpuinfo call the main function as the following:
https://github.com/pytorch/cpuinfo/blob/76d5e8f5b563daa65340a60fce0e9aec73a936df/src/arm/linux/init.c#L34C37-L34C37

Use processors[0].package_leader_id update processors[sibling].package_leader_id(siblings value form 0 to 7).
Result in processors[i].package_leader_id value is 0.(i form 0 to 7)

//init.c:34
static bool cluster_siblings_parser(
	uint32_t processor, uint32_t siblings_start, uint32_t siblings_end,
	struct cpuinfo_arm_linux_processor* processors)
{
	for(int i = 0; i < 8; i++)
	   {cpuinfo_log_debug("init.c.39.processors[i].package_leader_id:%d, ",processors[i].package_leader_id);}
	processors[processor].flags |= CPUINFO_LINUX_FLAG_PACKAGE_CLUSTER;
	uint32_t package_leader_id = processors[processor].package_leader_id;
    cpuinfo_log_debug("File: %s, Function: %s, Line: %d, processor:%d, package_leader_id:%d,processors[processor].package_leader_id:%d,siblings_start:%d, siblings_end:%d\n", __FILE__, __func__, __LINE__,processor, package_leader_id,processors[processor].package_leader_id,siblings_start,siblings_end);

	for (uint32_t sibling = siblings_start; sibling < siblings_end; sibling++) {
		if (!bitmask_all(processors[sibling].flags, CPUINFO_LINUX_FLAG_VALID)) {
			cpuinfo_log_info("invalid processor %"PRIu32" reported as a sibling for processor %"PRIu32,
				sibling, processor);
			continue;
		}

		const uint32_t sibling_package_leader_id = processors[sibling].package_leader_id;
		if (sibling_package_leader_id < package_leader_id) {
			package_leader_id = sibling_package_leader_id;
		}
		// use processors[0].package_leader_id update processors[sibling].package_leader_id.
               //  processors[sibling].package_leader_id value is 0.
		processors[sibling].package_leader_id = package_leader_id;
		processors[sibling].flags |= CPUINFO_LINUX_FLAG_PACKAGE_CLUSTER;
		cpuinfo_log_debug("File: %s, Function: %s, Line: %d, sibling:%d, sibling_package_leader_id:%d, package_leader_id:%d\n", __FILE__, __func__, __LINE__,sibling,sibling_package_leader_id,package_leader_id);

	}

	processors[processor].package_leader_id = package_leader_id;

	return true;
}

Due to processors[processor].package_leader_id (processor from 0 to 7) is 0, cluster_leader value is 0 all the time.

if (cluster_leader == i) {

	/* Initialize core vendor, uarch, MIDR, and frequency for every logical processor */
	for (uint32_t i = 0; i < arm_linux_processors_count; i++) {
		if (bitmask_all(arm_linux_processors[i].flags, CPUINFO_LINUX_FLAG_VALID)) {
			const uint32_t cluster_leader = arm_linux_processors[i].package_leader_id;
			if (cluster_leader == i) {
				/* Cluster leader: decode core vendor and uarch */
				cpuinfo_arm_decode_vendor_uarch(
				arm_linux_processors[cluster_leader].midr,
#if CPUINFO_ARCH_ARM
				!!(arm_linux_processors[cluster_leader].features & CPUINFO_ARM_LINUX_FEATURE_VFPV4),
#endif
				&arm_linux_processors[cluster_leader].vendor,
				&arm_linux_processors[cluster_leader].uarch);
			} else {
				/* Cluster non-leader: copy vendor, uarch, MIDR, and frequency from cluster leader */
				arm_linux_processors[i].flags |= arm_linux_processors[cluster_leader].flags &
					(CPUINFO_ARM_LINUX_VALID_MIDR | CPUINFO_LINUX_FLAG_MAX_FREQUENCY);
				arm_linux_processors[i].midr = arm_linux_processors[cluster_leader].midr;
				arm_linux_processors[i].vendor = arm_linux_processors[cluster_leader].vendor;
				arm_linux_processors[i].uarch = arm_linux_processors[cluster_leader].uarch;
				arm_linux_processors[i].max_frequency = arm_linux_processors[cluster_leader].max_frequency;
			}
		}
	}

arm_linux_processors[i].uarch = arm_linux_processors[cluster_leader].uarch;

arm_linux_processors[i].uarch = arm_linux_processors[cluster_leader].uarch;

Use arm_linux_processors[0].uarch update arm_linux_processors[i].uarch(i from o0 to 7),Reulst in

arm_linux_processors[i].uarch is the same as update arm_linux_processors[0].uarch which is a55(0x00300355 in the cpuinfo.h)

cpuinfo_uarch_cortex_a55 = 0x00300355,

Please give me some advices.
BR.
Xiaotong

Nobody reply?OK
Now,Cpuinfo should use logical cpu cluster device node to parse cpu cluster rather than use physical cpu cluster!!!
We nedd to add a new cpuinfo device node to parse logical cpu cluster,Such as package_leader_id and package_leader_id_list or logical_core_siblings_list

uint32_t package_leader_id;

We can not use a physical cpu cluster device node to parse directly ,because cpu arch is changing all the time.
BR.
Xiaotong

malfet commented

I think some of that is being added as part of RISC-V bringup, but I wonder if it could be landed separately...

I think some of that is being added as part of RISC-V bringup, but I wonder if it could be landed separately...

Hi,malfet, what is means about " I wonder if it could be landed separately".
For Arm CPUs,I am willing to slove the issue.And RISV-V CPUS also has a similar programe.
How to make the issue become a PR?

I suspect we're referring to PR #190 here, where we added the concept of core_leader_id and cluster_leader_id to the RISC-V structure definition.

In the RISC-V implementation, we tie the processor's uarch to the core_leader. This seems to be essentially the problem you're flagging on ARM - logical processors on the same physical core are guaranteed to have one uarch, but multiple physical cores tied to the same cluster don't necessarily have to be the same uarch.

I think what malfet@ meant by 'it could be landed separately' is whether #190 might somehow block you, because 2 weeks ago at the comment, it was not yet landed. However, these paths are completely independent, so you could put up a fix for this in ARM without worrying about RISC-V.

In regards to how to make this issue a PR, just create a PR separately and follow this guide to link your PR to this issue.

Application
uint32_t uarch_index = cpuinfo_get_current_uarch_index();-> const struct cpuinfo_uarch_info* cpuinfo_uarch_info = cpuinfo_get_uarch(uarch_index); (cpuinfo_uarchs,init->fill uarchs in it)-> cpuinfo_uarch_info->uarch ;(Type of CPU microarchitecture )

Cpuinfo:
sibling_package_leader_id ->package_leader_id -> processors[sibling].package_leader_id

cpuinfo_arm_linux_init->cpuinfo_linux_detect_core_siblings->cluster_siblings_parser

static bool cluster_siblings_parser(
	uint32_t processor, uint32_t siblings_start, uint32_t siblings_end,
	struct cpuinfo_arm_linux_processor* processors)
{
	processors[processor].flags |= CPUINFO_LINUX_FLAG_PACKAGE_CLUSTER;
	uint32_t package_leader_id = processors[processor].package_leader_id;

	for (uint32_t sibling = siblings_start; sibling < siblings_end; sibling++) {
		if (!bitmask_all(processors[sibling].flags, CPUINFO_LINUX_FLAG_VALID)) {
			continue;
		}
		const uint32_t sibling_package_leader_id = processors[sibling].package_leader_id;
		if (sibling_package_leader_id < package_leader_id) {
			// package_leader_id = sibling_package_leader_id;
		}
		// processors[sibling].package_leader_id = package_leader_id;
		processors[sibling].flags |= CPUINFO_LINUX_FLAG_PACKAGE_CLUSTER;

	}

	processors[processor].package_leader_id = package_leader_id;

	return true;
}