Regarding Default Behaviour for CPU Affinity

Question

mert-kurttutan opened this issue 7 months ago · 4 comments

I have a script that runs simple BLIS sgemm code (allocate vectors for a, b, c and run sgemm) in sequential mode (OpenMP multithreading is enabled, but with OMP_NUM_THREADS=1, so it uses only 1 thread).
It gives consistently different runtimes when the affinity is set to different CPU cores via the taskset command on Linux.

taskset -c 0 ./blis    # time 1.96 sec
taskset -c 1 ./blis   # time 1.70 sec

I did not change anything in BLIS regarding affinity, and compiled it as instructed in the wiki.

If I do not use taskset to restrict the available CPU cores, BLIS seems to choose the faster one (e.g. core 1).

./blis   # time 1.70 sec

I read the wiki; the documentation talks about how affinity can be changed and how that can be used to solve affinity-related problems. But I could not find anything about the default affinity behaviour.

So, what is the default way BLIS handles affinity? I just want to see if you have any insight without requiring other info from me (e.g. hardware). If you want, I can provide hardware info and other specs.

Answer 1 · 2024-04-17T17:56:46.000Z

This could be due to inconsistent thermal diffusion from different parts of the chip. By default, BLIS does not do anything to set the CPU affinity. We are working on implementing this in a future release but for now it must be done externally.
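As an illustration of doing the pinning externally without taskset, here is a minimal Python sketch (Linux only). It is not something BLIS provides: run_pinned is a hypothetical helper name, and the sketch relies only on os.sched_getaffinity and os.sched_setaffinity from the Python standard library. It restricts the caller to one core while a piece of work runs and then restores the previous mask.

```python
import os


def run_pinned(core, func, *args):
    """Run func(*args) with the caller restricted to a single core.

    Hypothetical helper for illustration; Linux only.
    """
    previous = os.sched_getaffinity(0)  # 0 means "the caller"
    if core not in previous:
        raise ValueError(f"core {core} is not in the allowed set {sorted(previous)}")
    os.sched_setaffinity(0, {core})
    try:
        return func(*args)
    finally:
        # Restore the original mask even if func raises.
        os.sched_setaffinity(0, previous)
```

Calling run_pinned(1, work) is roughly comparable to launching the program under taskset -c 1, with one caveat: the mask is changed from inside the process, so threads that already exist (for example, an already-started thread pool) keep their own masks.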

Answer 2 · 2024-04-17T18:39:59.000Z

Thanks, I just checked the CPU clock speeds using htop. Each core runs at a different frequency, and this is consistent with the BLIS timings.
But it is still very curious that BLIS consistently ends up on the core that gives the fastest result.

I compiled BLIS with OpenMP multithreading; is it possible that affinity is somehow handled by OpenMP, or is it done by C?
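The per-core frequency check described above can also be scripted rather than read off htop. The following is a rough Python sketch, assuming a Linux system that exposes the cpufreq interface under /sys/devices/system/cpu (not every machine or container does); read_core_frequencies_mhz is a hypothetical helper name.

```python
import glob
import re


def read_core_frequencies_mhz(base="/sys/devices/system/cpu"):
    """Return {core_index: current_frequency_in_MHz} for cores exposing cpufreq.

    Cores without a readable scaling_cur_freq file are skipped, so the
    result may be empty on systems without cpufreq support.
    """
    freqs = {}
    for path in glob.glob(f"{base}/cpu[0-9]*/cpufreq/scaling_cur_freq"):
        match = re.search(r"cpu(\d+)/cpufreq", path)
        if match is None:
            continue
        try:
            with open(path) as handle:
                khz = int(handle.read().strip())  # sysfs reports kHz
        except (OSError, ValueError):
            continue
        freqs[int(match.group(1))] = khz / 1000.0
    return freqs
```

The values are instantaneous readings, so they are best sampled while the benchmark is running on the core of interest.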

Answer 3 · 2024-04-17T18:49:26.000Z

The operating system is making that choice, not BLIS 😊.

Answer 4 · 2024-04-17T18:50:26.000Z

And I guess I should clarify: the default affinity is “no affinity”, meaning that the operating system can migrate the process to any core. The OS is then probably choosing the core based on thermal limits.
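One way to see the “no affinity” default for yourself is to print the set of cores the process is allowed to run on. This is a small Python sketch for Linux, not part of BLIS; describe_affinity is a hypothetical helper name, and comparing against os.cpu_count() is only a heuristic, since containers or cpusets can shrink the allowed set without any explicit pinning.

```python
import os


def describe_affinity():
    """Report which cores the caller may run on (Linux only)."""
    allowed = os.sched_getaffinity(0)
    total = os.cpu_count()
    return {
        "allowed": sorted(allowed),
        # Heuristic: no restriction if every logical CPU is in the mask.
        "unrestricted": total is not None and len(allowed) == total,
    }


if __name__ == "__main__":
    print(describe_affinity())
```

Run plainly, the allowed list normally contains every core, and the scheduler is free to move the process between them. Run under taskset -c 1, the allowed list contains only core 1.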