Understanding the Role of arrive in NamedBarrier Synchronization
ziyuhuang123 opened this issue · 1 comments
ziyuhuang123 commented
- In the FA3 store function, I observed the following process:
  - Data is stored from registers to shared memory.
  - A sync occurs.
  - Then, data is stored from shared memory to global memory.
- This sync is a NamedBarrier sync, but I noticed that no arrive operation is performed:
  - I searched for the corresponding barrier ID and confirmed that no arrive is associated with it.
- This reminds me of `__syncthreads`, which translates to PTX as `bar.sync` and also doesn't involve an explicit `arrive`.
- This raises the question:
  - Does this imply that `arrive` is unnecessary for such synchronization scenarios?
- However, I noticed that in other parts of FA3, `arrive` is used.
- Therefore, I'm curious:
  - What are the specific conditions or scenarios where `arrive` is required?
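
For reference, the store pattern described above can be sketched as a standalone CUDA program. This is a minimal illustration under stated assumptions, not FA3's actual code: the helper `named_barrier_sync`, the group size of 128 threads, and the barrier IDs are all hypothetical choices made for the example. It shows a group of threads that all call `bar.sync` with an explicit thread count; since each of them both arrives and waits, no separate `arrive` appears.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kGroupThreads = 128;  // assumed size of one thread group
constexpr int kNumGroups    = 2;    // assumed number of groups per block

// Hypothetical helper: arrive at barrier `id` and wait until `nthreads`
// threads have arrived. `nthreads` must be a multiple of the warp size.
__device__ __forceinline__ void named_barrier_sync(int id, int nthreads) {
    asm volatile("bar.sync %0, %1;" :: "r"(id), "r"(nthreads) : "memory");
}

__global__ void store_kernel(int* out) {
    __shared__ int smem[kNumGroups * kGroupThreads];
    const int group = threadIdx.x / kGroupThreads;
    const int lane  = threadIdx.x % kGroupThreads;
    const int value = threadIdx.x * 10;  // stands in for a register result

    // Step 1: registers -> shared memory.
    smem[group * kGroupThreads + lane] = value;

    // Step 2: sync only this group. Barrier 0 is the one __syncthreads uses,
    // so the example starts its IDs at 1.
    named_barrier_sync(1 + group, kGroupThreads);

    // Step 3: shared -> global, reading a slot written by another thread
    // of the same group, which is why the sync is needed.
    const int src = group * kGroupThreads + (kGroupThreads - 1 - lane);
    out[threadIdx.x] = smem[src];
}

int main() {
    constexpr int n = kNumGroups * kGroupThreads;
    int* d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));
    store_kernel<<<1, n>>>(d_out);

    int h_out[n];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    bool ok = true;
    for (int t = 0; t < n; ++t) {
        const int g = t / kGroupThreads;
        const int l = t % kGroupThreads;
        const int expected = (g * kGroupThreads + (kGroupThreads - 1 - l)) * 10;
        ok = ok && (h_out[t] == expected);
    }
    std::printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

Every thread that reads shared memory here also took part in writing it, so every participant has to wait; that is the case where `bar.sync` alone suffices.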
ziyuhuang123 commented
It seems that if we use it as __syncthreads, there's no need for arrive. bar.arrive is meant for use in WASP, isn't it? Although using a barrier for WASP feels odd... doesn't this forcibly require the producer and consumer to have the same participating threads?
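
To make the producer/consumer case concrete, here is a minimal sketch, again with assumed names and sizes (`bar_arrive`, `bar_sync`, `kBarrierId`, and 128 threads per role are hypothetical). The thread count passed to both instructions is the total of arriving plus waiting threads, so in this sketch the producers and consumers are different threads sharing one barrier: producers signal with `bar.arrive` and continue, and consumers block in `bar.sync` until all 256 participants have checked in.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kRoleThreads = 128;  // assumed threads per role
constexpr int kBarrierId   = 1;    // assumed ID; 0 is used by __syncthreads

// Hypothetical helper: signal arrival at barrier `id` without waiting.
__device__ __forceinline__ void bar_arrive(int id, int nthreads) {
    asm volatile("bar.arrive %0, %1;" :: "r"(id), "r"(nthreads) : "memory");
}

// Hypothetical helper: arrive and wait until `nthreads` threads have arrived.
__device__ __forceinline__ void bar_sync(int id, int nthreads) {
    asm volatile("bar.sync %0, %1;" :: "r"(id), "r"(nthreads) : "memory");
}

__global__ void producer_consumer(int* out) {
    __shared__ int smem[kRoleThreads];
    const int lane = threadIdx.x % kRoleThreads;
    const bool is_producer = threadIdx.x < kRoleThreads;  // uniform per warp

    if (is_producer) {
        smem[lane] = lane * 3;   // produce into shared memory
        __threadfence_block();   // conservative fence; may be unnecessary
        bar_arrive(kBarrierId, 2 * kRoleThreads);  // signal, do not wait
    } else {
        // Completes once 128 producers have arrived and 128 consumers are here.
        bar_sync(kBarrierId, 2 * kRoleThreads);
        out[lane] = smem[kRoleThreads - 1 - lane];  // consume
    }
}

int main() {
    int* d_out = nullptr;
    cudaMalloc(&d_out, kRoleThreads * sizeof(int));
    producer_consumer<<<1, 2 * kRoleThreads>>>(d_out);

    int h_out[kRoleThreads];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    bool ok = true;
    for (int i = 0; i < kRoleThreads; ++i) {
        ok = ok && (h_out[i] == (kRoleThreads - 1 - i) * 3);
    }
    std::printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

The split between the two roles falls on a warp boundary, so each warp executes exactly one of the two barrier instructions. A second barrier in the opposite direction would be needed if the producers had to wait before overwriting the buffer.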