KhronosGroup/SPIRV-Registry

SPV_KHR_cooperative_matrix uniformity requirements

BNieuwenhuizen opened this issue · 2 comments

Many of the cooperative matrix instructions include a requirement of the form:

For a given dynamic instance of this instruction, all invocations in a given scope instance must be active or all must be inactive (where the scope is the scope of the operation).

  1. What counts as "all invocations" in the scope of an operation? In particular, if the workgroup size is smaller than the subgroup size, then even in uniform control flow every launched invocation may be active without forming a full subgroup. Is that allowed?

  2. If the HW requires full subgroups, is an implementation allowed to store data in "internal invocations" that are not invocations from the API perspective? If so, how is an application supposed to perform scalar operations on this extra data?

  3. Following up on (1): is this behavior allowed in non-compute stages?

  4. What is considered "all invocations" for queue/device scopes, especially when considering multiple draw/dispatch commands?

The group discussed these today. Replying in reverse order:

(4) AFAIK all current implementations only support Subgroup scope. It's mostly easy to extrapolate how things would work for Workgroup scope, but the spec doesn't talk about larger scopes than that, so we'll make a spec change to forbid larger scopes.

(3) Similarly, the spec doesn't talk about how you would form groups of cooperating threads for non-compute stages. This could maybe work naturally for task/mesh, but I don't think anybody supports it there currently. So we'll also clarify in the spec that this is currently only supported for compute.

(2) No, implementations can only use real invocations.

(1) For subgroup scope, the intent is that the application must launch full subgroups. For workgroup scope, I don't think the spec actually offers a way for the implementation to describe what size workgroup to launch. Not sure whether this means we need to forbid workgroup scope, or leave it as "ask the hardware vendor what to use". (Ironically, the NV extension had this same problem for subgroup scope - you had to know to launch a workgroup with a multiple of 32 invocations).
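
As a rough illustration of the "launch full subgroups" point in (1), here is a minimal host-side sketch of the arithmetic an application might do before choosing a workgroup size. The function names are hypothetical and not part of any API. The check is only a necessary condition: it assumes the implementation packs invocations into subgroups without leaving gaps, which depends on the API and the hardware.

```python
def launches_full_subgroups(local_size, subgroup_size):
    """Return True if the total workgroup size is a non-zero multiple of the subgroup size.

    local_size is an (x, y, z) tuple of workgroup dimensions.
    Assumption: invocations are packed densely into subgroups.
    """
    x, y, z = local_size
    total = x * y * z
    return subgroup_size > 0 and total > 0 and total % subgroup_size == 0


def round_up_to_full_subgroups(invocations, subgroup_size):
    """Smallest multiple of subgroup_size that is >= invocations."""
    return -(-invocations // subgroup_size) * subgroup_size


if __name__ == "__main__":
    # A 16-wide workgroup on 32-wide subgroups leaves half a subgroup empty.
    print(launches_full_subgroups((16, 1, 1), 32))
    # Padding the launch up to the next full subgroup.
    print(round_up_to_full_subgroups(16, 32))
```

The 16-wide case corresponds to the situation raised in question 1: every launched invocation can be active, yet the subgroup is not full.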

kpet commented

Discussed in the 2023/12/13 SPIR-V teleconference: all the clarifications needed in the specifications should now have been made. Closing this issue. Feel free to reopen it or create a new one if any further clarifications are required.