KhronosGroup/SPIR

[[cl::unroll_hint]] and [[cl::ivdep]] hints are not passed to SPIR-V


If I recall correctly, no document specifies whether an OpenCL C++-to-SPIR-V compiler must always pass the [[cl::unroll_hint]] and [[cl::ivdep]] (ignore vector dependencies) hints to SPIR-V, or whether it may decide not to unroll the loop and drop the hint.

However, in my opinion, since SPIR-V is an intermediate language between human-readable OpenCL (and other languages) and hardware-specific byte code, an OpenCL C++-to-SPIR-V compiler should preserve the [[cl::unroll_hint]] and [[cl::ivdep]] attributes. That is, it should compile those loops to structured loops (see StructuredControlFlow) whose OpLoopMerge instruction carries the hint information, so that a later SPIR-V-to-hardware-specific-byte-code compiler can decide whether to unroll or vectorize the loop.

Currently, the [[cl::unroll_hint]] and [[cl::ivdep]] hints are ignored and are not passed to SPIR-V.

Have you tried emitting LLVM IR instead of SPIR-V to check whether this is an issue in the Clang compiler or in the LLVM-to-SPIR-V converter (https://github.com/KhronosGroup/SPIRV-LLVM/tree/khronos/spirv-3.6.1)?

No, I haven't. Thanks for the idea, I'll check it today.

I took an action item in today's Khronos call to propose spec text to clarify this behavior. Here's what I have so far:

=== Attribute Qualifiers

The +[[ ]]+ attribute qualifier syntax allows additional attributes to be attached to types, variables, kernel functions, kernel parameters, or loops.

While some attributes are required for program correctness, other attributes are hints and may be ignored by frontend compilers compiling OpenCL {cpp} to an intermediate representation, or by device compilers compiling to device code. Frontend compilers that compile to an intermediate representation are encouraged (but not required) to faithfully pass attribute hints with an intermediate representation to device compilers for further processing.

I think this is OK but I'm not particularly happy about it. Among other things:

  • It differentiates between a "frontend compiler" and a "device compiler", which isn't described anywhere else in the spec.
  • It talks about an "intermediate representation", which also isn't described anywhere else in the spec, except perhaps in the specialization constants section.
  • It offers suggestions about what a compiler should do vs. what a compiler must do, which arguably doesn't belong in a spec at all.

So, I'm open to suggestions for improvement (@bsochack?), or perhaps validation that the text above is good enough, in which case I'll open a merge request with this addition.

Thanks!

It seems Clang does generate loop metadata for cl::unroll_hint and cl::ivdep in LLVM IR. For example:

#include <opencl_memory>

using namespace cl;

kernel void worker(global* a, global* b)
{
    [[ cl::unroll_hint(2) ]] [[ cl::ivdep ]]
    for (uint i = 0; i < 16; ++i)
        a[i] = b[i];
}

The resulting IR looks like this:

br i1 %5, label %6, label %18, !llvm.loop !6

!6 = distinct !{!6, !7, !8}
!7 = !{!"opencl_ivdep"}
!8 = !{!"llvm.loop.unroll.count", i32 2}

However, the LLVM/SPIR-V converter is currently unable to recover the loop structure and the associated loop info from LLVM IR.
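
For illustration, here is a minimal sketch of how a converter might translate that loop metadata into the Loop Control mask of OpLoopMerge once the loop structure has been recovered. This is not the converter's actual code: the `LoopHint` struct and the `toLoopControlMask` function are hypothetical names invented for this example, and the mapping itself is an assumption (in particular, the unroll count is dropped because the basic Unroll bit carries no count). The mask values are the ones defined in the SPIR-V specification, and the metadata strings are the ones shown in the IR above plus LLVM's `llvm.loop.unroll.disable`.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Loop Control mask bits as defined in the SPIR-V specification.
enum LoopControl : std::uint32_t {
    LoopControlNone               = 0x0,
    LoopControlUnroll             = 0x1,
    LoopControlDontUnroll         = 0x2,
    LoopControlDependencyInfinite = 0x4,  // available from SPIR-V 1.1
};

// Hypothetical, simplified view of one entry in an !llvm.loop node:
// the metadata string plus an optional integer operand.
struct LoopHint {
    std::string name;
    std::uint32_t value;
};

// Assumed mapping from loop metadata to an OpLoopMerge Loop Control mask.
// Unknown hints are ignored, which is always legal because they are hints.
std::uint32_t toLoopControlMask(const std::vector<LoopHint>& hints) {
    std::uint32_t mask = LoopControlNone;
    for (const LoopHint& hint : hints) {
        if (hint.name == "opencl_ivdep") {
            // ivdep: no loop-carried dependencies need to be honored.
            mask |= LoopControlDependencyInfinite;
        } else if (hint.name == "llvm.loop.unroll.count") {
            // Assumption: the requested count is dropped and only the
            // request to unroll is preserved.
            mask |= LoopControlUnroll;
        } else if (hint.name == "llvm.loop.unroll.disable") {
            mask |= LoopControlDontUnroll;
        }
    }
    return mask;
}
```

With the two hints from the IR above (`opencl_ivdep` and `llvm.loop.unroll.count` with value 2), this sketch yields the mask 0x5, that is, Unroll combined with DependencyInfinite. A device compiler reading that mask remains free to ignore it.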