heroseh/hcc

Addressing some of the TODO list points

nanokatze opened this issue · 1 comment

Hey, I'm also working on a C to Vulkan SPIR-V compiler which can be found at https://gitlab.com/nanokatze/cg2c/. Here are a bunch of example shaders:
https://gitlab.com/nanokatze/cg2c/-/blob/master/test/hello/fft_shader.c
https://gitlab.com/nanokatze/cg2c/-/blob/master/test/hello/llama_shader.c
https://gitlab.com/nanokatze/cg2c/-/blob/master/test/hello/llama2_shader.c

Yes, it does actually work.

Perhaps you will find it useful as a reference.

My compiler seems to build on stricter assumptions than yours does. For example, we don't do logical pointers here. Generic addressing 4 life! It outright requires certain features like bufferDeviceAddress and shaderInt8. This would preclude it from being translatable to current HLSL/DXIL, as there are no equivalents to these features. Opaque types like images are eschewed as well and are instead hidden behind ordinary data (integer handles), which makes bindless images a prerequisite.

Anyway, to address entries from your TODO list:

goto statement

As you should've already noticed, SPIR-V doesn't allow you to branch to any random block; it requires you to explicitly declare loops and ifs. One reason for this is to maintain convergence information, i.e. so that subgroup ops (or "wave intrinsics" if you prefer) have sensible behavior in certain constructs. E.g. consider the following GLSL snippet:

int invocationsWithSameValue(int x) {
 int y;
 for (;;) {
  if (x == subgroupBroadcastFirst(x)) {
   y = subgroupAdd(1);
   break;
  }
 }
 return y;
}

If you only have the control flow graph, without explicit merge points like SPIR-V's, this isn't distinguishable from

int invocationsWithSameValue(int x) {
 int y;
 for (;;) {
  if (x == subgroupBroadcastFirst(x)) {
   break;
  }
 }
 y = subgroupAdd(1);
 return y;
}

(please draw the CFGs yourself)
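To make the difference between the two snippets concrete, here is a minimal CPU-side sketch in C that simulates one subgroup. It is only an illustration under stated assumptions: `SUBGROUP_SIZE`, `same_value_add_inside` and `same_value_add_after` are hypothetical names, all invocations are assumed active on entry, and invocations are assumed to reconverge fully at the loop's merge point.

```c
#include <assert.h>
#include <stdint.h>

#define SUBGROUP_SIZE 4 /* hypothetical subgroup width used by this simulation */

/* First snippet: subgroupAdd(1) sits inside the if, so it only counts the
   invocations that matched the broadcast value in the current iteration. */
void same_value_add_inside(const int x[SUBGROUP_SIZE], int y[SUBGROUP_SIZE]) {
    assert(x && y);
    uint32_t active = (1u << SUBGROUP_SIZE) - 1u; /* invocations still in the loop */
    while (active != 0) {
        /* subgroupBroadcastFirst: value held by the lowest active invocation */
        int first = 0;
        for (int i = 0; i < SUBGROUP_SIZE; i++) {
            if (active & (1u << i)) { first = x[i]; break; }
        }
        /* Collect the invocations that enter the if-branch this iteration */
        uint32_t taken = 0;
        int count = 0;
        for (int i = 0; i < SUBGROUP_SIZE; i++) {
            if ((active & (1u << i)) && x[i] == first) {
                taken |= 1u << i;
                count++;
            }
        }
        /* subgroupAdd(1) over the taken set; those invocations then break */
        for (int i = 0; i < SUBGROUP_SIZE; i++) {
            if (taken & (1u << i)) y[i] = count;
        }
        active &= ~taken;
    }
}

/* Second snippet: subgroupAdd(1) sits after the loop. Assuming everyone has
   reconverged at the merge point, it counts the whole subgroup. */
void same_value_add_after(const int x[SUBGROUP_SIZE], int y[SUBGROUP_SIZE]) {
    assert(x && y);
    (void)x; /* the loop only decides when each invocation leaves it */
    for (int i = 0; i < SUBGROUP_SIZE; i++) y[i] = SUBGROUP_SIZE;
}
```

With inputs `{7, 3, 7, 7}` the first function yields `{3, 1, 3, 3}` and the second yields `{4, 4, 4, 4}`, even though the two snippets share the same plain control flow graph.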

This is to say, goto is possibly implementable with some effort. Loops formed with goto will be varying degrees of bad: unspecified tangles (sets of invocations) for subgroup ops/wave intrinsics, etc. Forward goto (i.e. the equivalent of a multi-level break) would be perfectly fine though.
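As a source-level illustration of why forward goto is the easy case, the sketch below shows a forward goto leaving two nested loops and a structured rewrite that behaves the same. The function names are hypothetical, and this is not a claim about how either compiler would lower goto; it only shows that a forward jump never creates a new loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Forward goto: leave both loops as soon as the target is found. */
int find_with_goto(const int *grid, int rows, int cols, int target) {
    assert(grid && rows >= 0 && cols >= 0);
    int found = -1;
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            if (grid[r * cols + c] == target) {
                found = r * cols + c;
                goto done; /* only ever jumps forward, out of the loops */
            }
        }
    }
done:
    return found;
}

/* Structured equivalent: a flag turns the multi-level break into two
   ordinary breaks, so every loop keeps a single well-defined exit. */
int find_structured(const int *grid, int rows, int cols, int target) {
    assert(grid && rows >= 0 && cols >= 0);
    int found = -1;
    bool stop = false;
    for (int r = 0; r < rows && !stop; r++) {
        for (int c = 0; c < cols; c++) {
            if (grid[r * cols + c] == target) {
                found = r * cols + c;
                stop = true;
                break;
            }
        }
    }
    return found;
}
```

Both functions return the flat index of the first match, or -1 when the target is absent.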

thin gpu emulator

Use lavapipe, Mesa's CPU-based Vulkan implementation.

vectors and matrix extensions

These are only important as syntactic conveniences. Don't worry about not making use of OpTypeVector or especially OpTypeMatrix. All drivers in Mesa, for example, share a compiler that breaks matrix types and ops down into the corresponding vector ops as SPIR-V is being ingested; vector ops are then exploded into scalar ops, because working with vectors is kinda bad.
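The lowering described above can be sketched in plain C. This is a hand-written illustration, not Mesa's actual code: `vec4`, `mat4` (assumed column-major) and all function names are hypothetical. The first form expresses a matrix-vector product as vector ops, the second is the same computation with the vector ops exploded into scalars.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } vec4; /* hypothetical vector type */
typedef struct { vec4 col[4]; } mat4;      /* hypothetical, column-major */

vec4 vec4_scale(vec4 v, float s) {
    vec4 r = { v.x * s, v.y * s, v.z * s, v.w * s };
    return r;
}

vec4 vec4_add(vec4 a, vec4 b) {
    vec4 r = { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
    return r;
}

/* Step 1: the matrix op becomes a handful of vector ops (scale and add). */
vec4 mat4_mul_vec4(const mat4 *m, vec4 v) {
    assert(m);
    vec4 r = vec4_scale(m->col[0], v.x);
    r = vec4_add(r, vec4_scale(m->col[1], v.y));
    r = vec4_add(r, vec4_scale(m->col[2], v.z));
    r = vec4_add(r, vec4_scale(m->col[3], v.w));
    return r;
}

/* Step 2: the vector ops become scalar multiplies and adds. */
void mat4_mul_vec4_scalar(const float m[16], const float v[4], float out[4]) {
    assert(m && v && out);
    for (int row = 0; row < 4; row++) {
        out[row] = m[0 * 4 + row] * v[0]
                 + m[1 * 4 + row] * v[1]
                 + m[2 * 4 + row] * v[2]
                 + m[3 * 4 + row] * v[3];
    }
}
```

Since the scalar form carries the same information, a front end that never emits OpTypeMatrix loses nothing the driver compiler would have kept.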

For loads/stores in particular, vector types can help, but you could likely achieve an equivalent effect by specifying stricter alignment for your loads and stores and by loading and storing the compound types at once.
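A rough C sketch of that idea, assuming C11: a plain struct of four floats with 16-byte alignment is loaded and stored as one unit instead of as four separate scalar accesses. The `float4` type and the helper names are hypothetical. On the SPIR-V side the corresponding knob would presumably be the `Aligned` memory operand on OpLoad/OpStore, but that mapping is an assumption here, not something shown in either compiler.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical compound type: four floats with a stricter (16-byte) alignment. */
typedef struct { alignas(16) float v[4]; } float4;

/* Load the whole compound at once; the address must honor the alignment. */
float4 load_float4(const void *p) {
    assert(((uintptr_t)p % 16) == 0);
    float4 out;
    memcpy(&out, p, sizeof out);
    return out;
}

/* Store the whole compound at once, under the same alignment requirement. */
void store_float4(void *p, float4 value) {
    assert(((uintptr_t)p % 16) == 0);
    memcpy(p, &value, sizeof value);
}
```

The alignment promise is what lets a compiler turn the copy into a single wide access, with or without a vector type in the source.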

Hey, sorry for the late reply. Thank you for doing this whole breakdown.

it's interesting to see that Vulkan and Metal basically support pointers but DX12 does not. I actually have support for bufferDeviceAddress in my compiler but have it turned off. Although i want to mostly target "the future of graphics programming", i still want to make sure the compiler can be used today. I hope that HLSL/DX12 will just get proper pointer support soon (at least at the IR level) so we can have pointers and unions.

I ended up doing the same thing with bindless as default. I thought maybe we could have a mix with the binding model but it was just better to go all in. It's really nice: from the CPU you just pass an int up to the GPU, and on the GPU you can pass it around freely too, into buffers and shader stages etc. i do have my own HCC_AML_OP_RESOURCE_DESCRIPTOR_LOAD, so if i write an optimizer later i can reduce the loading of the descriptor down to fewer calls.
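For readers unfamiliar with the bindless model, here is a small CPU-only sketch in C of the "a resource is just an int" idea. Everything in it is hypothetical (`Image`, `ImageHandle`, `DescriptorTable`, `Material` and the functions); it models neither HCC's nor cg2c's actual types, and the table simply stands in for a bindless descriptor array.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for an image descriptor; a real one is opaque driver data. */
typedef struct { int width, height; const uint8_t *texels; } Image;

typedef uint32_t ImageHandle; /* the handle is ordinary integer data */

#define MAX_IMAGES 8
typedef struct { Image slots[MAX_IMAGES]; uint32_t count; } DescriptorTable;

/* "CPU side": register an image and get back a plain integer handle. */
ImageHandle table_register(DescriptorTable *t, Image img) {
    assert(t && t->count < MAX_IMAGES);
    t->slots[t->count] = img;
    return t->count++;
}

/* Because the handle is just an int, it can sit in any buffer-like struct. */
typedef struct { ImageHandle albedo; float tint; } Material;

/* "GPU side": the descriptor is looked up only at the point of use. */
uint8_t sample_texel(const DescriptorTable *t, ImageHandle h, int x, int y) {
    assert(t && h < t->count);
    const Image *img = &t->slots[h];
    assert(x >= 0 && x < img->width && y >= 0 && y < img->height);
    return img->texels[y * img->width + x];
}
```

The lookup inside `sample_texel` plays the role of a descriptor load, which is the kind of operation an optimizer could later hoist or deduplicate.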

ah yes the goto statement. honestly the whole SPIR-V structured control flow down at the IR level is quite annoying. but yes i don't think it's worth the effort to implement goto, maybe for fun later.

ah thanks i'll look into lavapipe

yes i kept OpTypeVector mostly because then i can let the driver implementation do what they want with the extra information. OpTypeMatrix was dropped because implementing one in C that could work CPU & GPU side without compiler extensions was proving to be a pain. So i just made it fully in user space.