NVIDIAGameWorks/RTX-Path-Tracing

Vulkan SER support..

oscarbg opened this issue · 18 comments

Hi,
I noticed the release notes mention that Vulkan SER support is WIP, so I opened this issue to be notified when it's ready. :-)

The vk_mini_samples ser_pathtrace sample works (EDIT: no, it doesn't; it only appears to work), but in this repo VK_NV_RAY_TRACING_INVOCATION_REORDER_EXTENSION_NAME isn't available.

I'm guessing it just isn't implemented, or the extension isn't being requested.

https://github.com/nvpro-samples/vk_mini_samples/tree/main/samples/ser_pathtrace

vk_mini_samples uses GLSL as its shader language. I wonder if HLSL via DXC or Slang is the culprit for this repo not supporting SER.

@natevm, have you heard of any issues relating to SER not working on Vulkan + HLSL projects? Or is that not the problem here? I put a breakpoint in the init here, and that extension isn't enumerated.
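
For reference, here's roughly the check I'm doing; a minimal sketch, assuming a valid `physicalDevice` and a Vulkan SDK recent enough to define the extension macro:

```cpp
#include <vulkan/vulkan.h>
#include <cstring>
#include <vector>

// Sketch: ask the driver which device extensions it exposes and look for
// VK_NV_ray_tracing_invocation_reorder, independent of what the app requests.
bool IsInvocationReorderSupported(VkPhysicalDevice physicalDevice)
{
    uint32_t count = 0;
    vkEnumerateDeviceExtensionProperties(physicalDevice, nullptr, &count, nullptr);
    std::vector<VkExtensionProperties> extensions(count);
    vkEnumerateDeviceExtensionProperties(physicalDevice, nullptr, &count, extensions.data());

    for (const VkExtensionProperties& ext : extensions)
    {
        if (std::strcmp(ext.extensionName,
                        VK_NV_RAY_TRACING_INVOCATION_REORDER_EXTENSION_NAME) == 0)
            return true;
    }
    return false;
}
```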

The menu shows a "Use SER" toggle, but toggling it makes zero difference in FPS in ser_pathtrace, whereas in the Path Tracing SDK on DX12 it does: it jumps from roughly 50 to 90 FPS on my 4090.

OK, progress: the extension was missing from the requested list, which is why it didn't show up among the listed / enumerated ones.
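
In code terms, the fix amounts to something like the following sketch (variable names are illustrative, not the repo's actual code): the extension has to be added to the requested device extensions, and its feature struct chained into VkDeviceCreateInfo:

```cpp
#include <vulkan/vulkan.h>
#include <vector>

// Sketch of the fix: request the extension and chain its feature struct at
// device creation. deviceExtensions / createInfo are illustrative names.
std::vector<const char*> deviceExtensions = {
    VK_KHR_ACCELERATION_STRUCTURE_EXTENSION_NAME,
    VK_KHR_RAY_TRACING_PIPELINE_EXTENSION_NAME,
    VK_NV_RAY_TRACING_INVOCATION_REORDER_EXTENSION_NAME, // the missing entry
};

VkPhysicalDeviceRayTracingInvocationReorderFeaturesNV reorderFeatures = {};
reorderFeatures.sType =
    VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_RAY_TRACING_INVOCATION_REORDER_FEATURES_NV;
reorderFeatures.rayTracingInvocationReorder = VK_TRUE;

VkDeviceCreateInfo createInfo = {};
createInfo.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
createInfo.pNext = &reorderFeatures; // chain in front of any other feature structs
createInfo.enabledExtensionCount = static_cast<uint32_t>(deviceExtensions.size());
createInfo.ppEnabledExtensionNames = deviceExtensions.data();
// ... queue create infos, other features, etc. omitted ...
```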

Ok, now it shows up as supported in this repo too.

However, toggling SER in Vulkan mode makes no difference to the framerate whatsoever.
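
One thing worth checking when the toggle has no effect: the driver can expose the extension yet still report that it won't actually reorder. The extension's properties struct carries a hint; a minimal sketch, assuming a valid `physicalDevice`:

```cpp
#include <vulkan/vulkan.h>

// Sketch: query the driver's reordering hint. If it reports
// VK_RAY_TRACING_INVOCATION_REORDER_MODE_NONE_NV, the hit-object calls are
// accepted but reordering is expected to be a no-op.
bool DriverWillReorder(VkPhysicalDevice physicalDevice)
{
    VkPhysicalDeviceRayTracingInvocationReorderPropertiesNV reorderProps = {};
    reorderProps.sType =
        VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_RAY_TRACING_INVOCATION_REORDER_PROPERTIES_NV;

    VkPhysicalDeviceProperties2 props2 = {};
    props2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props2.pNext = &reorderProps;
    vkGetPhysicalDeviceProperties2(physicalDevice, &props2);

    return reorderProps.rayTracingInvocationReorderReorderingHint ==
           VK_RAY_TRACING_INVOCATION_REORDER_MODE_REORDER_NV;
}
```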

DX12 jumps from 13.6 to 21.8 FPS

Looks like there's potentially another show-stopper.

I enabled the USE_HIT_OBJECT_EXTENSION define for SPIR-V / Vulkan builds, but it crashes during pipeline creation.

Looks like SER extensions don't work in HLSL, per the comment. Sigh.

@natevm @csyonghe is there a slang solution here?

I can't switch my entire game's shaders back to GLSL just to use SER, but SER is 100% mandatory for its perf benefits (literally doubles the framerate).

@BattleAxeVR have you tried using Slang? I've been using the hit object API in Slang for a while now without any major hitches.

https://github.com/shader-slang/slang/blob/master/docs/shader-execution-reordering.md

IIUC, Vulkan support is still a bit rough driver-wise, but hopefully it will improve soon.

That's exactly what I was hoping would work! Yay, so I don't have to abandon HLSL. Still, it would be nice if this repo used Slang for VK / SPIR-V builds so SER would actually work and I could A/B compare the perf vs DX12 in the same sample app.

If no one thinks that's worth doing, we can close this as "will not fix", but IMO it would be best, for completeness' sake, to avoid gotchas like that. I wonder if, like last time, there isn't some decorator solution for DXC / SPIR-V to use hit objects in HLSL too.

I would very much like to know whether I can take my current HLSL shader, make some slight modifications for SER, and compile it with slangc to activate SER on Vulkan. If I have to make significant changes to use Slang directly in raygen, which is literally the most complex shader of my engine, that's not a great solution for me. It would require a lot of upfront work for what should only be a few lines of code to activate the extension, and, worst of all, I'd lose the ability to mix and match HLSL shader source and library functions while experimenting with new RTX samples or integrating things like ReSTIR, all of which use HLSL. But I profess ignorance about native Slang; I understand its syntax is based on HLSL, but honestly there aren't a lot of code samples for it, and that adds to my development burden.

> The vk_mini_samples ser_pathtrace sample works (EDIT: no, it doesn't; it only appears to work), but in this repo VK_NV_RAY_TRACING_INVOCATION_REORDER_EXTENSION_NAME isn't available.

Note that vk_mini_samples/samples/ser_pathtrace has since been fixed and works as expected.

Yep, saw that, thanks. As we see here, that suffices to prove that SER works on Vulkan, but only through GLSL, whereas HLSL must be compiled with Slang, plus (possibly) some special code to make the hit-object extensions work. This is the part I would really appreciate help solving here, since NVIDIA are the experts in both SER and the HLSL -> Slang -> SPIR-V pipeline. I'm fine with switching to the Slang compiler for my raygen shader instead of DXC.exe, but I'd like to see that actually proven here, such that DX12 and Vulkan give the same performance with SER enabled. DX12 is way ahead, and it's unrealistic / unreasonable to limit developers to GLSL for such a complicated shader (raygen), which is much better implemented with the object-oriented approach of HLSL or Slang. I already spent a ton of time and energy switching from GLSL to an HLSL pipeline originally; there is no way I'm going back now! :)

Not only that, but I'm using float3 / float4 / float4x4 and similar structs throughout my C++ engine so that code and utility functions can be reused (and unit tested) on the C/C++ side at the same time as in the shaders, which helps iteration times.

Hi @BattleAxeVR sorry for the late response.

I completely understand your frustration - I'll start poking people a bit more about it; I do all the core coding on the project with HLSL/DX12, and we have someone else contributing all the Vulkan upgrades, so I'm a bit out of the loop. But I'll go and try to understand what's happening with the HLSL->SPIR-V path for SER!

> Not only that, but I'm using float3 / float4 / float4x4 and similar structs throughout my C++ engine so that code and utility functions can be reused (and unit tested) on the C/C++ side at the same time as in the shaders, which helps iteration times.

That's cool, just out of curiosity, what do you use on the C++ side for supporting HLSL-style math? :)

Thanks!

I'm using NVIDIA's own MathLib, which is SIMD-based and pretty fast, I guess. It's mostly for all the handy .xxy-style swizzle accessors that are sometimes useful.

But mainly, I like being able to write unit tests and run them in C++ without booting up shaders. I like to crawl before I walk, or run. TDD has saved me a bunch of debugging time, and I generally don't enjoy debugging basic math or foundational stuff in a shader at runtime when a quick unit test with asserts will do fine. It saves me so much time; a trial-and-error workflow sucks, especially when you don't have hot shader reloading implemented yet.
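
As a (hypothetical) illustration of the workflow, with `float3` and `normalize` standing in for the shared HLSL-style math:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for the HLSL-style types shared between C++ and
// shaders; the real engine uses a SIMD math library for these.
struct float3 { float x, y, z; };

static float3 normalize(const float3& v)
{
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / len, v.y / len, v.z / len };
}

int main()
{
    // Validate the foundational math on the CPU before it ever runs in a shader.
    const float3 n = normalize({ 3.0f, 0.0f, 4.0f });
    assert(std::fabs(n.x - 0.6f) < 1e-6f);
    assert(std::fabs(n.z - 0.8f) < 1e-6f);
    return 0;
}
```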

The only "gotcha" here is that float3 is actually 128-bit, padded in the SIMD, whereas in HLSL float 3 doesn't waste any space. That tripped me up for a bit but I have some compile guards / ifdefs to manually add 4 bytes of padding on the shader side so it matches the memory use on the C++ side. But I have a special case for float3s that are baked geometry cause the padding would significantly increase memory consumption for vertices, for nothing.

Incidentally, I noticed there are now a Slang and an HLSL shader with some USE_SER comments in the ser_pathtrace project. All I really need is one example that works to unblock me, but of course it would be nice if all of NVIDIA's samples supported SER on VK. I don't know how many people use GLSL for complex path-tracing projects, but I found it really irritating to lose the class methods you get in HLSL, especially since many (most?) of NVIDIA's more advanced path-tracing examples are HLSL-based, as are all the utility functions for lighting, importance sampling, etc. It's not the raygen shader itself that's complicated; it's all the included utility functions that I definitely don't want to translate into GLSL.

@BattleAxeVR The ser_pathtrace sample demonstrates SER with HLSL and Vulkan using DXC. Specifically, it uses SPIR-V intrinsics in HLSL to utilize SER.

Thanks so much, I'll try that immediately

Thanks @jarvism-nv I knew I was missing something here :D Hope it works, @BattleAxeVR let us know!

(off topic below)

> But mainly, I like being able to write unit tests and run them in C++ without booting up shaders. I like to crawl before I walk, or run. TDD has saved me a bunch of debugging time, and I generally don't enjoy debugging basic math or foundational stuff in a shader at runtime when a quick unit test with asserts will do fine. It saves me so much time; a trial-and-error workflow sucks, especially when you don't have hot shader reloading implemented yet.

That's exactly why I'm finding it super useful - you can't really debug shaders quickly (even with hot shader reload), and it becomes a bit of a nightmare if the code involves something like linked lists or similar!

The only "gotcha" here is that float3 is actually 128-bit, padded in the SIMD, whereas in HLSL float 3 doesn't waste any space. That tripped me up for a bit but I have some compile guards / ifdefs to manually add 4 bytes of padding on the shader side so it matches the memory use on the C++ side. But I have a special case for float3s that are baked geometry cause the padding would significantly increase memory consumption for vertices, for nothing.

Yes, this is actually the reason I was curious in the first place! I want a CPU-side lib where the memory mapping and structure packing are identical to the shader code (but I don't really care about SIMD). That way you can dump more complex data structures and examine them easily on the CPU side. Thanks for the info :)

Hope Vulkan SER support in the Path Tracing SDK can be "fixed" and released as a GDC '24 present!