Q: Custom BVH vs AccelerationStructure?
ib00 opened this issue · 4 comments
I see that you removed your own BVH building and traversal and opted for the built-in Vulkan AccelerationStructure BVH.
Just out of curiosity, is the Vulkan AS giving better performance than the compressed wide BVH traversal you had before? If so, what was the difference?
Hello! Contrary to what one might expect, the switch to a Vulkan AS was actually made in an effort to reduce shader and pipeline compilation times.
In the compute-based CWBVH approach, generated shaders could have 100k+ lines of code, and consequently, compute pipeline creation in the driver would take not seconds but 2 minutes or more.
By switching to an RT pipeline, I can use the shader binding table and callable shaders to compile shaders independently of each other and break up the complexity of the pipeline. However, I'm not there yet: so far I have only transitioned the acceleration structure and use it with Ray Queries inside a compute shader.
I did some ad-hoc testing and observed a performance gain on the RTX 2060, but it was not large (e.g. 16 ms -> 14 ms and 33 ms -> 31 ms). The speedup is probably much larger on an RTX 4090 and future generations.
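For anyone curious what the ray query path mentioned above looks like, here is a minimal sketch of tracing against the AS from a compute shader with GL_EXT_ray_query. The binding layout, camera setup, and output handling are placeholders, not the actual gatling code:

```glsl
#version 460
#extension GL_EXT_ray_query : require

// Hypothetical bindings -- not the real gatling descriptor layout.
layout(binding = 0) uniform accelerationStructureEXT sceneAS;
layout(binding = 1, rgba8) uniform writeonly image2D outImage;

layout(local_size_x = 8, local_size_y = 8) in;

void main()
{
    // Placeholder primary ray; a real renderer derives this from the camera.
    vec3 origin = vec3(0.0, 0.0, -5.0);
    vec3 dir    = normalize(vec3(vec2(gl_GlobalInvocationID.xy) / 512.0 - 0.5, 1.0));

    rayQueryEXT rq;
    rayQueryInitializeEXT(rq, sceneAS, gl_RayFlagsOpaqueEXT, 0xFF,
                          origin, 0.0, dir, 1e27);

    // Traversal runs on the RT hardware; with opaque-only geometry the
    // loop body never needs to handle candidate intersections.
    while (rayQueryProceedEXT(rq)) {}

    bool hit = rayQueryGetIntersectionTypeEXT(rq, true) ==
               gl_RayQueryCommittedIntersectionTriangleEXT;

    imageStore(outImage, ivec2(gl_GlobalInvocationID.xy),
               hit ? vec4(1.0) : vec4(0.0, 0.0, 0.0, 1.0));
}
```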
Thanks for your explanation.
It's interesting that your BVH traversal was competitive. I presume that the main reason for the slight advantage is that Vulkan is using the RTX acceleration units, while your compute shaders were not able to due to limitations in GLSL.
So, are the slow compilation times a result of the megakernel design? Namely, would the problem be alleviated if you chose a wavefront design? This also means that MaterialX would have to be handled in a different manner.
I presume it was competitive because the RTX 2060 is a bottom-of-the-line RTX GPU and there's latency hiding going on due to material evaluation (in this case UsdPreviewSurface shading code generated by MaterialX).
Yes, high compilation times are a byproduct of the megakernel design, and yes, I suppose a wavefront design would be the only other solution to this problem. However, one would have to submit one shader dispatch per material per iteration, which could result in reduced occupancy (persistent threading requires a megakernel again). I believe that growing compilation times due to increasing shading complexity, together with the recent introduction of Shader Execution Reordering for ray tracing pipelines, are going to make wavefront designs increasingly obsolete.
When it comes to MaterialX, both OSL and MDL code generation backends are agnostic to the renderer architecture. It's up to you to stitch the shaders together.
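As an illustration of the "compile shaders independently" idea mentioned earlier: the per-material code a generator emits could live in its own callable shader and be selected through the SBT record index at runtime. A rough sketch, with made-up struct and binding names rather than gatling's actual interface:

```glsl
// --- caller side, e.g. a ray generation or closest-hit shader ---
#version 460
#extension GL_EXT_ray_tracing : require

struct MaterialEval
{
    vec2 uv;       // input: surface parameters (placeholder)
    vec3 radiance; // output: shaded result
};

layout(location = 0) callableDataEXT MaterialEval g_matEval;

void shadeHit(uint materialSbtIndex, vec2 uv)
{
    g_matEval.uv = uv;
    // Each material's generated code is compiled into its own callable
    // shader; the SBT record index picks which one executes.
    executeCallableEXT(materialSbtIndex, 0);
    // g_matEval.radiance now holds the material's result.
}

void main()
{
    // In a real shader this would follow a traceRayEXT() hit;
    // here we only demonstrate the callable dispatch.
    shadeHit(0u, vec2(0.5));
}

// --- one callable shader per generated material (separate file) ---
#version 460
#extension GL_EXT_ray_tracing : require

struct MaterialEval
{
    vec2 uv;
    vec3 radiance;
};

layout(location = 0) callableDataInEXT MaterialEval g_matEval;

void main()
{
    // The MaterialX-generated shading code for this material goes here.
    g_matEval.radiance = vec3(g_matEval.uv, 0.0);
}
```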
Thanks!
This reordering looks cool. It'll be interesting to see whether it simplifies the design or further complicates it. GPU path tracer code is already hard to read, and implementing more complex algorithms is pretty ugly.