NVIDIAGameWorks/RayTracingDenoiser

Importance sampling cuts performance by 50% in Bistro test #4

BattleAxeVR opened this issue · 3 comments

Hi again, I'm trying to figure out if there's a bug in the importance sampling, or if it's just expensive to compute in dynamic scenes with moving emissive triangles.

When I go to the fourth test scenario in Bistro (the one that spawns a bunch of moving boxes), it's significantly slower than the other scenes, and the importance sampling checkbox glows red, flagging itself as the performance bottleneck. (Nice!)

Out of curiosity, I took a couple of GPU captures with Nsight Graphics to compare the timings with importance sampling on vs off. (CPU % doesn't change much, so I figure it's all done on the GPU in the shaders; I haven't analyzed all the code yet.)

[Screenshot: Nsight Graphics captures, importance sampling on vs off]

As you can see, the main difference seems to be that the "Raytracing" debug marker jumps from 9.26 ms to 27.8 ms when importance sampling is enabled, which makes sense with a bunch of moving glowy boxes.

But I'm confused why the Nsight capture bar for "Raytracing" isn't three times longer, and beyond that, I'm trying to understand why the performance is cut in half. More importantly, what can be done to mitigate that? I ask because I need a bunch of moving lights in my game. Is that where RTXGI comes in?

It looks like the TLAS update is very fast for both the static scene and the dynamic box test scene, and there are no BLAS updates in these captures, so I guess all the meshes are static and stay resident in memory.

Can anyone explain why importance sampling triples the cost of the raytracing? The image looks darker without it, but still very nice, so I wonder if simply tone mapping it would do if you need the extra performance. Also, if the importance sampling could be split into sub-parts so that individual aspects are toggleable, it might still be bright enough but not so slow (e.g. just cosine / hemispherical sampling rather than iterating over all the meshes with emissive components and checking their visibility from the current shading point, which is what I assume is happening here).

The capture doesn't really make sense to me: toggling importance sampling goes from ~60 FPS (on) to ~120 FPS (off), but if the ray tracing part alone takes 27.8 ms with importance sampling, the frame rate should be at most about 36 FPS (1000 / 27.8 ≈ 36). This is with Vsync off on a 60 Hz monitor to avoid back pressure from the swap chain.

Thanks for your time and any insights here.

Oh yeah, I also wanted to get a better feeling for why GPU occupancy drops under a heavier load. Is it texture stalls, or some other contention during importance sampling, that causes the RT core frequency or % load to drop? Based on the captures, if anything the ray tracing cores should show higher occupancy under a higher load, not lower. It seems backwards to me, and I don't get it.

Search for "USE_IMPORTANCE_SAMPLING" in 09_Raytracing.hlsl and look through the code; the perf drop will be demystified. IS for emissive surfaces requires casting additional rays, and performance depends on TLAS complexity. Please close this; it's not an issue.

> Can anyone explain why importance sampling triples the cost of the raytracing?

Up to 16 additional rays get cast with the FLAGS_ONLY_EMISSION flag to find the first (possibly occluded) hit on an emissive surface.

> I ask because I need a bunch of moving lights in my game. Is that where RTXGI comes in?

Many possibilities:

  • RTXDI (RTX Direct Illumination)
  • IS from the NRD sample (but I suggest creating a separate TLAS for emissive objects only)
  • your own IS
  • if lighting can be computed analytically and only shadows are needed... it's a different question
  • RTXGI has not been designed for fast-moving lights and direct illumination

> ...so I wonder if simply tone mapping it would do if you need the extra performance

No. Such a solution will only emphasize noise.

> Also, if the importance sampling could be split into sub-parts so that individual aspects are toggleable, it might still be bright enough but not so slow...

IS in the NRD sample is a free bonus from my side. Feel free to modify & improve it. I'm going to improve it, but right now it's not even a moderate-priority task. I would start by creating a separate BLAS / TLAS for emissive triangles only. Will do. No ETA yet.

Thanks for the suggestions. Closing.