/HIP-BVH-Construction

This is my experiments with BVH build algorithms on GPU.

Primary LanguageC++MIT LicenseMIT

Steps to Build and Run

  1. git clone the repo
  2. git submodule update --init --recursive
  3. Run .\tools\premake5\win\premake5.exe vs2022
  4. Build directory will be created which will have the solution file.

HIP-BVH-Construction

This repo is implementation of different GPU BVH build methods and optimizations. Following methods are on the roadmap.

  1. LBVH - Two pass method based on the research paper Maximizing Parallelism in the Construction of BVHs,Octrees, and k-d Trees [DONE]
  2. LBVH - Single pass method based on the research paper Fast and Simple Agglomerative LBVH Construction [DONE]
  3. Inplace LBVH - Based on the HIPRT paper.Refer to HIPRT source for more details. [DONE]
  4. Inplace PLOC++ - When nClusters drop below block size we can optimize the PLOC++ further. After this ploc bvh builder takes ~1ms less. Based on the HIPRT paper. Refer to HIPRT source for more details. [DONE]
  5. HPLOC - Based on the paper. [DONE]
  6. Ploc++ - Based on the PLOC++ paper. You can find the kernel is heavily commented to help understand each step more in detail. [DONE]
  7. Collapse LBVH to nwide BVH - Based on the research paper Getting Rid of Packets Efficient SIMD Single-Ray Traversal using Multi-branching BVHs [Cpu/Gpu DONE] The BVH SAH cost for bunny model drops from ~46 to ~22 while for sponza it drops to ~59 from ~131.
  8. Binned SAH Builder - Based on research paper On fast Construction of SAH-based Bounding Volume Hierarchies, by I. Wald [DONE] Currently this is CPU only build, though implemented based on task model so should be easy to port to GPU.
  9. Early Split Clipping - Based on the research paper Early Split Clipping for Bounding Volume Hierarchies Basic idea is to split the AABB of the primitive along the maximum extent midpoint. The problem is leaf node explosion and the criteria to decide to split is user defined parameter. It is hard to find a generic value for this heuristic. Though, I dont see any performance boost in traversal at least for sponza.(Hard to verify my implementation as of now).
  10. If-If/while-while traversal shaders - Based on the research paper Understanding the Efficiency of Ray Traversal on GPUs – Kepler and Fermi Addendum
  11. Restart Trail Traversal : Based on the research paper.
  12. Speculative while/while

Note : You might find function duplication in the kernel files. I did not get time to refactor this much though it helps to understand code as we have all code in one file.

Reference Images

test

test

test

Leaf Node Visits

colorMap

colorMap

Performance Numbers With Two Pass LBVH

Timings for Bunny Box(150K triangles) on RX6800 AMD

-----------------------Perf Times----------------------------

CalculateCentroidExtentsTime :0.247701ms

CalculateMortonCodesTime :0.037101ms

SortingTime : 0.1635ms

BvhBuildTime : 0.693201ms

CollapseTime : 2.9558ms

Bvh Cost : 22.6397

Total Time : 1.1415ms


Timings for Sponza(260K triangles) on RX6800 AMD

-------------------------Perf Times------------------------

CalculateCentroidExtentsTime :0.223ms

CalculateMortonCodesTime :0.082599ms

SortingTime : 0.249ms

BvhBuildTime : 0.929ms

CollapseTime : 3.6383ms

Bvh Cost : 59.4779

Total Time : 1.4836ms


Performance Numbers with Single Pass LBVH

Timings for Bunny Box(150K triangles) on RX6800 AMD

------------------------Perf Times------------------------------

CalculateCentroidExtentsTime :0.2209ms

CalculateMortonCodesTime :0.0465ms

SortingTime : 0.173501ms

BvhBuildTime : 0.4865ms

CollapseTime : 3.3596ms

Bvh Cost : 22.6397

Total Time : 0.927401ms


Timings for Sponza(260K triangles) on RX6800 AMD

-----------------------Perf Times---------------------------

CalculateCentroidExtentsTime :0.224899ms

CalculateMortonCodesTime :0.0853ms

SortingTime : 0.2496ms

BvhBuildTime : 0.428799ms

CollapseTime : 3.316ms

Bvh Cost : 59.4779

Total Time : 0.988598ms


Performance Numbers with HPLOC

Timings for Bunny Box(150K triangles) on RX6800 AMD

-----------------------Perf Times----------------------------

CalculateCentroidExtentsTime :0.2593ms

CalculateMortonCodesTime :0.070599ms

SortingTime : 0.177599ms

BvhBuildTime : 0.5147ms

CollapseTime : 3.3104ms

Bvh Cost : 21.9676

Total Time : 1.0222ms


Timings for Sponza(260K triangles) on RX6800 AMD

-----------------------Perf Times---------------------------

CalculateCentroidExtentsTime :0.279299ms

CalculateMortonCodesTime :0.204699ms

SortingTime : 0.2535ms

BvhBuildTime : 0.6133ms

CollapseTime : 4.0426ms

Bvh Cost : 48.2362

Total Time : 1.3508ms


Performance Numbers with PLOC++

Timings for Bunny Box(150K triangles) on RX6800 AMD

-----------------------Perf Times---------------------------

CalculateCentroidExtentsTime :0.1981ms

CalculateMortonCodesTime :0.0776ms

SortingTime : 0.1944ms

BvhBuildTime : 0.688ms

CollapseTime : 3.2174ms

Bvh Cost : 21.9248

Total Time : 1.1581ms


Timings for Sponza(260K triangles) on RX6800 AMD

-----------------------Perf Times---------------------------

CalculateCentroidExtentsTime :0.2975ms

CalculateMortonCodesTime :0.1004ms

SortingTime : 0.2598ms

BvhBuildTime : 0.983297ms

CollapseTime : 4.3278ms

Bvh Cost : 48.842

Total Time : 1.641ms


Note : Timings are on AMD 6800W Pro