Project Description

The goal of this project is to apply SparseLeap, an effective empty space skipping technique, to the volume rendering phase of NeRF (Neural Radiance Fields). We use the Lego dataset to train the NeRF. The NeRF is first trained using stratified and hierarchical volume sampling. Once training is complete, SparseLeap is used to render novel views from the trained NeRF. Stratified and hierarchical volume sampling place samples along the whole ray, irrespective of whether a sample falls in empty space. SparseLeap instead skips empty space and samples only where objects lie. Hence, SparseLeap improves the computational efficiency of the volume rendering phase in NeRF.

What is NeRF (Neural Radiance Fields)?

NeRF (Neural Radiance Fields) is a state-of-the-art method that generates novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. The input views can be rendered from a Blender model or provided as a static set of images. The scene is represented as a continuous 5D function that outputs the radiance emitted in each direction (θ; Φ) at each point (x; y; z) in space, and a density at each point which acts like a differential opacity controlling how much radiance is accumulated by a ray passing through (x; y; z).

Figure: The steps that optimize a continuous 5D (x; y; z; θ; Φ) neural radiance field representation (volume density and view-dependent color at any continuous location) of a scene from a set of input images. In this drum set example, approximately 100 images were given as input, each captured from a different camera position (x; y; z) and viewing direction (θ; Φ).

A continuous scene can be described as a 5D vector-valued function whose input is a 3D location x = (x; y; z) and a 2D viewing direction (θ; Φ), and whose output is an emitted color c = (r; g; b) and a volume density (σ). To render a neural radiance field from a particular viewpoint, the following steps are performed:

  1. March camera rays through the scene to generate a sampled set of 3D points
  2. Use those points and their corresponding 2D viewing directions as input to the neural network to produce an output set of colors and densities
  3. Use classical volume rendering techniques to accumulate those colors and densities into a 2D image
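The three steps above can be sketched for a single ray as follows. This is a minimal illustration in plain Python, not the project's implementation: `toy_field` is a hypothetical stand-in for the trained network (a red sphere of constant density), and all function names and parameters are assumptions chosen for the example.

```python
import math
import random

def stratified_samples(t_near, t_far, n_samples, rng=random):
    """Step 1: draw one depth uniformly at random from each of n_samples equal bins."""
    bin_width = (t_far - t_near) / n_samples
    return [t_near + (i + rng.random()) * bin_width for i in range(n_samples)]

def toy_field(point, view_dir):
    """Hypothetical stand-in for the MLP: a red unit sphere centered at the origin."""
    inside = sum(c * c for c in point) < 1.0
    return (1.0, 0.0, 0.0), (5.0 if inside else 0.0)

def composite(depths, densities, colors, t_far):
    """Step 3: accumulate per-sample colors and densities into one pixel color."""
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light that has not been absorbed so far
    for i, (sigma, color) in enumerate(zip(densities, colors)):
        next_t = depths[i + 1] if i + 1 < len(depths) else t_far
        delta = next_t - depths[i]              # distance to the next sample
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray interval
        weight = transmittance * alpha
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance

def render_ray(origin, direction, t_near, t_far, n_samples,
               field=toy_field, rng=random):
    """Run steps 1-3 for one camera ray and return (pixel color, remaining transmittance)."""
    depths = stratified_samples(t_near, t_far, n_samples, rng)
    colors, densities = [], []
    for t in depths:
        point = [o + t * d for o, d in zip(origin, direction)]
        color, sigma = field(point, direction)  # step 2: query the field
        colors.append(color)
        densities.append(sigma)
    return composite(depths, densities, colors, t_far)
```

For example, `render_ray((0, 0, -3), (0, 0, 1), 0.0, 6.0, 64)` sends a ray through the center of the toy sphere and returns an almost fully red pixel, while a ray that misses the sphere returns black with a transmittance of 1.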

Figure: (a) Sampling 5D coordinates (location and viewing direction) along camera rays; (b) feeding those locations into an MLP to produce a color and volume density; (c) using volume rendering techniques to composite these values into an image; (d) optimizing the scene representation by minimizing the residual between synthesized and ground-truth observed images.

The model being optimized is a deep, fully connected multi-layer perceptron (MLP) with no convolutional layers. It is optimized with gradient descent by minimizing the error between each observed image and the corresponding view rendered from the representation.
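The objective described above can be written as a squared-error loss over camera rays. The notation is an assumption chosen for illustration: R is a batch of rays, C(r) is the observed pixel color, Ĉ(r; Θ) is the color rendered from the MLP with weights Θ, and η is the learning rate. The plain gradient step stands in for whichever optimizer variant is actually used.

```latex
\mathcal{L}(\Theta) = \sum_{\mathbf{r} \in \mathcal{R}}
  \left\lVert \hat{C}(\mathbf{r}; \Theta) - C(\mathbf{r}) \right\rVert_2^2,
\qquad
\Theta \leftarrow \Theta - \eta \, \nabla_{\Theta} \mathcal{L}(\Theta)
```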

What is SparseLeap?

In volume rendering, a significant part of the computational effort goes into computing a large number of samples from the underlying scalar field. Therefore, one of the most important basic performance optimizations is trying to avoid sampling empty space, i.e., regions where the samples do not contribute to the volume rendering integral. This process is usually called empty space skipping or space leaping.
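To see why such samples contribute nothing, consider the usual discrete form of the volume rendering integral. The notation is assumed for illustration: σ_i and c_i are the density and color at sample i, δ_i is the distance to the next sample, and T_i is the transmittance accumulated before sample i.

```latex
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i,
\qquad
T_i = \exp\!\Big(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Big)
```

A sample in empty space has σ_i = 0, so its weight T_i (1 − e^0) is zero and it adds nothing to the exponent of any later T_i. Evaluating it costs a query to the underlying field but does not change the rendered color.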

SparseLeap is a novel method for efficient empty space skipping in volume rendering. It comprises several key components. First, an occupancy histogram tree that hierarchically tracks three occupancy classes for volume regions: empty, non-empty, and unknown. Second, a traversal algorithm for the occupancy histogram tree that extracts view-independent occupancy geometry of nested bounding boxes only where the occupancy class changes. This significantly reduces the fragmentation of space. Third, ray segment lists, which are per-pixel linked lists of consecutive segments of differing occupancy class. These lists are generated by rasterizing the occupancy geometry, while merging successive segments of the same class, such as several consecutive empty segments, into a single segment. Finally, empty space skipping during ray-casting becomes a simple linear list traversal that skips "as-long-as-possible" empty ray segments without hierarchy traversal.

SparseLeap algorithm overview. (a) The occupancy histogram tree stores hierarchical volume occupancy information, using the classes empty, non-empty, and unknown. (b) Traversal of the occupancy histogram tree creates occupancy geometry whenever nested regions differ in occupancy class. The occupancy geometry can be re-used for multiple frames. (c) The occupancy geometry is rasterized into ray segment lists, merging successive segments of the same class. (d) Ray-casting leaps over empty space via linear traversal of the ray segment list of each ray.
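Steps (c) and (d) can be illustrated on the CPU for a single ray. The sketch below assumes the per-ray segments are already available as `(t_start, t_end, occupancy)` tuples sorted along the ray; the real method produces them by rasterizing occupancy geometry, which is not shown. The function names are hypothetical, and treating "unknown" segments like non-empty ones is a conservative assumption made for this example.

```python
EMPTY, NON_EMPTY, UNKNOWN = "empty", "non-empty", "unknown"

def merge_segments(segments):
    """Merge successive segments of the same class into a single segment.

    `segments` is a list of (t_start, t_end, occupancy) tuples that are
    contiguous and sorted by depth along the ray.
    """
    merged = []
    for t_start, t_end, occupancy in segments:
        if merged and merged[-1][2] == occupancy:
            # Same class as the previous segment: extend it instead of appending.
            merged[-1] = (merged[-1][0], t_end, occupancy)
        else:
            merged.append((t_start, t_end, occupancy))
    return merged

def sample_depths(segment_list, step):
    """Walk the list once and place samples only in segments that are not empty."""
    depths = []
    for t_start, t_end, occupancy in segment_list:
        if occupancy == EMPTY:
            continue  # leap over the whole empty segment in one step
        # Non-empty and unknown segments are both sampled (assumption).
        n = max(1, round((t_end - t_start) / step))
        width = (t_end - t_start) / n
        depths.extend(t_start + (k + 0.5) * width for k in range(n))
    return depths

raw = [
    (0.0, 1.0, EMPTY), (1.0, 2.0, EMPTY),
    (2.0, 3.0, NON_EMPTY), (3.0, 4.0, NON_EMPTY),
    (4.0, 6.0, EMPTY),
]
ray_segments = merge_segments(raw)
depths = sample_depths(ray_segments, step=0.5)
```

Here the five raw segments collapse into three, and only the interval from 2.0 to 4.0 receives samples. Sampling the whole ray from 0.0 to 6.0 at the same step size would take three times as many samples.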

Experiments and Results

For the experiments, NeRF is trained on the Lego dataset for 10,000 iterations. During training, stratified and hierarchical sampling is used in the volume rendering step. SparseLeap is then trained using the trained NeRF. Once SparseLeap is trained, we compare its performance with stratified and hierarchical volume rendering by rendering images with both methods. The figure below shows the ground-truth images, the images rendered using SparseLeap, and the images rendered using the hierarchical volume sampling method. We rendered 4 images each for SparseLeap and hierarchical sampling, with the number of samples per ray (N) equal to 32, 64, 96, and 128. Image quality is compared using PSNR (Peak Signal-to-Noise Ratio) and MSE (Mean Squared Error).
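The two quality metrics can be computed as in the sketch below. It assumes images are given as flat sequences of pixel values in the range [0, 1]; the function names and the `max_value` parameter are illustrative and not taken from the project code.

```python
import math

def mse(rendered, ground_truth):
    """Mean squared error between two equally sized sequences of pixel values."""
    if len(rendered) != len(ground_truth):
        raise ValueError("images must have the same number of values")
    return sum((r - g) ** 2 for r, g in zip(rendered, ground_truth)) / len(rendered)

def psnr(rendered, ground_truth, max_value=1.0):
    """Peak signal-to-noise ratio in decibels; higher means closer to ground truth."""
    error = mse(rendered, ground_truth)
    if error == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / error)
```

Because PSNR is a logarithm of the inverse MSE, the two metrics always rank a pair of renderings of the same ground-truth image in the same order.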

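[Figure: ground-truth images, images rendered using SparseLeap, and images rendered using hierarchical volume sampling for N = 32, 64, 96, and 128]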

As shown in the figure above, the image rendered using SparseLeap is of higher quality than the image rendered using hierarchical sampling. For example, for N = 32 the PSNR is 22.23 for SparseLeap and 20.60 for hierarchical sampling. Furthermore, the PSNR of the image rendered using SparseLeap with N = 96 is almost the same as the PSNR of the image rendered using hierarchical sampling with N = 128, so the two images are of roughly equal quality. In this example, SparseLeap therefore needs roughly 32 fewer samples per ray than hierarchical sampling to render an image of the same quality. The image size is 100x100, so SparseLeap makes approximately 320,000 fewer forward passes through the NeRF (100 x 100 rays x 32 samples per ray) to render an image of equal quality.
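The saving quoted above can be checked with a few lines of arithmetic. The sketch assumes one ray per pixel and one network query per sample, and the variable names are chosen for illustration.

```python
image_width, image_height = 100, 100
samples_hierarchical = 128  # N for hierarchical sampling
samples_sparseleap = 96     # N for SparseLeap at comparable PSNR

rays = image_width * image_height  # one ray per pixel (assumption)
saved_per_ray = samples_hierarchical - samples_sparseleap
saved_queries = rays * saved_per_ray
saved_fraction = saved_per_ray / samples_hierarchical

print(saved_queries)   # 320000
print(saved_fraction)  # 0.25
```

Under these assumptions, the saving corresponds to a quarter of the network queries that hierarchical sampling needs for this image.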

The figures below plot PSNR against the number of samples per ray for two different training samples. They corroborate the explanation above.

[Figure: PSNR vs. number of samples per ray]

For this scene, SparseLeap skipped 31.96% of the empty space. It is also worth noting that the better the output of the NeRF, the better SparseLeap is at skipping empty space, since the NeRF then provides more accurate volume density information.

Requirement

Python 2.0 or above

License

License: MIT

Copyright (c) Feb 2023 Pradip Kathiriya