mitsuba-renderer/drjit

"First steps in Python" tutorial in the docs renders incorrect image with CUDA (but works when switching to LLVM)

ShnitzelKiller opened this issue · 7 comments

Copying the final optimized code from the tutorial (complete with all the optimizations and the use of autograd to compute the normals) gives the following result:
[image: incorrect render produced with the CUDA backend]

Here is the code for reference.

import drjit as dr
from drjit.cuda.ad import Float, UInt32, Array3f, Array2f, TensorXf, Texture3f, PCG32, Loop

dr.set_log_level(dr.LogLevel.Info)

noise = PCG32(size=16*16*16).next_float32()
noise_tex = Texture3f(TensorXf(noise, shape=(16, 16, 16, 1)))

def sdf(p: Array3f) -> Float:
    sdf_value = dr.norm(p) - 1
    sdf_value += noise_tex.eval_cubic(dr.fma(p, 0.5, 0.5))[0] * 0.1
    return sdf_value

def trace(o: Array3f, d: Array3f) -> Array3f:
    i = UInt32(0)
    loop = Loop("Sphere tracing", lambda: (o, i))
    while loop(i < 10):
        o = dr.fma(d, sdf(o), o)
        i += 1
    return o

def shade(p: Array3f, l: Array3f, eps: float = 1e-3) -> Float:
    dr.enable_grad(p)
    value = sdf(p)
    dr.set_grad(p, l)
    dr.forward_to(value)
    return dr.maximum(0, dr.grad(value))

x = dr.linspace(Float, -1, 1, 1000)
x, y = dr.meshgrid(x, x)

p = trace(o=Array3f(0, 0, -2),
          d=dr.normalize(Array3f(x, y, 1)))

sh = shade(p, l=Array3f(0, -1, -1))
sh[sdf(p) > .1] = 0

img = Array3f(.1, .1, .2) + Array3f(.4, .4, .2) * sh
img_flat = dr.ravel(img)

img_t = TensorXf(img_flat, shape=(1000, 1000, 3))

import matplotlib.pyplot as plt
plt.imshow(img_t)
plt.show()

Switching the backend from CUDA to LLVM by changing the line

from drjit.cuda.ad import Float, UInt32, Array3f, Array2f, TensorXf, Texture3f, PCG32, Loop

to

from drjit.llvm.ad import Float, UInt32, Array3f, Array2f, TensorXf, Texture3f, PCG32, Loop

makes the image render correctly:
[image: correct render with the LLVM backend]

Does anybody have insights into why this does not work in CUDA? It appears that the texture's contribution to the SDF is ignored in the gradient computation, resulting in the sphere normals being unchanged.

Hi @ShnitzelKiller

I don't think we've ever seen something similar. Could you tell us what GPU model you're using and the driver version?

I have tried this on both a Windows machine with an RTX 2060 Super and a CentOS server with a TITAN XP with the same results. On Windows, the NVIDIA driver version is 528.49, and on Linux, it is 515.65.01.
In both environments, I have installed drjit 0.4.1 using pip with Python 3.11.0.

If you change the Texture3f construction as follows:

noise_tex = Texture3f(TensorXf(noise, shape=(16, 16, 16, 1)), migrate=False)

Does it work?

Yes, that appears to fix the problem on both computers. Why is that? I can't find any documentation for the Texture3f object/constructor.

This has not made its way into the Python documentation yet.

You can still read about it in the C++ source code: https://github.com/mitsuba-renderer/drjit/blob/master/include/drjit/texture.h#L97-L103

In short, the Texture class makes use of hardware acceleration on GPUs, which requires a specific type of memory allocation. During texture construction, the texture data (in the tutorial: the TensorXf object) must therefore be copied into this texture-specific memory, which effectively duplicates the raw data. When migrate is set to True, only the texture-specific copy is kept. Finally, only the eval method actually makes use of the hardware acceleration and the texture-specific memory; eval_cubic does not.
As a safeguard, if the data was fully migrated to the texture-specific memory and you call eval_cubic, you will get 0s in return.
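
A minimal sketch of that behaviour, assuming drjit 0.4.x on the CUDA backend and that migrate defaults to True (which the thread above suggests):

import drjit as dr
from drjit.cuda.ad import Array3f, TensorXf, Texture3f, PCG32

data = TensorXf(PCG32(size=16*16*16).next_float32(), shape=(16, 16, 16, 1))

tex_migrated = Texture3f(data)                 # raw data migrated into texture memory
tex_kept     = Texture3f(data, migrate=False)  # raw data kept alongside the texture copy

pos = Array3f(0.5, 0.5, 0.5)
print(tex_migrated.eval_cubic(pos))  # zeros: eval_cubic needs the raw data, which was migrated away
print(tex_kept.eval_cubic(pos))      # valid cubic interpolation result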

When I switch from eval_cubic to eval or eval_cuda, with migrate set to True, I still get an image that looks like this:
[image: render after switching to eval / eval_cuda with migrate=True]

Are you saying that eval_cubic does not use the hardware texture sampler for its 8 texture lookups?

The eval method uses linear interpolation, whereas eval_cubic uses cubic interpolation. Hence the different appearance, which also differs from your original image.
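
For reference, a small sketch contrasting the two filters, reusing noise_tex from the tutorial code above (assuming it was built with migrate=False so that both code paths see valid data):

pos = Array3f(0.3, 0.6, 0.9)
print(noise_tex.eval(pos))        # (tri)linear filtering, hardware-accelerated on CUDA
print(noise_tex.eval_cubic(pos))  # cubic filtering, computed from the raw data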

I forget the exact details, but I think it also has to do with the fact that we have to re-attach the gradients:
If we use a hardware feature to compute something for us, we obviously lose gradient tracking for those operations. So, if we see that gradients are needed through the texture lookup, as is the case in the tutorial, we need to replicate what the hardware is doing in order to add the proper computations to the AD graph.
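
As a rough illustration of what "gradients through the texture lookup" means here, this is essentially the pattern from the tutorial's shade function (a sketch, reusing noise_tex and assuming migrate=False so eval_cubic returns valid data):

q = Array3f(0.25, 0.5, 0.75)
dr.enable_grad(q)
value = noise_tex.eval_cubic(q)[0]  # lookup must be recorded in the AD graph
dr.set_grad(q, Array3f(1, 0, 0))    # directional derivative along x
dr.forward_to(value)
print(dr.grad(value))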