"First steps in Python" tutorial in the docs renders incorrect image with CUDA (but works when switching to LLVM)
ShnitzelKiller opened this issue · 7 comments
Copying the final optimized tutorial code (complete with all the optimizations and the use of autograd to compute the normals) from the tutorial here gives the following result:
Here is the code for reference.
import drjit as dr
from drjit.cuda.ad import Float, UInt32, Array3f, Array2f, TensorXf, Texture3f, PCG32, Loop

dr.set_log_level(dr.LogLevel.Info)

# Random 16x16x16 noise volume used to perturb the sphere SDF
noise = PCG32(size=16*16*16).next_float32()
noise_tex = Texture3f(TensorXf(noise, shape=(16, 16, 16, 1)))

def sdf(p: Array3f) -> Float:
    # Unit sphere SDF, displaced by a cubic texture lookup
    sdf_value = dr.norm(p) - 1
    sdf_value += noise_tex.eval_cubic(dr.fma(p, 0.5, 0.5))[0] * 0.1
    return sdf_value

def trace(o: Array3f, d: Array3f) -> Array3f:
    # Sphere tracing: march along the ray for 10 fixed iterations
    i = UInt32(0)
    loop = Loop("Sphere tracing", lambda: (o, i))
    while loop(i < 10):
        o = dr.fma(d, sdf(o), o)
        i += 1
    return o

def shade(p: Array3f, l: Array3f, eps: float = 1e-3) -> Float:
    # Forward-mode AD: directional derivative of the SDF along 'l'
    dr.enable_grad(p)
    value = sdf(p)
    dr.set_grad(p, l)
    dr.forward_to(value)
    return dr.maximum(0, dr.grad(value))

# Generate a 1000x1000 grid of rays through the image plane
x = dr.linspace(Float, -1, 1, 1000)
x, y = dr.meshgrid(x, x)
p = trace(o=Array3f(0, 0, -2),
          d=dr.normalize(Array3f(x, y, 1)))
sh = shade(p, l=Array3f(0, -1, -1))
sh[sdf(p) > .1] = 0  # mask out rays that missed the surface

img = Array3f(.1, .1, .2) + Array3f(.4, .4, .2) * sh
img_flat = dr.ravel(img)
img_t = TensorXf(img_flat, shape=(1000, 1000, 3))

import matplotlib.pyplot as plt
plt.imshow(img_t)
plt.show()
Switching the backend from CUDA to LLVM by changing the line
from drjit.cuda.ad import Float, UInt32, Array3f, Array2f, TensorXf, Texture3f, PCG32, Loop
to
from drjit.llvm.ad import Float, UInt32, Array3f, Array2f, TensorXf, Texture3f, PCG32, Loop
makes the image render correctly:
Does anybody have insights into why this does not work with CUDA? It appears that the texture's contribution to the SDF is ignored in the gradient computation, leaving the sphere normals unchanged.
I don't think we've ever seen something similar. Could you tell us what GPU model you're using and the driver version?
I have tried this on both a Windows machine with an RTX 2060 Super and a CentOS server with a TITAN XP with the same results. On Windows, the NVIDIA driver version is 528.49, and on Linux, it is 515.65.01.
In both environments, I have installed drjit 0.4.1 using pip with Python 3.11.0.
If you change the Texture3f construction as such:
noise_tex = Texture3f(TensorXf(noise, shape=(16, 16, 16, 1)), migrate=False)
Does it work?
Yes, that appears to fix the problem on both computers. Why is that? I can't find documentation for the Texture3f object/constructor.
This has not made its way into the Python documentation yet.
You can still read about it in the c++ source code: https://github.com/mitsuba-renderer/drjit/blob/master/include/drjit/texture.h#L97-L103
In short, the Texture class makes use of hardware acceleration on GPUs, which requires a specific type of memory allocation. During Texture construction, the texture data (in the tutorial: the TensorXf object) must be copied into this texture-specific memory, so the raw data is effectively duplicated in memory. When the migrate argument is set to True, only the texture-specific copy is kept. Finally, only the eval method actually makes use of the hardware acceleration and the texture-specific memory, not eval_cubic.
As a safeguard, if the data was fully migrated to the texture-specific memory and you call eval_cubic, you will get zeros in return.
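To make this concrete, here is a minimal sketch on the CUDA backend (the volume shape and lookup point are arbitrary, and this assumes the default constructor migrates the data as described above):

import drjit as dr
from drjit.cuda.ad import Array3f, TensorXf, Texture3f, PCG32

data = TensorXf(PCG32(size=8*8*8).next_float32(), shape=(8, 8, 8, 1))
p = Array3f(0.5, 0.5, 0.5)

# Default construction migrates the raw data to CUDA texture memory,
# so eval_cubic has nothing to read from and returns zeros.
tex_migrated = Texture3f(data)
print(tex_migrated.eval_cubic(p)[0])  # -> 0

# Keeping both copies (migrate=False) makes eval_cubic work as expected.
tex_kept = Texture3f(data, migrate=False)
print(tex_kept.eval_cubic(p)[0])  # -> interpolated noise value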
The eval method uses linear interpolation, whereas the eval_cubic method uses cubic interpolation. Hence the difference in appearance, which is also different from your original image.
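For example (again a sketch; note that eval works even on a fully migrated texture, since it goes through the hardware texture units):

import drjit as dr
from drjit.cuda.ad import Array3f, TensorXf, Texture3f, PCG32

data = TensorXf(PCG32(size=8*8*8).next_float32(), shape=(8, 8, 8, 1))
tex = Texture3f(data, migrate=False)
p = Array3f(0.3, 0.6, 0.9)

linear = tex.eval(p)[0]       # hardware-accelerated (tri)linear filtering
cubic = tex.eval_cubic(p)[0]  # software cubic filtering, smoother result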
I forget the exact details, but I think it also has to do with the fact that we have to re-attach the gradients: if we use some hardware feature to compute something for us, we obviously lose gradient tracking for those operations. So, if we see that gradients are needed on the texture, as is the case in the tutorial, we need to replicate what the hardware is doing just to add the proper computations to the AD graph.
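This is also why the tutorial's shade function is where the bug shows up: the forward-mode derivative has to flow through eval_cubic. A reduced version of that gradient path, using only the API already present in the tutorial code above:

import drjit as dr
from drjit.cuda.ad import Array3f, TensorXf, Texture3f, PCG32

data = TensorXf(PCG32(size=8*8*8).next_float32(), shape=(8, 8, 8, 1))
tex = Texture3f(data, migrate=False)  # keep raw data so eval_cubic works

p = Array3f(0.5, 0.5, 0.5)
dr.enable_grad(p)
value = tex.eval_cubic(p)[0]

# Forward-mode AD: directional derivative of the texture lookup
# along the perturbation direction (0, -1, -1), as in shade()
dr.set_grad(p, Array3f(0, -1, -1))
dr.forward_to(value)
print(dr.grad(value))  # with the default migrate=True this came out as zero on CUDA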