AIS-Bonn/stillleben

Question: How to get the render result with all the objects of the scene?

WIPDarioDT opened this issue · 2 comments

Hi,
I recently started using this software, but I am stuck.
I am following the provided viewer.py example and have added just a few lines to get the tensor with the XYZ coordinates:
    renderer = sl.RenderPass()
    result = renderer.render(scene)
    coords = result.coordinates()

After that, I convert the coords tensor to a point cloud so that I can visualize it with Open3D. The function I wrote looks like this:
    import numpy as np

    def get_cloud(array):
        # Collect the XYZ vector of every pixel into a flat N x 3 array
        cloud = []
        for x in range(array.shape[0]):
            for y in range(array.shape[1]):
                cloud.append(array[x, y, :].numpy())
        return np.array(cloud)
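Since the nested loop visits every pixel in row-major order, the same N x 3 array can be produced with a single reshape, which is much faster for full-resolution images. This is a minimal sketch, assuming the tensor has already been converted to a NumPy array of shape H x W x 3 (for a PyTorch tensor, `coords.cpu().numpy()` does that); the name `get_cloud_fast` is hypothetical.

```python
import numpy as np


def get_cloud_fast(array: np.ndarray) -> np.ndarray:
    """Flatten an H x W x 3 per-pixel coordinate image into an (H*W) x 3 cloud.

    reshape() walks the array in row-major (C) order, i.e. pixel (x, y) ends
    up at row x * W + y, which is the same order as the nested x/y loop in
    get_cloud().
    """
    if array.ndim != 3 or array.shape[-1] != 3:
        raise ValueError(f"expected an H x W x 3 array, got shape {array.shape}")
    return array.reshape(-1, 3)
```

For example, `get_cloud_fast(coords.cpu().numpy())` should give the same rows as the loop-based version, in the same order.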

However, when I display it, only one of the many objects in the scene is shown.
It looks like the render result only contains one of the objects that I added to the scene. Also, when I inspect the tensor returned by result.coordinates(), I notice a lot of elements with the value "3000, 3000, 3000", which I have to delete in order to visualize the cloud properly. I don't know where these come from. Are they some kind of initial value that never gets filled in?

I have read in the documentation of the render method of the RenderPass class that there is a parameter called predicate, described as "A function that decides whether a particular object should be rendered". I suppose I have to use it, but I can't figure out how. Any help with this?

Thanks in advance.

xqms commented

Hey, thanks for your interest!

result.coordinates() returns object-centric coordinates. That means that, for each pixel, the array answers the question "Which point on the object is this, expressed in the object's own coordinate frame?". If you visualize the result as a point cloud, you get exactly what you see: all the bunnies are the same mesh, so all their points overlap.
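To see why the points overlap, consider two copies of the same mesh placed at different poses. A given mesh vertex has identical object-frame coordinates in both copies, so an object-centric image maps both to one 3D location; only after applying each object's pose (rotation R, translation t) do the points separate in camera space. The NumPy sketch below illustrates just that rigid-transform relation. It is not stillleben code: the 4 x 4 object-to-camera pose convention and the specific poses are assumptions chosen for illustration.

```python
import numpy as np


def object_to_camera(p_obj: np.ndarray, pose: np.ndarray) -> np.ndarray:
    """Map (N, 3) object-frame points to the camera frame: p_cam = R p + t.

    pose is assumed to be a 4 x 4 homogeneous object-to-camera transform.
    """
    R, t = pose[:3, :3], pose[:3, 3]
    return p_obj @ R.T + t


def camera_to_object(p_cam: np.ndarray, pose: np.ndarray) -> np.ndarray:
    """Inverse mapping: p_obj = R^T (p_cam - t)."""
    R, t = pose[:3, :3], pose[:3, 3]
    return (p_cam - t) @ R


# One point on the shared bunny mesh, in the bunny's own frame (hypothetical).
p = np.array([[0.01, 0.02, 0.03]])

# Bunny A: no rotation, 1 m in front of the camera (hypothetical pose).
pose_a = np.eye(4)
pose_a[:3, 3] = [0.0, 0.0, 1.0]

# Bunny B: rotated 90 degrees about z and shifted sideways (hypothetical pose).
pose_b = np.eye(4)
pose_b[:3, :3] = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
pose_b[:3, 3] = [0.3, 0.0, 1.2]

# Camera-space positions differ, so a cloud built from them shows two bunnies ...
cam_a = object_to_camera(p, pose_a)
cam_b = object_to_camera(p, pose_b)

# ... but both map back to the same object-frame point, so a cloud built
# from object-centric coordinates stacks the bunnies on top of each other.
obj_a = camera_to_object(cam_a, pose_a)
obj_b = camera_to_object(cam_b, pose_b)
```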

What you want is result.cam_coordinates(): https://ais-bonn.github.io/stillleben/stillleben.RenderPassResult.html#cam_coordinates. This tensor contains for each pixel the 3D position in camera space.

(The 3000,3000,3000 value is used in the coordinates tensor to indicate that there is no object at this particular pixel).
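Rather than deleting those entries by hand, they can be masked out after flattening. This is a minimal sketch, assuming the coordinates have already been converted to a NumPy array of shape H x W x 3; the 3000 marker comes from the reply above, while the helper name `drop_background` is hypothetical. The thread does not say whether `cam_coordinates()` marks empty pixels the same way, so check its output before reusing this filter on it.

```python
import numpy as np

# Marker for "no object at this pixel" in the coordinates tensor.
BACKGROUND = 3000.0


def drop_background(coords: np.ndarray, marker: float = BACKGROUND) -> np.ndarray:
    """Flatten an H x W x 3 coordinate image and drop background pixels.

    A pixel counts as background only if all three components equal the
    marker value; the remaining rows keep their row-major pixel order.
    """
    cloud = coords.reshape(-1, 3)
    is_background = np.all(np.isclose(cloud, marker), axis=1)
    return cloud[~is_background]
```

To view the result in Open3D, the filtered array can be wrapped with `o3d.geometry.PointCloud(o3d.utility.Vector3dVector(cloud.astype(np.float64)))` and passed to `o3d.visualization.draw_geometries([...])`.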

WIPDarioDT commented

It worked as you said. Thank you so much!