alextrevithick/GRF

Confusion on section 3.3


I'm rather confused by this section because you cite P as a function based on multi-view geometry and then describe two approximations. Do these approximations represent P? I'm also confused about how to implement these approximations beyond checking whether a point lands inside or outside the image; specifically, how do you "duplicate its features to the 3D point"?

The function P should be regarded as the projection of a point in 3D space onto the 2D image plane, i.e., the location in the image where the ray from the viewpoint to the point in question hits the plane. I recommend these slides for the details of the projection: http://www.cs.toronto.edu/~jepson/csc420/notes/imageProjection.pdf
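For concreteness, here is a minimal sketch of that projection under a standard pinhole camera model. The names `K`, `R`, and `t` (intrinsics and world-to-camera extrinsics) are illustrative assumptions, not the exact variables used in the GRF code.

```python
import numpy as np

def project_point(point_world, K, R, t):
    """Project a 3D point (world coordinates) onto the 2D image plane.

    K: 3x3 camera intrinsics, R: 3x3 rotation, t: 3-vector translation
    (world-to-camera). Returns (u, v) pixel coordinates and the depth.
    """
    # Transform the point into the camera frame.
    p_cam = R @ point_world + t
    # Perspective projection onto the image plane.
    uvw = K @ p_cam
    u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
    return np.array([u, v]), p_cam[2]
```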

The issue is that if a point is projected to a location on the image plane where we don't have any information (e.g., outside the image), then we need some approximation. In that case we either concatenate the zero feature vector or the features of the closest pixel.
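A rough sketch of these two approximations, assuming a per-view CNN feature map of shape (C, H, W); the function name and `mode` flag are made up for illustration:

```python
import numpy as np

def sample_features(feature_map, u, v, mode="zeros"):
    """Fetch the local feature vector at projected pixel (u, v).

    feature_map: (C, H, W) array of CNN features for one input view.
    mode="zeros":   return a zero vector when (u, v) falls outside the image.
    mode="nearest": clamp (u, v) to the image border (closest pixel).
    """
    C, H, W = feature_map.shape
    x, y = int(round(u)), int(round(v))
    inside = 0 <= x < W and 0 <= y < H
    if not inside:
        if mode == "zeros":
            return np.zeros(C)
        x, y = np.clip(x, 0, W - 1), np.clip(y, 0, H - 1)
    return feature_map[:, y, x]
```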

By "duplicating features," we mean to concatenate the corresponding local features to the point which was projected to those features. Note that many points will be projected to the same local features in this way. We can then feed that point + features into the MLP for the classification of RGB and volume density.

Does that help?

For the function P, how is a query point selected for a given viewpoint? How many do you use? In the NeRF paper they use ray marching; do you use a similar technique? I couldn't find these details in the paper, but I may have missed them.

If you're willing to share any code, feel free! Thanks.

Sorry for my late reply. We use the same ray marching strategy as NeRF, with different sampling rates for the different datasets depending on the complexity of the geometry (e.g., fewer samples for ShapeNet). To sum up, we march rays through the scene, one through each pixel of the desired image (as in NeRF), then use a coarse model to estimate the volume density along the rays and a fine model to bias sampling towards the dense areas. From there, we classify those points with the general procedure of projecting them onto the image planes of all the input images and aggregating the features. Some of the specifics are detailed in Appendix A.1 of the paper.
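As a hedged sketch of the coarse sampling step along a single ray (variable names are illustrative; the hierarchical fine sampling and the per-point projection/aggregation then follow as described above):

```python
import numpy as np

def sample_along_ray(origin, direction, near, far, n_samples):
    """Stratified sampling of points along one ray (coarse pass), as in
    NeRF: one jittered sample per evenly spaced bin between near and far."""
    bins = np.linspace(near, far, n_samples + 1)
    t = bins[:-1] + np.random.rand(n_samples) * (bins[1:] - bins[:-1])
    return origin + t[:, None] * direction, t   # (n_samples, 3) points, depths

# Coarse pass: evaluate density at these points and turn it into weights.
# Fine pass: draw extra samples from the resulting distribution along the
# ray so that samples concentrate in dense regions. Each sampled point is
# then projected into every input view, its local features are gathered
# and aggregated, and the MLP predicts RGB and density for volume rendering.
```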

I will post the code soon!