facebookresearch/nonrigid_nerf

bounds and downsampling factor for load_llff_data_multi_view

andrewsonga opened this issue · 5 comments

First of all, thank you for releasing your impactful work!
I'm trying to train NR-NeRF on multi-view data from 8 synchronized cameras with known intrinsics and extrinsics, and I ran into a couple of questions regarding the bounds and the downsampling factor.

1. Are the parameters min_bound and max_bound defined as the minimum and maximum across all cameras?

I noticed that in the README.md, calibration.json specifies a single min_bound and max_bound shared by all cameras, as opposed to one pair per camera.

2. When using load_llff_data_multi_view, if our training images are downsampled from their original resolution by a certain factor, are there any parts of calibration.json (i.e., the camera intrinsics/extrinsics) that we have to adjust to account for the downsampling factor?

I'm asking because downsampling images by a factor is not implemented in load_llff_data_multi_view, whereas load_llff_data appears to use factor in a couple of places (https://github.com/yenchenlin/nerf-pytorch/blob/a15fd7cb363e93f933012fd1f1ad5395302f63a4/load_llff.py#L76, https://github.com/yenchenlin/nerf-pytorch/blob/a15fd7cb363e93f933012fd1f1ad5395302f63a4/load_llff.py#L103).
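
For reference, this is the kind of adjustment I have in mind, assuming a standard pinhole camera model (the function and numbers below are purely illustrative, not code from this repository): the focal lengths and principal point scale with the image resolution, while the extrinsics should stay unchanged.

```python
# Illustrative sketch only (not from this repository): scaling pinhole
# intrinsics for images that were downsampled by `factor`. The camera
# extrinsics (world-space poses) are resolution-independent and stay as-is.
def downscale_intrinsics(fx, fy, cx, cy, factor):
    return fx / factor, fy / factor, cx / factor, cy / factor

# Example: a hypothetical 1920x1080 camera downsampled by a factor of 2.
fx, fy, cx, cy = downscale_intrinsics(1600.0, 1600.0, 960.0, 540.0, factor=2)
```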

Thank you in advance for reading this long question.
I look forward to reading your response.

Thank you for the swift response!
I have just a few more follow-up questions:

1. Do we have to adjust min_bound and max_bound according to the downsampling factor?

2. Do you think that using the min_bounds and max_bounds from the poses_bounds.npy file generated by running COLMAP with known camera poses (https://colmap.github.io/faq.html#reconstruct-sparse-dense-model-from-known-camera-poses) is a good heuristic for multi-view data? (A sketch of this heuristic follows the list.)

  • I ran COLMAP on multi-view images from a single timestep to estimate 3D points, and used the 1st and 99th percentile depth values to define min_bound and max_bound for each camera; the shared min_bound and max_bound would then be the minimum and maximum, respectively, across all cameras.

3. Where are min_bound and max_bound used? Are they used as the integration bounds for volume rendering?

4. If so, what is the harm in heuristically setting min_bound to 0 and max_bound to a very large number? (My current understanding is sketched after the list; please correct me if it is off.)
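
To make question 2 concrete, here is a minimal sketch of the heuristic, assuming the depths of the COLMAP sparse points in each camera's frame have already been extracted (the helper name and inputs are hypothetical, not from this repository):

```python
# Hypothetical sketch of the bound heuristic from question 2 (not code from
# this repository). `depths_per_camera` holds one 1D array of COLMAP point
# depths per camera.
import numpy as np

def shared_bounds(depths_per_camera, lo_pct=1, hi_pct=99):
    per_camera = [(np.percentile(d, lo_pct), np.percentile(d, hi_pct))
                  for d in depths_per_camera]
    min_bound = min(lo for lo, _ in per_camera)  # closest near bound over all cameras
    max_bound = max(hi for _, hi in per_camera)  # farthest far bound over all cameras
    return min_bound, max_bound

# Example with random stand-in depths for 8 cameras.
example = [np.random.uniform(2.0, 8.0, size=500) for _ in range(8)]
print(shared_bounds(example))
```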
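
My current understanding behind questions 3 and 4 is the following: in NeRF-style renderers each ray is queried at a fixed number of depths spread between the near and far bounds, so very loose bounds spread the same sample budget over mostly empty space. A generic sketch (not code from this repository):

```python
# Generic NeRF-style sampling between near and far bounds (illustrative only,
# not taken from this repository).
import torch

def sample_depths(near, far, n_samples=64):
    t = torch.linspace(0.0, 1.0, n_samples)
    return near * (1.0 - t) + far * t  # depths at which the ray is evaluated

print(sample_depths(2.0, 6.0, 8))      # samples concentrated on the scene
print(sample_depths(0.0, 1000.0, 8))   # same budget spread over 1000 units
```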

Thank you for the detailed response! I followed your instructions carefully, but my renderings come out looking very strange and I can't figure out why. The following are the first five renderings for --camera_path spiral:
[Screenshots: the first five rendered frames from the spiral camera path]

The first frame of my multi-view video looks like this:

[Screenshots: the first frame from each of the eight cameras]

Are there any modifications I need to make to free_viewpoint_rendering.py in order to make it work for multi-view datasets? For instance, do we have to change load_llff_data to load_llff_data_multi_view in free_viewpoint_rendering.py as well as train.py?
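
Concretely, the change I am considering is roughly the following, assuming load_llff_data_multi_view lives in load_llff.py alongside load_llff_data and mirrors its interface as used in train.py (untested; the exact arguments and return values would need to be checked):

```python
# Untested sketch of the swap in free_viewpoint_rendering.py. The call
# signature below is assumed to mirror load_llff_data and must be verified
# against this repository's load_llff.py and train.py.
from load_llff import load_llff_data_multi_view  # instead of load_llff_data

# before (single-view LLFF loading):
# images, poses, bds, render_poses, i_test = load_llff_data(args.datadir, ...)
# after (multi-view loading, mirroring train.py):
# images, poses, bds, render_poses, i_test = load_llff_data_multi_view(args.datadir, ...)
```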

I have never tried running the multi-view code with rendering. The spiral code might be too sensitive; you could try the static or input-reconstruction rendering instead. Changing to load_llff_data_multi_view sounds reasonable, but again, I have not tried that part.