Bounds and downsampling factor for `load_llff_data_multi_view`
andrewsonga opened this issue · 5 comments
First of all, thank you for releasing your impactful work!
I'm trying to train NRNeRF on multi-view data from 8 synchronized cameras with known intrinsics and extrinsics, and I ran into a couple of questions regarding the bounds and the downsampling factor.
1. Are the parameters `min_bound` and `max_bound` defined as the minimum and maximum across all cameras? I noticed that in the README.md there is a single `min_bound` and `max_bound` shared between all cameras when specifying `calibration.json`, as opposed to one per camera.
2. When using `load_llff_data_multi_view`, if our training images are downsampled from their original resolution by a certain factor, are there any parts of `calibration.json` (i.e. the camera intrinsics/extrinsics) that we have to adjust to account for the downsampling factor? I'm asking because downsampling images by a `factor` is not implemented in `load_llff_data_multi_view`, whereas `load_llff_data` does use `factor` in a couple of places (https://github.com/yenchenlin/nerf-pytorch/blob/a15fd7cb363e93f933012fd1f1ad5395302f63a4/load_llff.py#L76, https://github.com/yenchenlin/nerf-pytorch/blob/a15fd7cb363e93f933012fd1f1ad5395302f63a4/load_llff.py#L103). A sketch of the adjustment I have in mind is below.
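For concreteness, here is a minimal sketch of the adjustment I would expect to need, assuming `calibration.json` stores pinhole intrinsics (focal lengths and principal point) per camera; the function and variable names here are mine, purely for illustration:

```python
# Hypothetical sketch: if images are downsampled by `factor`, pinhole
# intrinsics scale by 1/factor, while the extrinsics stay unchanged
# (downsampling changes the pixel grid, not the camera pose).
def rescale_intrinsics(fx, fy, cx, cy, factor):
    """Scale focal lengths and principal point for images downsampled by `factor`."""
    return fx / factor, fy / factor, cx / factor, cy / factor

# Example: full-resolution intrinsics, images downsampled by a factor of 4.
fx, fy, cx, cy = rescale_intrinsics(fx=1600.0, fy=1600.0, cx=960.0, cy=540.0, factor=4)
print(fx, fy, cx, cy)  # 400.0 400.0 240.0 135.0
```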
Thank you in advance for reading this long question.
I look forward to reading your response.
Thank you for the swift response!
I have just a few more follow-up questions:
1. Do we have to adjust `min_bound` and `max_bound` according to the downsampling factor?
2. Do you think using the `min_bound`s and `max_bound`s from the `poses_bounds.npy` file generated by running COLMAP as follows (https://colmap.github.io/faq.html#reconstruct-sparse-dense-model-from-known-camera-poses) constitutes a good heuristic for multi-view?
- I ran COLMAP on multi-view images from a single timestep to estimate the 3D points, and used the 1st and 99th percentile depth values to define the `min_bound` and `max_bound` for each camera; the shared `min_bound` and `max_bound` would then be the minimum and maximum, respectively, across all cameras (see the first sketch after this list).
3. Where are `min_bound` and `max_bound` used? Are they used as the integration bounds for volume rendering?
4. If so, what is the harm of heuristically setting `min_bound` to 0 and `max_bound` to a very large number? (See the second sketch after this list.)
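For question 2, here is a minimal sketch of the heuristic I describe, assuming the COLMAP sparse points are available as an N x 3 array in world coordinates and each camera's world-to-camera rotation and translation are known (all names are mine, purely for illustration):

```python
import numpy as np

def camera_bounds(points_world, R, t, lo=1.0, hi=99.0):
    """Per-camera near/far bounds from the 1st/99th percentile of point depths.

    points_world: (N, 3) sparse 3D points from COLMAP, in world coordinates.
    R, t: world-to-camera rotation (3, 3) and translation (3,).
    """
    points_cam = points_world @ R.T + t  # transform points into the camera frame
    depths = points_cam[:, 2]            # z-depth along the optical axis
    depths = depths[depths > 0]          # keep only points in front of the camera
    return np.percentile(depths, lo), np.percentile(depths, hi)

def shared_bounds(per_camera_bounds):
    """Shared bounds across all cameras: the smallest near and the largest far."""
    nears, fars = zip(*per_camera_bounds)
    return min(nears), max(fars)
```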
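For questions 3 and 4, what I mean by "integration bounds" is the near/far range over which a NeRF-style renderer places its samples along each ray, roughly as in this simplified sketch (standard NeRF-style stratified sampling, not the actual NRNeRF code):

```python
import torch

def sample_along_rays(rays_o, rays_d, near, far, n_samples=64):
    """Place samples uniformly between near and far along each ray.

    If [near, far] is much wider than the actual scene (e.g. near=0 and a
    very large far), the same n_samples are spread over mostly empty space,
    so few samples land near the surfaces and rendering quality degrades.
    """
    t = torch.linspace(0.0, 1.0, n_samples)
    z_vals = near * (1.0 - t) + far * t  # (n_samples,) depths along the ray
    pts = rays_o[..., None, :] + rays_d[..., None, :] * z_vals[..., :, None]
    return pts, z_vals
```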
Thank you for the detailed response! I followed your instructions carefully, but my renderings are coming out very strangely and I can't figure out why. The following are the first five renderings for `--camera_path spiral`:

The first frame of my multi-view video looks like this:
Are there any modifications I need to make to `free_viewpoint_rendering.py` in order to make it work for multi-view datasets? For instance, do we have to change `load_llff_data` to `load_llff_data_multi_view` in `free_viewpoint_rendering.py` as well as in `train.py`? (A sketch of the change I have in mind is below.)
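For concreteness, a hypothetical sketch of the swap I have in mind, assuming both loaders live in `load_llff.py` and return compatible values (I have not verified the argument lists):

```python
# In free_viewpoint_rendering.py, import the multi-view loader instead:
from load_llff import load_llff_data_multi_view  # instead of load_llff_data

# ...and wherever the script currently calls something like
#   images, poses, bds, render_poses, i_test = load_llff_data(datadir, ...)
# call the multi-view variant instead:
#   images, poses, bds, render_poses, i_test = load_llff_data_multi_view(datadir, ...)
```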
I have never tried running the multi-view code with rendering. The spiral code might be too sensitive; you could try the static or input reconstruction rendering instead. Changing to `load_llff_data_multi_view` sounds reasonable, but again, I have not tried that part.