Question about Mip-NeRF 360 pose process.
Jia-Wei-Liu opened this issue · 2 comments
Thanks for releasing this excellent NeRF factory! The code works very well and I just have some questions about the Mip-NeRF 360 pose process and hope you could help me understand it. Thank you very much!
- I want to confirm: are you using the OpenCV coordinate convention to train on Mip-NeRF 360 data? I assume the poses before processing are in the OpenGL convention, as here: https://github.com/kakaobrain/NeRF-Factory/blob/ce06663abe385c5cbe85fddbca8b9b5ace80dbee/src/data/data_util/nerf_360_v2.py#L365
After processing (Q2 and Q3), https://github.com/kakaobrain/NeRF-Factory/blob/ce06663abe385c5cbe85fddbca8b9b5ace80dbee/src/data/data_util/nerf_360_v2.py#L381
The poses are transformed to the OpenCV convention. Is this correct? If so, since many NeRF datasets use the OpenGL convention, why did you choose OpenCV instead?
- I do not understand the purpose of transform_pose_llff. It seems to change some values of the extrinsic matrix, but I do not understand why. Could you please clarify? https://github.com/kakaobrain/NeRF-Factory/blob/ce06663abe385c5cbe85fddbca8b9b5ace80dbee/src/data/data_util/nerf_360_v2.py#L337
- I do not understand the purpose of similarity_from_cameras. I guess it is related to transform_pose_llff, but I cannot figure out its effect, nor the math used to transform from the OpenGL to the OpenCV convention in the Q2 and Q3 functions. Could you please clarify? https://github.com/kakaobrain/NeRF-Factory/blob/ce06663abe385c5cbe85fddbca8b9b5ace80dbee/src/data/data_util/nerf_360_v2.py#L279
Thank you very much for your help!
Best regards!
Sorry for the late reply. Our team members were busy preparing the CVPR submission.
- Yes, you are right. We consolidated all coordinates into the OpenCV convention. This is because many NeRF models, including Plenoxels and DVGO, have a step that maps 3D coordinates to voxel coordinates, and those models rely on many CUDA operations. To minimize the effort of rebuilding those CUDA operations, we adopted the OpenCV convention.
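To make the voxel-mapping point concrete, here is a minimal sketch of the kind of world-to-voxel indexing that voxel-based models such as Plenoxels and DVGO perform (often inside CUDA kernels). The function name and bounding-box parameters are hypothetical, for illustration only; keeping a single fixed camera/world convention means these kernels never need to be rebuilt for a different axis convention.

```python
import numpy as np

def world_to_voxel(xyz, scene_min, scene_max, grid_res):
    """Map world-space points into integer voxel indices.

    Hypothetical helper for illustration; `scene_min`/`scene_max`
    bound the scene and `grid_res` is the voxel grid resolution.
    """
    xyz = np.asarray(xyz, dtype=np.float64)
    # Normalize world coordinates to [0, 1] over the scene bounding box.
    t = (xyz - scene_min) / (scene_max - scene_min)
    # Scale to the grid resolution and clamp to the valid index range.
    idx = np.clip((t * grid_res).astype(np.int64), 0, grid_res - 1)
    return idx

pts = np.array([[0.0, 0.0, 0.0], [0.9, -0.9, 0.5]])
print(world_to_voxel(pts, scene_min=-1.0, scene_max=1.0, grid_res=128))
```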
- This data loading process follows the implementation from the original NeRF authors. Since they used the OpenGL convention, we need to manually convert their poses from OpenGL to OpenCV. That is what this function does.
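The conversion itself is a fixed axis flip: OpenGL cameras look down -z with +y up, while OpenCV cameras look down +z with +y down, so converting a camera-to-world pose flips the camera's y and z axes, i.e. right-multiplies by diag(1, -1, -1, 1). This is a generic sketch of that idea, not the repository's exact code path:

```python
import numpy as np

def opengl_to_opencv(c2w):
    """Convert a 4x4 camera-to-world pose from OpenGL to OpenCV axes.

    Right-multiplying flips the camera's local y and z axes while
    leaving the camera position (last column) unchanged.
    """
    flip = np.diag([1.0, -1.0, -1.0, 1.0])
    return c2w @ flip

# An identity pose: the camera axes simply get their y and z negated.
print(opengl_to_opencv(np.eye(4)))
```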
- This transformation code was first used in Plenoxels. Unlike pre-calibrated camera poses, many estimated camera poses are not aligned with the world axes, so the reconstructed scene can end up slanted or even overturned, which can harm the performance of voxel-based methods.
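The following is a rough sketch of the kind of similarity transform such an alignment step computes: rotate the world so that the cameras' average up vector points along +z, then recenter and rescale the camera positions. The function name and the exact centering/scaling choices here are illustrative assumptions, not the actual similarity_from_cameras implementation:

```python
import numpy as np

def align_up_and_recenter(c2w_all):
    """Illustrative alignment of a set of 4x4 camera-to-world poses.

    Returns a scale and a rigid transform T that rotate the average
    camera up vector onto world +z, recenter on the mean camera
    position, and rescale positions to unit extent. A sketch only;
    the real function handles these details differently.
    """
    # In the OpenCV convention the camera's up direction is its -y axis.
    ups = -c2w_all[:, :3, 1]
    up = ups.mean(axis=0)
    up /= np.linalg.norm(up)
    # Rotation taking `up` onto +z (standard vector-alignment formula).
    z = np.array([0.0, 0.0, 1.0])
    v = np.cross(up, z)
    c = float(up @ z)
    if np.linalg.norm(v) < 1e-8:
        R = np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    else:
        K = np.array([[0.0, -v[2], v[1]],
                      [v[2], 0.0, -v[0]],
                      [-v[1], v[0], 0.0]])
        R = np.eye(3) + K + K @ K / (1.0 + c)
    # Recenter on the mean camera position and rescale to unit extent.
    centers = c2w_all[:, :3, 3] @ R.T
    t = -centers.mean(axis=0)
    scale = 1.0 / max(np.linalg.norm(centers + t, axis=1).max(), 1e-8)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return scale, T
```

Applying `T` (then `scale`) to every pose removes the slant/overturn described above, so the voxel grid's axes roughly match the scene's natural orientation.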
Let us know if you have more questions about this. Feel free to reopen the issue.