ActiveVisionLab/nerfmm

Training it on Blender data


Is there any script for training this on the Blender synthetic dataset proposed in NeRF?

Hi @nishant34,

Sorry for the delay. You should be able to train it with tasks/nerfmm/train.py with some modifications; check issue #7. Note that in 360-degree scenes you have to supply some initial pose estimations to start with, i.e. our network can only refine noisy poses in 360-degree scenes and cannot optimise full poses from scratch.

Best,
Zirui

legel commented

@ziruiw-dev thank you for the clarifications here and in issue #7. Also, great work by you and your team on the paper and the code -- both are marvelous and important work in the history of neural rendering.

I would say don't despair about requiring initial pose estimates! In fact, if you study, e.g., photogrammetry closely, you'll find that those pipelines use something like SIFT to extract low-dimensional features that assist with an approximate global optimization; then there is a "global bundle adjustment," which is much closer in spirit to your work: a true fine-tuning optimization based on what people really care about (3D photorealism). The same issue pops up with ICP for point-cloud alignment: you basically can't expect it to perform well if your initial estimate is not close to the global optimum. In fact, studies show that the quality of the final optimum for ICP is a smooth function of the initial hypothesis (naturally, you can imagine this is data/entropy dependent, though).

In any case, it is helpful to know that NeRF-- can perform in 360-degree scenes so long as the initial pose estimates are close enough. I will be sure to follow your suggestions here and report back if I have any interesting further findings. It wasn't until I was studying your code and looking at the NDC assumptions that I realized NDC was probably the biggest issue with 360-degree object capture.

I think a future study could look very closely at what I mentioned: how the performance of NeRF-- succeeds/degrades in a 360-degree capture scenario, as a function of the closeness of the initial 6D pose estimate. I imagine the scripts you wrote for the BLEFF dataset could easily be reworked to explore, e.g., performance for 360-degree capture given rotation errors ranging over [0.01, 0.1, 1, 3, 5, 10, 20] degrees and similarly for translation, e.g. [0.01%, 0.1%, 1%, 3%, 5%, 10%, 20%] of the scene scale. I work a lot with all kinds of sensors for 6D pose, so I'm really curious to know the answers to this experiment. A rough sketch of the kind of pose-perturbation sweep I mean is below.
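
As a rough illustration only (a minimal sketch, not from the NeRF-- codebase; `perturb_pose` and the noise levels are just placeholders for the experiment described above):

```python
import numpy as np

def random_rotation(angle_deg):
    """Rotation about a random axis by a fixed angle (degrees), via the Rodrigues formula."""
    axis = np.random.randn(3)
    axis /= np.linalg.norm(axis)
    theta = np.deg2rad(angle_deg)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])  # skew-symmetric cross-product matrix
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * K @ K

def perturb_pose(c2w, rot_deg, trans_frac, scene_scale):
    """Perturb a 4x4 camera-to-world pose by a fixed rotation angle and a
    translation offset expressed as a fraction of the scene scale."""
    noisy = c2w.copy()
    noisy[:3, :3] = random_rotation(rot_deg) @ c2w[:3, :3]
    direction = np.random.randn(3)
    direction /= np.linalg.norm(direction)
    noisy[:3, 3] += trans_frac * scene_scale * direction
    return noisy

# The sweep suggested above: rotation in degrees, translation as a fraction of scene scale.
rot_levels = [0.01, 0.1, 1, 3, 5, 10, 20]
trans_levels = [0.0001, 0.001, 0.01, 0.03, 0.05, 0.1, 0.2]
```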

Please let me know if you have any insights or expectations on how good the initial pose estimate needs to be.

As well, over time, I could probably assist with providing real-world datasets (let's say sometime over the next year) where we've guaranteed the 6D pose computations are accurate to these kinds of thresholds. I mention this longer-term view because we both know that, far too often, COLMAP et al. are absolutely not ground truth, and really, I think the only way to guarantee ground truth (for experimental analysis, at least) is through additional sensors / known constraints like AprilTags.

Hi @legel,

First of all, thanks for opening this discussion! I definitely agree that many methods like BA, ICP, etc. require initial pose estimations, and plenty of SLAM/SfM packages can be used to initialise NeRF--.

Regarding 360-degree scenes and the NDC
With initial pose estimations, NeRF-- can definitely refine camera parameters (both extrinsics and intrinsics). NDC is not a problem, and you can safely remove it in 360-degree scenes. The only things you need to care about are the near/far values, which need to be set according to your initial poses.
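
For concreteness, one simple way to pick near/far for a 360-degree capture is to bound them by the distances from the initial camera centres to the approximate scene centre. This is only a minimal sketch under that assumption, not code from the repo:

```python
import numpy as np

def near_far_from_poses(c2w_list, margin=1.5):
    """Rough near/far bounds for a 360-degree capture, assuming the object sits
    near the average point the initial cameras are arranged around."""
    centers = np.stack([c2w[:3, 3] for c2w in c2w_list])    # camera centres in world space
    scene_center = centers.mean(axis=0)                      # crude estimate of the object centre
    dists = np.linalg.norm(centers - scene_center, axis=1)   # camera-to-centre distances
    near = max(dists.min() / margin, 1e-2)                   # pull near in a bit, keep it positive
    far = dists.max() * margin                               # push far out a bit
    return near, far
```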

Regarding BLEFF
The scripts for BLEFF can definitely be extended to more complex camera trajectories. At the same time, getting realistic renderings from these trajectories requires more complex scene modelling, which takes a lot of effort in Blender alone... For example, a complex camera trajectory can easily produce an image with a mostly blank background simply because there is nothing to render. That being said, it's certainly a great direction to work on; we just didn't spend time making these synthetic scenes.

Regarding the accuracy of initial poses
I would say results from most modern Visual Odometry/SLAM/SfM packages should be good enough for NeRF-- camera-parameter refinement, in both 360-degree and forward-facing scenes. In terms of numbers, it should be safe under a 15-20 degree rotation error. The reason is that we optimise camera parameters in so(3) (axis-angle) space, which only works well when the rotation error is small (within roughly 15-20 degrees). This is also why BA and ICP need good initialisations.
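
To make the so(3) point concrete: the rotation correction is represented as an axis-angle 3-vector and mapped to a rotation matrix with the Rodrigues formula, so gradients can flow back to the pose parameters. A minimal PyTorch-style sketch (not the exact code from the repo) looks like:

```python
import torch

def skew(v):
    """Skew-symmetric (cross-product) matrix of a 3-vector, built so autograd flows through."""
    zero = torch.zeros_like(v[0])
    return torch.stack([
        torch.stack([zero, -v[2],  v[1]]),
        torch.stack([v[2],  zero, -v[0]]),
        torch.stack([-v[1], v[0],  zero]),
    ])

def exp_so3(r):
    """Map an axis-angle vector r in so(3) to a rotation matrix (Rodrigues formula)."""
    theta = r.norm() + 1e-8                 # rotation angle (epsilon avoids division by zero)
    K = skew(r / theta)                     # skew matrix of the unit rotation axis
    return torch.eye(3) + torch.sin(theta) * K + (1.0 - torch.cos(theta)) * K @ K

# Hypothetical usage: learn a small correction on top of an initial rotation R_init.
# delta_r = torch.zeros(3, requires_grad=True)  # optimised jointly with the NeRF weights
# R = exp_so3(delta_r) @ R_init                  # well-behaved only when the needed correction is small
```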

Regarding the GT poses for real data
Yes, please! High-quality real images and accurate ground truth will be highly valuable to the entire research community! Extra sensors can definitely be helpful. To me, accurate external sensors, exact hardware synchronisation, and high-frame-rate cameras will be the key to making this kind of dataset.

All the best,
Zirui