HengyiWang/Co-SLAM

Global rays selection for BA

supremeccccc opened this issue · 3 comments

Hi, thanks for your excellent work. I have some questions after reading the paper. Global BA over rays from all frames is cool, and it prevents catastrophic forgetting in the decoder. However, since fewer updates come from the most recent frame, it may harm mapping and localization performance when moving into a newly discovered area. Also, as the size of the bundle grows (you have to adjust more poses), the optimization would become slower. Do you have any insight into that? Thanks.

Hi @supremeccccc, thanks for your interest in our work. That is a really great question!!!

You're absolutely right. The reduction in the number of pixels sampled from the current frame could pose challenges when transitioning to newly discovered areas. A minimum number of pixels sampled from the current frame has been set to help mitigate this issue. Currently, this approach works well on NICE-SLAM apartments with more than 10k frames and 3 rooms.
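As a rough illustration of the minimum-quota idea described above, here is a minimal sketch. It is not the Co-SLAM implementation: the function name `sample_ba_rays`, its parameters, and the choice to draw the remaining rays uniformly (with replacement) over all frames are assumptions made for this example.

```python
import random


def sample_ba_rays(num_frames, pixels_per_frame, batch_size, min_current, seed=None):
    """Pick (frame_id, pixel_id) pairs for one BA step (hypothetical helper).

    The last frame (index num_frames - 1) is treated as the current frame.
    """
    if not 0 <= min_current <= batch_size:
        raise ValueError("min_current must lie in [0, batch_size]")
    rng = random.Random(seed)
    current = num_frames - 1

    # Reserved quota: these rays always come from the current frame.
    rays = [(current, rng.randrange(pixels_per_frame)) for _ in range(min_current)]

    # Remaining rays: uniform over every pixel of every frame (current included).
    total_pixels = num_frames * pixels_per_frame
    for _ in range(batch_size - min_current):
        flat_index = rng.randrange(total_pixels)
        rays.append(divmod(flat_index, pixels_per_frame))
    return rays


rays = sample_ba_rays(num_frames=100, pixels_per_frame=1000,
                      batch_size=2048, min_current=200, seed=0)
from_current = sum(1 for frame_id, _ in rays if frame_id == 99)
```

With 100 frames and purely uniform sampling, the current frame would receive only about 20 of the 2048 rays on average; the quota guarantees at least 200 in this toy setting.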

For the optimisation speed, the extra computational cost is usually negligible or at least manageable, since each pose is represented by only 6 parameters (an axis-angle rotation plus a translation).
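To make the 6-parameter pose concrete, here is a small sketch that turns an axis-angle vector plus a translation into a 4x4 transform using Rodrigues' formula. The parameter ordering `[rx, ry, rz, tx, ty, tz]` and the function name `pose_from_6d` are assumptions for illustration; the repository may use a different convention.

```python
import math


def pose_from_6d(params):
    """Build a 4x4 rigid transform from [rx, ry, rz, tx, ty, tz] (assumed layout)."""
    rx, ry, rz, tx, ty, tz = params
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)  # rotation angle in radians

    # Rodrigues coefficients for the unnormalised rotation vector;
    # fall back to their limits when the angle is close to zero.
    if theta < 1e-8:
        a, b = 1.0, 0.5
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / (theta * theta)

    # Skew-symmetric matrix K of the rotation vector, and K squared.
    K = [[0.0, -rz, ry], [rz, 0.0, -rx], [-ry, rx, 0.0]]
    K2 = [[sum(K[i][k] * K[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]

    # R = I + a*K + b*K^2
    R = [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
          for j in range(3)] for i in range(3)]

    t = (tx, ty, tz)
    return [R[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


# A quarter turn about the z-axis followed by a translation.
T = pose_from_6d([0.0, 0.0, math.pi / 2, 1.0, 2.0, 3.0])
```

Under this parameterisation, jointly optimising N keyframe poses adds 6N scalar unknowns, which is typically small next to the number of parameters in the scene representation.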

I do believe that the current solution might not be the most elegant, and there is certainly room for further improvement in the global BA. For instance, incorporating consistent samples from the current frame without ruining the optimisation of the decoder, or adding some constraints to the poses so that we can sample rays and update poses more wisely.

Feel free to drop me an email if you want to discuss more about it :)

Hi @HengyiWang, thanks for your kind and fast reply. To alleviate forgetting, H2Mapping proposes a sampling strategy that samples areas that have not been visited recently. I suppose that is a more suitable way to balance anti-forgetting and exploration.
I have tried increasing the window size of NICE-SLAM while keeping the ray batch size the same as the original. I found that it slows down the optimization (I only counted the backward time), and the effect is non-negligible. I think I should first give your method a try.
I have struggled for a long time to transfer neural SLAM to outdoor scenes and found that it is rather fragile: the pose optimization diverges in areas that are not mapped well, which, however, always come with the sparse views of outdoor scenes.
Thanks again for the code release. I am willing to share if I gain any deeper insight into this problem.
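
The idea of favouring areas that have not been visited recently can be sketched as recency-weighted sampling. This is only a toy illustration of the general idea, not H2Mapping's actual algorithm; the function name `sample_stale_regions`, the region identifiers, and the linear weighting are all assumptions.

```python
import random


def sample_stale_regions(last_visited, step, k, seed=None):
    """Draw k region ids, favouring regions that were sampled long ago.

    last_visited maps region id -> step at which it was last sampled and is
    updated in place for the regions that get chosen (hypothetical helper).
    """
    rng = random.Random(seed)
    region_ids = list(last_visited)

    # Weight grows linearly with the time since the last visit; the +1 keeps
    # regions visited at the current step selectable.
    weights = [step - last_visited[r] + 1 for r in region_ids]

    chosen = rng.choices(region_ids, weights=weights, k=k)
    for r in chosen:
        last_visited[r] = step
    return chosen


history = {"old_room": 0, "current_room": 99}
picked = sample_stale_regions(history, step=100, k=1000, seed=0)
```

In this toy setting the long-unvisited region dominates the draw, while the recently visited one still receives a small share of the samples.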

Hi @supremeccccc, thank you for your response. That is a great strategy. However, one issue is that we do not employ an off-the-shelf pose tracker as H2Mapping does, and the main architecture of the model is a bit different. We have tried some similar strategies, but they sometimes lead to tracking issues.

Regarding outdoor scenes, incorporating an off-the-shelf tracker to estimate the pose and utilizing neural representations for mapping and pose refinement could be a potential solution (neural Mapper). But if you can make a pure neural SLAM work on outdoor scenes, that would be super cool!

By the way, I'm curious whether the H2Mapping model supports pose refinement, and how its reconstruction quality and run-time performance compare to NICE-SLAM and Vox-fusion when using the same poses estimated by an off-the-shelf pose tracker.