VladimirYugay/Gaussian-SLAM

Mono-RGB camera Gaussian Splatting

DiamondGlassDrill opened this issue · 3 comments

  1. Question: Will it be possible to use plain RGB (non-RGBD) videos/images as input to generate the Gaussian-SLAM reconstruction?

  2. Question: What are the scan timings? Does Gaussian Splatting run in real time, i.e. can the reconstructed view be rendered in real time at inference?

Thanks in advance

Hey there!

  1. Our method supports RGBD input only. Potentially, you can plug in a pre-trained monocular depth estimator to generate depth maps from the RGB frames and then run the method on them. For example, AdaBins (which supports higher image resolutions) and its official implementation might be a good starting point; see the sketch after this list.
  2. You would run the method, which gives you segments of the scene, and then merge them (we'll provide a script for that). Afterwards, you will be able to render the reconstructed scene in real time.
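For reference, a minimal sketch of such a depth-estimation preprocessing step might look like the following. It uses MiDaS loaded via torch.hub as a stand-in monocular depth estimator (AdaBins or any other model could be swapped in); the `rgb/` and `depth/` folder layout, the inverse-depth-to-metric conversion, and the 16-bit PNG depth scale are all assumptions for illustration, not the repository's actual input format.

```python
# Sketch: generate per-frame depth maps from RGB images so they can be fed
# to an RGBD pipeline. MiDaS predicts *relative inverse* depth, so the
# conversion to pseudo-metric depth and the PNG scale below are assumptions.
import glob
import os

import cv2
import numpy as np
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large").to(device).eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").dpt_transform

os.makedirs("depth", exist_ok=True)
for path in sorted(glob.glob("rgb/*.png")):
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        pred = midas(transform(img).to(device))
        pred = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=img.shape[:2],
            mode="bicubic", align_corners=False,
        ).squeeze()
    inv_depth = pred.cpu().numpy()
    # Hypothetical conversion: relative inverse depth -> pseudo-metric depth.
    depth = 1.0 / np.clip(inv_depth, a_min=1e-6, a_max=None)
    depth_png = (depth * 6553.5).clip(0, 65535).astype(np.uint16)  # assumed scale
    cv2.imwrite(os.path.join("depth", os.path.basename(path)), depth_png)
```

Because the predicted depth is only up to an unknown scale, some calibration (or a metric depth model) would likely be needed before the SLAM step.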
  1. Thanks @VladimirYugay, that sounds perfect, and thanks for the additional pointer to AdaBins. This will be really helpful for advancing my current project. A couple more questions from my side:

  2. Regarding the room scene demonstrated in the video: am I correct in understanding that it was not recorded and live-stitched as depicted, but rather that a pre-rendered room is shown in the step-by-step procedure? If so, would it be possible for me to capture photos or videos and then use the described method and merging process to visualize the scene? I'm also seeking clarification on whether it is currently feasible to take an image frame, instantly create the Gaussian Splatting from it in real time, and then perform SLAM on a sequential frame-by-frame basis.

  3. Do you already know if it will be possible to release the code under an Apache 2.0 or MIT license?

Thanks in advance and happy holidays.

  1. I am glad that it helped!

  2. We recorded the video of the scene after the segments were stitched. Potentially, you could implement stitching in parallel as SLAM reconstructs the segments, but we didn't do that due to time constraints. Regarding a custom RGB video, I think it is doable. I imagine it would work as follows: you record a video, run dense depth prediction on every frame, run SLAM on the result (you will be able to plug in any tracker you want, e.g. ORB-SLAM, DROID-SLAM, etc.), stitch the segments, and then obtain a scene that is renderable in real time; see the sketch at the end of this reply.

  3. Regarding licensing, it's a pain point right now - I just don't know. It is also the main reason why we can't release everything straight away.
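To make the suggested workflow concrete, here is a rough orchestration sketch of that custom-RGB-video pipeline. Every helper below (`extract_frames`, `estimate_depth`, `run_gaussian_slam`, `merge_segments`) is a hypothetical placeholder standing in for the corresponding step, not an actual function of this repository.

```python
# Hypothetical end-to-end pipeline for a custom RGB video, following the
# steps described above. All helper functions are placeholders.
from pathlib import Path


def reconstruct_from_rgb_video(video_path: Path, workdir: Path):
    # 1. Split the video into individual RGB frames.
    rgb_frames = extract_frames(video_path, workdir / "rgb")

    # 2. Run dense monocular depth prediction on every frame
    #    (e.g. AdaBins), producing pseudo-RGBD input.
    depth_frames = [estimate_depth(f, workdir / "depth") for f in rgb_frames]

    # 3. Run the SLAM system on the RGBD stream; the tracker is meant to be
    #    pluggable (e.g. ORB-SLAM, DROID-SLAM), so poses can come from either.
    segments = run_gaussian_slam(rgb_frames, depth_frames, tracker="droid-slam")

    # 4. Stitch the per-segment Gaussian models into one scene; the merged
    #    representation can then be rendered in real time.
    return merge_segments(segments)
```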