Evaluate GTSfM as a Potential Replacement for COLMAP for Dataset Preparation
MrNeRF opened this issue · 17 comments
As a user, I am interested in training 3D Gaussian splatting models using this project. Currently, it seems that using COLMAP for dataset preparation is a mandatory step, but I have noticed that COLMAP can be somewhat slow in processing datasets.
I came across GTSfM and it appears to be a promising alternative to COLMAP. Therefore, I kindly request the following:
- Feasibility Analysis: Please evaluate whether GTSfM can effectively replace COLMAP in the workflow of preparing datasets for training the 3D Gaussian splatting models. Specifically, assess if GTSfM can generate the necessary input data that the 3D Gaussian splatting models require for training.
- Integration Requirements: If GTSfM proves to be a suitable alternative, please outline what changes would be needed in our current pipeline to adapt it to use GTSfM's output as input for 3D Gaussian splatting training. Can we use it directly?
- Quality Comparison: Please perform a comparative analysis using the same dataset to assess how the quality of input data generated by GTSfM compares to that generated by COLMAP. This could include comparisons of key metrics such as reconstruction accuracy, completeness, and processing time, but it could also simply be decided by human judgment :).
- Documentation Update: If GTSfM proves to be a viable and advantageous alternative, it would be great to have an updated guide or section in the README that explains how users can set up and use GTSfM as part of the pipeline, along with any noted benefits and trade-offs.
The goal of this request is to potentially improve the efficiency and flexibility of the dataset preparation step, while maintaining or improving the quality of the data used for training 3D Gaussian splatting models.
@MrNeRF While GTSfM offers a range of features, it may not be the most flexible option for all use cases. One limitation is that it requires the camera's intrinsic parameters, which necessitates capturing images in photo mode rather than video mode. This approach might be suitable for object capture or small scenes, but it could pose challenges for larger scenes due to its inconvenience.
From my experience, it seems that using the sequential_matcher and vocab_tree_matcher modes in COLMAP allows for the registration of 1k images in a matter of minutes. On the other hand, the exhaustive_matcher mode takes significantly longer.
To speed up the process, simply replace the exhaustive_matcher in the original code with vocab_tree_matcher; this alone speeds things up a lot.
def get_vocab_tree():
    """Return path to vocab tree. Downloads vocab tree if it doesn't exist.

    Returns:
        The path to the vocab tree.
    """
    import requests
    from rich.progress import track
    from pathlib import Path

    vocab_tree_filename = Path("/tmp/vocab_tree.fbow")
    if not vocab_tree_filename.exists():
        # Stream the download in 1 KiB chunks and show a progress bar.
        r = requests.get("https://demuc.de/colmap/vocab_tree_flickr100K_words32K.bin", stream=True)
        vocab_tree_filename.parent.mkdir(parents=True, exist_ok=True)
        with open(vocab_tree_filename, "wb") as f:
            total_length = r.headers.get("content-length")
            assert total_length is not None
            for chunk in track(
                r.iter_content(chunk_size=1024),
                total=int(total_length) / 1024 + 1,
                description="Downloading vocab tree...",
            ):
                if chunk:
                    f.write(chunk)
                    f.flush()
    return vocab_tree_filename
vocab_tree_filename = get_vocab_tree()
# Every option starts with a space so that adjacent arguments do not run together.
feat_matching_cmd = (
    colmap_command + " vocab_tree_matcher"
    + " --database_path " + args.source_path + "/distorted/database.db"
    + " --SiftMatching.use_gpu " + str(use_gpu)
    + " --VocabTreeMatching.vocab_tree_path " + str(vocab_tree_filename)
)
This change should maintain similar performance levels while completing the task in under 10 minutes.
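To make it easy to switch between the three matchers mentioned above, the command could also be built as an argument list rather than one concatenated string. This is only a sketch: `build_matching_cmd` and its parameters are hypothetical names, and the flag names are copied from the snippet above, so they may differ between COLMAP versions.

```python
from pathlib import Path
from typing import List, Optional

# Matcher subcommands mentioned in this thread.
MATCHERS = ("exhaustive_matcher", "sequential_matcher", "vocab_tree_matcher")


def build_matching_cmd(
    colmap_command: str,
    database_path: str,
    matcher: str = "vocab_tree_matcher",
    use_gpu: bool = True,
    vocab_tree_path: Optional[Path] = None,
) -> List[str]:
    """Build a COLMAP feature-matching command as a list of arguments."""
    if matcher not in MATCHERS:
        raise ValueError(f"unknown matcher: {matcher}")
    cmd = [
        colmap_command,
        matcher,
        "--database_path", database_path,
        "--SiftMatching.use_gpu", str(int(use_gpu)),
    ]
    # Only the vocab tree matcher needs the vocab tree file.
    if matcher == "vocab_tree_matcher":
        if vocab_tree_path is None:
            raise ValueError("vocab_tree_matcher needs a vocab tree path")
        cmd += ["--VocabTreeMatching.vocab_tree_path", str(vocab_tree_path)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids the missing-space and quoting problems that string concatenation invites.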
By the way, NeRFCapture may be a good choice to replace colmap. It's highly flexible and provides immediate captures, which fully meet your needs. It allows direct photo streaming from a mobile phone to a computer in both online and offline modes. If NeRFCapture piques your interest, I am willing to provide some assistance.
Thank you @hugoycj for these insights. I agree that COLMAP seems to be the way to go until someone provides a viable alternative. NeRFCapture is new to me, but after taking a brief look, it appears very promising—albeit with some specific requirements.
My vision is to create an out-of-the-box solution that is incredibly fast, easy to use, free, and capable of producing 3DGS scenes directly from image/video input. This is still a vision, but I am confident that we will get there.
I opened this issue to seek insights into the best approach for data preprocessing.
Have you tried integrating NeRFCapture with 3DGS? If so, have you posted a video of the results? My current focus is on optimizing the training process.
If you are interested and have the time, evaluating NeRFCapture in conjunction with 3DGS—and comparing this with COLMAP input data using a specific dataset—would be tremendously valuable. It would give us a concrete sense of how it performs in practice.
I see. To create an out-of-the-box solution for producing 3DGS scenes, I suggest a command-line interface similar to the one used by NerfStudio. This interface would allow users to create a NeRF scene in two steps:
ns-process-data {images, video} --data {DATA_PATH} --output-dir {PROCESSED_DATA_DIR}
ns-train nerfacto --data {PROCESSED_DATA_DIR}
I will submit a PR later this week that will implement such a feature. This feature will significantly enhance the user experience for first-time users of 3DGS. I am planning to:
- Fully integrate the colmap preprocessing into this project
- Implement a command-line interface like NerfStudio
- Add docs for preprocessing
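A rough sketch of what such a two-step interface might look like, written with Python's standard `argparse`. The program name `gs-cli`, the subcommand names, and the option names are all hypothetical, chosen only to mirror the NerfStudio commands above; nothing here reflects an actual implementation in this project.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a two-subcommand parser: preprocess first, then train."""
    parser = argparse.ArgumentParser(prog="gs-cli")  # hypothetical program name
    sub = parser.add_subparsers(dest="command", required=True)

    # Step 1: turn raw images or a video into a processed dataset.
    process = sub.add_parser("process-data", help="preprocess images or a video")
    process.add_argument("input_type", choices=["images", "video"])
    process.add_argument("--data", required=True, help="path to the raw input")
    process.add_argument("--output-dir", required=True, help="where to write the processed data")

    # Step 2: train on the processed dataset.
    train = sub.add_parser("train", help="train a 3DGS scene")
    train.add_argument("--data", required=True, help="path to the processed data")
    return parser
```

For example, `build_parser().parse_args(["process-data", "images", "--data", "raw", "--output-dir", "out"])` yields a namespace whose `command` field can be used to dispatch to the preprocessing or training code.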
The NeRFCapture feature is a more complex one to implement as it involves real-time mobile phone connectivity, the creation of a custom data loader to incrementally receive images from the mobile phone, and the dynamic addition of new point clouds, among other things. However, integrating NeRFCapture into the 3DGS production process can significantly speed it up, reducing initialization time to mere seconds. It also allows users to adjust their textures based on the 3DGS rendering results in real time. I am interested in doing so after implementing the command-line preprocessing interface.
Yes, integrating the COLMAP pipeline and a CLI will significantly improve the user experience.
Looking forward to integrating a thinner viewer and trajectory editor, such as viser, into the project.
Lots of excellent suggestions here. Integrating data preprocessing will indeed be invaluable, paving the way for broader adoption of this project.
Do you have plans to implement this in C++? I'm keen on this approach mainly to sidestep the complications associated with Python dependencies. There are other open issues aimed at minimizing dependencies, so in line with that, it'd be great if we could maintain as few dependencies as possible. Could you shed some light on how you intend to "Fully integrate the colmap preprocessing into this project"?
It's essential that we retain control over most of these components, allowing us to optimize both speed and quality. I want to stress the significance of speed. Much of today's software feels cumbersome, laden with bloatware, and suffers from protracted startup times. Speed is of the essence. For example, the swift response time is the sole reason I still default to Google as my primary search engine — it delivers immediate results, allowing for quick iterations on search queries.
Looking forward to seeing your pull request!
Regarding the viewer: Absolutely, we will require a robust solution to ensure optimal camera control. This is a significant TODO, and perhaps we should prioritize it more prominently. Maybe I will need a break from optimizing, so I will start something.
One more issue: at some point we should aim to implement the rasterizer from scratch. Ultimately, I want a more permissive license that allows more freedom, i.e. commercial use of this project.
I completely agree with your viewpoint. In my opinion, it is more convenient to initially use COLMAP's source code as a third-party static library in the first steps. This approach allows for easy integration into existing projects. Later it can be transformed into a static library, similar to libtorch. Eventually, it may be beneficial to consider alternative libraries such as GTSfM or OpenMVG. These options are potentially more lightweight, thereby minimizing resource consumption and enhancing performance.
When license and dependency minimization are taken into consideration, OpenMVG is often considered a better choice than COLMAP and GTSfM. It has minimal dependencies and allows for commercial use.
Maybe I could evaluate OpenMVG first, since using COLMAP as a third-party library will introduce a lot of dependencies.
I am cool with either COLMAP or another solution as long as the quality does not suffer and it is reasonably easy to use. We can take the more convenient route for now and then iterate over the dependencies as we implement our own solution. I would suggest that you decide what best fits our current needs. Just keep in mind that the goal should be to create a lightweight 3DGS solution (maybe some kind of Luma AI style thing with even better quality). Glad to have you on board!
Also consider hloc as an option. @hugoycj not sure if you can comment on it. I was able to improve rendering / training using this. It can/does run everything through COLMAP in its final pass, so the training code can still just ingest COLMAP files. The main downside I see is that it's not 'lightweight', but it was easy to integrate.
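Since the point is that any alternative front end only has to hand over COLMAP files, a quick sanity check on its output directory can be useful. The sketch below assumes the usual COLMAP sparse model layout (`cameras`, `images` and `points3D`, each as `.bin` or `.txt`); the function name is hypothetical, and which of the two formats the training code here actually reads is not confirmed in this thread.

```python
from pathlib import Path

# Base names of the three files that make up a COLMAP sparse model.
MODEL_FILES = ("cameras", "images", "points3D")


def find_colmap_model_format(model_dir):
    """Return ".bin" or ".txt" if model_dir holds a complete sparse model, else None."""
    model_dir = Path(model_dir)
    for ext in (".bin", ".txt"):
        # A model only counts as complete if all three files share one format.
        if all((model_dir / f"{name}{ext}").is_file() for name in MODEL_FILES):
            return ext
    return None
```

Calling it on the output directory of hloc, GTSfM, or any other tool gives a quick yes/no answer before a training run is started.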
Yep. I will open a branch for this issue.
Hloc is a good option too; it fully follows COLMAP's interface and data structures. The only disadvantage is that it's not C++-based. To keep this repo thin, I prefer COLMAP as the first choice.
Here is more input: https://twitter.com/janusch_patas/status/1696236337593016480
Saved to revisit later!
Seems irrelevant and nobody worked on it. Closed!