Instant-angelo: Build high-fidelity Digital Twin within 20 Minutes!

Introduction

Neuralangelo facilitates high-fidelity 3D surface reconstruction from RGB video captures. It enables the creation of digital replicas of both small-scale objects and large real-world scenes, derived from common mobile devices. These digital replicas, or 'twins', are represented with an exceptional level of three-dimensional geo-detail.

Nevertheless, substantial room for improvement exists. At present, both the official Neuralangelo implementation and its reimplementations require 40 hours and 40 GB of GPU memory on an A100 for real-world scene reconstructions. An expedited variant, instant-nsr, has been developed, but its results have been subpar due to parameter limitations.

To fill this gap in high-speed, high-fidelity reconstruction, our objective is to engineer an advanced iteration of Neuralangelo. This refined model focuses on high-fidelity neural surface reconstruction, streamlining the process to deliver results within 20 minutes while maintaining a high standard of quality.

We provide Quick Lookup examples of project outcomes. These examples can serve as a reference to help determine if this project is suitable for your use case scenario.

Installation

pip install torch torchvision
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
pip install -r requirements.txt

Alternative installation options for COLMAP are available on the COLMAP website.

Data preparation

To extract COLMAP data from custom images, you must first have COLMAP installed (installation instructions are available on the COLMAP website). Afterwards, place your images in the images/ folder. The structure of your data should be as follows:

-data_001
    -images
    -mask (optional)
-data_002
    -images
    -mask (optional)
-data_003
    -images
    -mask (optional)

Each separate data folder houses its own images folder.
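
Before running COLMAP, it can help to confirm that a folder matches the layout above. The following is a minimal sketch and not part of this repository: the function name check_scene_dir, the accepted image extensions, and the assumption that each mask shares the file stem of its image are all illustrative.

```python
from pathlib import Path

# Assumed image extensions; adjust to match your capture.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def check_scene_dir(scene_dir):
    """Return a list of problems found in one data folder (empty if none)."""
    scene = Path(scene_dir)
    problems = []
    images = scene / "images"
    if not images.is_dir():
        problems.append(f"{scene}: missing images/ folder")
        return problems
    image_files = [p for p in images.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES]
    if not image_files:
        problems.append(f"{scene}: images/ contains no image files")
    mask = scene / "mask"
    if mask.is_dir():
        # Assumption: a mask has the same file stem as the image it belongs to.
        mask_stems = {p.stem for p in mask.iterdir()}
        missing = sorted(p.name for p in image_files if p.stem not in mask_stems)
        if missing:
            problems.append(f"{scene}: no mask for {', '.join(missing)}")
    return problems
```

Calling check_scene_dir("data_001") returns an empty list when the folder looks usable, so it can be looped over every data folder before a long run.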

If you have masks, we recommend using them to filter the COLMAP sparse points before starting the reconstruction. You can run the following preprocessing scripts manually:

python scripts/run_colmap.py ${INPUT_DIR}
python scripts/filter_colmap.py --data ${INPUT_DIR} --output-dir ${INPUT_DIR}_filtered

In the script, ${INPUT_DIR} should be replaced with the actual directory path where your data is located.

The first line runs the COLMAP reconstruction script on the full images. The second line filters the COLMAP sparse points using the specified masks and saves the filtered data in a new output directory with the suffix "_filtered".
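
To illustrate what mask-based filtering of sparse points can look like, here is a small NumPy sketch for a single view. It is not the code in scripts/filter_colmap.py, whose exact criterion is not described here. The function name filter_points_by_mask, the pinhole camera model without distortion, and the world-to-camera pose convention are assumptions.

```python
import numpy as np


def filter_points_by_mask(points, K, R, t, mask):
    """Flag 3D points whose projection lands on a foreground mask pixel.

    points: (N, 3) world coordinates; K: (3, 3) pinhole intrinsics;
    R, t: world-to-camera rotation (3, 3) and translation (3,);
    mask: (H, W) boolean array, True = foreground.
    Returns an (N,) boolean array, True for points to keep.
    """
    cam = points @ R.T + t                    # world -> camera coordinates
    in_front = cam[:, 2] > 1e-8               # points behind the camera are dropped
    z = np.where(in_front, cam[:, 2], 1.0)    # placeholder depth avoids division by zero
    uvw = cam @ K.T                           # homogeneous pixel coordinates
    u = np.floor(uvw[:, 0] / z).astype(int)   # pixel column
    v = np.floor(uvw[:, 1] / z).astype(int)   # pixel row
    h, w = mask.shape
    inside = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    keep = np.zeros(len(points), dtype=bool)
    keep[inside] = mask[v[inside], u[inside]]
    return keep
```

With several views, one plausible rule is to keep a point only if it falls on the foreground in the views that observe it; which rule works best depends on the mask quality.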

Start Reconstruction!

Run Smooth Surface Reconstruction in 20 Minutes

The smooth reconstruction mode is well-suited for the following cases:
  • When reconstructing a smooth object that does not have a high level of detail. The smooth mode works best for objects that have relatively simple, flowing surfaces without a lot of intricate features.

  • When you want a higher-fidelity substitute for instant-nsr that takes a similar amount of time (within 20 minutes) to generate but leaves fewer holes in the resulting model.


Information you need to know before you start:

  • The smooth reconstruction mode's reliance on curvature loss can over-smooth geometry, failing to capture flat surface structures and subtle variations on flatter regions of the original object.
  • This mode relies on sparse points generated by colmap to guide the geometry in the early stage of training. However, SFM (Structure from Motion) can sometimes generate noisy point clouds due to factors such as repeated texture, inaccurate poses, or incorrect point matches. To address this issue, one possible solution is to utilize more powerful SFM tools like hloc or DetectorFreeSfM. Additionally, post-processing techniques can be employed to further refine the point cloud. For example, using methods like Radius Outlier Removal in Open3D or pixsfm can help eliminate outliers and improve the quality of the point cloud.
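
The radius-based outlier removal mentioned above can be sketched in a few lines of NumPy. This is a brute-force illustration of the idea, not Open3D's implementation (Open3D exposes the operation as PointCloud.remove_radius_outlier). The function name radius_outlier_mask and its parameters are hypothetical, and the pairwise distance matrix makes it practical only for a few thousand points.

```python
import numpy as np


def radius_outlier_mask(points, radius, min_neighbors):
    """Return a boolean mask that is True for points to keep.

    A point is kept when at least `min_neighbors` other points lie within
    `radius` of it. Uses an O(N^2) distance matrix, so use a KD-tree based
    tool for large clouds.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    neighbor_count = (dist <= radius).sum(axis=1) - 1  # exclude the point itself
    return neighbor_count >= min_neighbors
```

Applying points[radius_outlier_mask(points, radius, min_neighbors)] yields the cleaned cloud; suitable values for radius and min_neighbors depend on the scale of the COLMAP reconstruction.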

Now it is time to start by running:

bash run_neuralangelo-colmap_sparse.sh ${INPUT_DIR}

This script is designed to automate the process of running SFM without the need for any preparation beforehand. It will automatically initiate the reconstruction process and export the resulting mesh. The output files will be saved in the logs directory.

If masks are available and placed in the mask folder of your data folder, you can instead start by running:

bash run_neuralangelo-colmap_sparse.sh ${INPUT_DIR}_filtered

Additionally, we have developed an experimental version called SH-neuralangelo, which uses Spherical Harmonics (SH) instead of a Multilayer Perceptron (MLP) for the radiance field. SH-neuralangelo is inspired by Plenoxel and Gaussian Splatting, incorporating progressive Spherical Harmonics for faster convergence and better coefficient regularization.

bash run_SH-neuralangelo-colmap_sparse.sh ${INPUT_DIR}

However, currently, SH-Neus is inferior to the original Neus with MLP in terms of PSNR and reconstruction quality. We are actively working on improving its quality and plan to support exporting Spherical Harmonics coefficients for real-time viewers in the future, similar to Gaussian Splatting.
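
One common way to make Spherical Harmonics "progressive" is to enable higher SH degrees gradually as training advances, so that low-frequency color is fitted first. The sketch below shows that idea only; it is an assumption about the general technique and not this repository's schedule. The function names, max_degree=3, and steps_per_degree=1000 are hypothetical.

```python
def active_sh_degree(step, max_degree=3, steps_per_degree=1000):
    """Highest SH degree enabled at a given training step."""
    return min(max_degree, step // steps_per_degree)


def sh_coefficient_mask(step, max_degree=3, steps_per_degree=1000):
    """Per-coefficient mask: 1.0 for enabled coefficients, 0.0 otherwise.

    SH up to degree d has (d + 1) ** 2 coefficients per color channel.
    """
    degree = active_sh_degree(step, max_degree, steps_per_degree)
    n_active = (degree + 1) ** 2
    n_total = (max_degree + 1) ** 2
    return [1.0] * n_active + [0.0] * (n_total - n_active)
```

Multiplying the SH coefficients by this mask before evaluating color means only the constant term contributes at the start, with view-dependent terms added as the step count grows.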

Run Detail Surface Reconstruction in 20 Minutes

Many thanks to youmi-zym for creating the image on Tanks and Temples.

Generating high-fidelity surface reconstructions with only RGB inputs in 20,000 steps (around 20 minutes) is challenging, especially for sparse in-the-wild captures where occlusion and limited views make surface reconstruction an underconstrained problem. This can lead to optimization instability and difficulty converging. Introducing LiDAR, ToF depth, or predicted depth can help stabilize optimization and accelerate training. However, directly regularizing rendered depth is suboptimal due to the bias introduced by density2sdf. Moreover, ensuring consistent depth across views is difficult, especially with lower-quality ToF sensors or predicted depth. We propose directly regularizing the SDF field using MVS point clouds and normals to alleviate this bias.

Importantly, in real-world scenarios like oblique photography and virtual tours, dense point clouds are already intermediate outputs. This allows directly utilizing the existing point clouds for regularization without extra computation. In such use cases, the point cloud prior comes for free as part of the capture process.
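
Regularizing the SDF directly with points and normals can be pictured as two terms: the SDF should be zero at each MVS point, and its gradient there should align with the point's normal. The NumPy sketch below illustrates this under stated assumptions; the function name sdf_point_normal_loss, the exact form of the two terms, and the absence of weights are illustrative and not taken from this repository. Real training code would use differentiable tensors instead of NumPy arrays.

```python
import numpy as np


def sdf_point_normal_loss(sdf_values, sdf_gradients, normals):
    """Penalize non-zero SDF at surface points and misaligned gradients.

    sdf_values: (N,) SDF evaluated at the MVS points (ideally 0 on the surface).
    sdf_gradients: (N, 3) SDF gradients at those points.
    normals: (N, 3) unit normals from the point cloud.
    Returns (surface_term, normal_term) as floats.
    """
    surface_term = np.abs(sdf_values).mean()
    grad_norm = np.linalg.norm(sdf_gradients, axis=1, keepdims=True)
    grad_dir = sdf_gradients / grad_norm
    cosine = (grad_dir * normals).sum(axis=1)
    normal_term = (1.0 - cosine).mean()   # 0 when aligned, 2 when opposite
    return float(surface_term), float(normal_term)


# Toy check with an analytic unit-sphere SDF, f(p) = ||p|| - 1.
points = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]])
values = np.linalg.norm(points, axis=1) - 1.0
gradients = points / np.linalg.norm(points, axis=1, keepdims=True)
surface_term, normal_term = sdf_point_normal_loss(values, gradients, normals=points)
```

For points lying exactly on the sphere with outward normals, both terms vanish; flipping the normals drives the normal term to its maximum, which is why consistently oriented normals matter for the prior.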

Information you need to know before you start:

  • An aligned dense point cloud with normals is required. You can specify its relative path at dataset.dense_pcd_path in the config file.
  • The point cloud can be generated by various methods, either traditional MVS such as COLMAP or OpenMVS, or a learning-based MVS method. You can even generate the point cloud using commercial photogrammetry software like Metashape and DJI.
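
Since the point cloud must carry normals, a quick header check before training can save a failed run. The sketch below assumes the cloud is stored as a PLY file and that normals use the common nx, ny, nz property names; neither assumption is stated by this repository, and the function names are hypothetical.

```python
def ply_vertex_properties(path):
    """Return the vertex property names declared in a PLY header."""
    names = []
    in_vertex = False
    with open(path, "rb") as f:
        if f.readline().strip() != b"ply":
            raise ValueError(f"{path} is not a PLY file")
        for raw in f:
            tokens = raw.decode("ascii", errors="replace").split()
            if not tokens:
                continue
            if tokens[0] == "end_header":
                break  # stop before the (possibly binary) payload
            if tokens[0] == "element":
                in_vertex = len(tokens) > 1 and tokens[1] == "vertex"
            elif tokens[0] == "property" and in_vertex:
                names.append(tokens[-1])
    return names


def has_normals(path):
    """True if the PLY vertex element declares nx, ny and nz."""
    return {"nx", "ny", "nz"} <= set(ply_vertex_properties(path))
```

Only the header is read, so the check is fast even for large dense clouds.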

Now it is time to start by running:

bash run_neuralangelo-colmap_dense.sh ${INPUT_DIR}

Frequently asked questions (FAQ)

  1. Q: CUDA out of memory.

    A: Instant-angelo requires at least 10 GB of GPU memory. If you run out of memory, consider decreasing model.num_samples_per_ray from 1024 to 512.

  2. Q: What's the License for this repo?

    A: This repository is built on top of instant-nsr-pl and is licensed under the MIT License. The materials, code, and assets in this repository can be used for commercial purposes without explicit permission, in accordance with the terms of the MIT License. Users are free to use, modify, and distribute this content, even for commercial applications. However, appropriate attribution to the original instant-nsr-pl authors and this repository is requested. Please refer to the LICENSE file for full terms and conditions.

  3. Q: The reconstruction of my custom dataset is bad.

    A: This repository is under active development and its robustness across diverse real-world data is still unproven. Users may encounter issues when applying the method to new datasets. Please open an issue for any problems or contact the author directly at chongjieye@link.cuhk.edu.cn.

  4. Q: Generating the dense prior with Vis-MVSNet is slow.

    A: Currently, preprocessing takes around 10 to 15 minutes for 300 frames, but there is still much room to improve efficiency by replacing Vis-MVSNet with state-of-the-art methods like MVSFormer or SimpleRecon. Moreover, preprocessing time could be substantially reduced by leveraging quantization and TensorRT. Overall, MVSNet allows generating the necessary point cloud prior an order of magnitude faster than traditional MVS approaches.

  5. Q: This project fails to run on Windows

    A: This project has not been tested on Windows and the scripts may have compatibility issues. For the best experience at this stage of development, we recommend running experiments on a Linux system. We apologize that Windows support cannot be guaranteed currently. Please feel free to open an issue detailing any problems encountered when attempting to run on Windows. Community feedback will help improve cross-platform compatibility going forward.

Related projects:

  • instant-nsr-pl: Great Instant-NSR implementation in PyTorch-Lightning!
  • neuralangelo: Official implementation of Neuralangelo: High-Fidelity Neural Surface Reconstruction
  • sdfstudio: Unified Framework for SDF-based Neural Reconstruction, easy to develop with
  • torch-bakedsdf: Unofficial PyTorch implementation of BakedSDF: Meshing Neural SDFs for Real-Time View Synthesis

Acknowledgement

  • Thanks to bennyguo for his excellent pipeline instant-nsr-pl
  • Thanks to RaduAlexandru for his implementation of improved curvature loss in permuto_sdf
  • Thanks to Alex Yu for his implementation of spherical harmonics in svox2
  • Thanks to Zesong Yang and Chris for providing valuable insights and feedback that assisted development