Meta AI Research, GenAI; University of Oxford, VGG
Jianyuan Wang, Nikita Karaev, Christian Rupprecht, David Novotny
[Paper] [Project Page] [🤗 Demo] [Version 2.0]
Updates:
-
[Aug 6, 2024]
- VGGSfM is now available as a Python package, making it easier to integrate into other codebases!
- Introduced a new
VGGSfMRunner
class that serves as a central controller for all functionalities. - Added support for gradio visualization, controlled by
gr_visualize
.
-
[Jul 28, 2024]
- Added support for filtering out dynamic objects using
masks
. We will add an example soon but you can checkdemo_loader.py
for a quick view. - Added support for
visual_dense_point_cloud
.
- Added support for filtering out dynamic objects using
-
[Jul 10, 2024] Now we support exporting dense depth maps!
-
Happy to share we were ranked 1st 🥇 in the CVPR24 IMC Challenge regarding camera pose (Rot&Trans) estimation.
We provide a simple installation script that, by default, sets up a conda environment with Python 3.10, PyTorch 2.1, and CUDA 12.1.
source install.sh
python -m pip install -e .
This script installs official pytorch3d
, lightglue
, pycolmap
, poselib
, and visdom
. If you cannot install pytorch3d
on your machine, feel free to comment the line, because now we only use it for visdom visualization (i.e., cfg.viz_visualize=True
).
The checkpoint will be automatically downloaded from Hugging Face. You can also manually download it from Hugging Face and Google Drive. If you prefer to specify the checkpoint path manually, set the path in resume_ckpt
and set auto_download_ckpt
to False.
Now it's time to enjoy your 3D reconstruction! You can start with our provided examples:
# Use default settings
python demo.py SCENE_DIR=examples/kitchen
# Specify query method: sp+sift (default: aliked)
python demo.py SCENE_DIR=examples/statue query_method=sp+sift
# Increase query number to 4096 (default: 2048)
python demo.py SCENE_DIR=examples/british_museum max_query_pts=4096
# Assume a shared camera model for all frames, and
# use SIMPLE_RADIAL camera model instead of the default SIMPLE_PINHOLE
python demo.py shared_camera=True camera_type=SIMPLE_RADIAL
# If you want a fast reconstruction without fine tracking
python demo.py SCENE_DIR=examples/kitchen fine_tracking=False
All default settings for the flags are specified in cfgs/demo.yaml
. You can adjust these flags as needed, such as reducing max_query_pts
to lower GPU memory usage. To enforce a shared camera model for a scene, set shared_camera=True
. To use query points from different methods, set query_method
to sp
, sift
, aliked
, or any combination like sp+sift
. fine_tracking
is set to True by default. Set it to False to switch to coarse mode for faster inference.
To run reconstruction on a scene with 100
frames on a 32 GB
GPU, you can start from the setting below:
python demo.py SCENE_DIR=TO/YOUR/PATH max_query_pts=1024 query_frame_num=6
The reconstruction result (camera parameters and 3D points) will be automatically saved under SCENE_DIR
in the COLMAP format, as cameras.bin
, images.bin
, and points3D.bin
. This format is widely supported by the recent NeRF/Gaussian Splatting codebases. You can use COLMAP GUI or viser to view the reconstruction.
If you want to visualize it more easily, we also provide visualization options using Visdom and Gradio.
To begin using Visdom, start the server by entering visdom
in the command line. Once the server is running, access Visdom by navigating to http://localhost:8097
in your web browser. Now every reconstruction can be visualized and saved to the Visdom server by enabling viz_visualize=True
:
python demo.py viz_visualize=True ...(other flags)
You should see an interface like this:
For a serverless option, use Gradio by setting gr_visualize
to True. This will generate a link accessible from any browser (but it may take seconds to load).
python demo.py gr_visualize=True ...(other flags)
- 2D Reprojections:
- To visualize the 2D reprojections of reconstructed 3D points, set the
make_reproj_video
flag toTrue
. This will generate a video namedreproj.mp4
in theSCENE_DIR/visuals
directory. For example:
- To visualize the 2D reprojections of reconstructed 3D points, set the
- Track Predictions:
- To visualize the raw predictions from our track predictor, enable
visual_tracks=True
to generatetrack.mp4
. In this video, transparent points indicate low visibility or confidence.
- To visualize the raw predictions from our track predictor, enable
You only need to specify the address of your data, such as:
python demo.py SCENE_DIR=examples/YOUR_FOLDER ...(other flags)
Please ensure that the images are stored in YOUR_FOLDER/images
. This folder should contain only the images. Check the examples
folder for the desired data structure.
Have fun and feel free to create an issue if you meet any problem. SfM is always about corner/hard cases. I am happy to help. If you prefer not to share your images publicly, please send them to me by email.
We support extracting dense depth maps with the help of Depth-Anything-V2. Basically, we align the dense depth prediction from Depth-Anything-V2 using the sparse SfM point cloud predicted by VGGSfM. To enable this, please first git clone Depth-Anything-V2 and install scikit-learn:
pip install scikit-learn
git clone git@github.com:DepthAnything/Depth-Anything-V2.git dependency/depth_any_v2
python -m pip install -e .
Then, you just need to set dense_depth=True
when running demo.py. Depth maps will be saved in the depths
folder under cfg.SCENE_DIR
, using the COLMAP format (e.g., *.bin
). To visualize the dense point cloud (unprojected dense depth maps) in Visdom, set visual_dense_point_cloud=True
(note it may take seconds to open the Visdom page when there are too many points).
- What should I do if I encounter an out-of-memory error?
To resolve an out-of-memory error, you can simply try reducing the number of max_query_pts
to a lower value. Be aware that this may result in a sparser point cloud and could potentially impact the accuracy of the reconstruction. Please note that in the latest commit, the value of query_frame_num
will not affect the GPU memory consumption any more. Feel free to increase query_frame_num
.
- How to handle sparse data with minimal view overlap?
For scenarios with sparse views and minimal overlap, the simplest solution is to set query_frame_num
to the total number of your images and use a max_query_pts
of 4096 or more. This ensures all frames are registered. Since we only have sparse views, the inference process remains very fast. For example, the following command took around 20 seconds to reconstruct an 8-frame scene:
python demo.py SCENE_DIR=a_scene_with_8_frames query_frame_num=8 max_query_pts=4096 query_method=aliked
- When should I set
shared_camera
to True?
Set shared_camera
to True when you know that the input frames were captured by the same camera and the camera focal length did not significantly change during the capture. This assumption is usually valid for images extracted from a video.
We are still preparing the testing script for VGGSfM v2. However, you can use our code for VGGSfM v1.1 to reproduce our benchmark results in the paper. Please refer to the branch v1.1
.
We are highly inspired by colmap, pycolmap, posediffusion, cotracker, and kornia.
See the LICENSE file for details about the license under which this code is made available.
If you find our repository useful, please consider giving it a star ⭐ and citing our paper in your work:
@inproceedings{wang2024vggsfm,
title={VGGSfM: Visual Geometry Grounded Deep Structure From Motion},
author={Wang, Jianyuan and Karaev, Nikita and Rupprecht, Christian and Novotny, David},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={21686--21697},
year={2024}
}