VGGSfM: Visual Geometry Grounded Deep Structure From Motion

Meta AI Research, GenAI; University of Oxford, VGG

Jianyuan Wang, Nikita Karaev, Christian Rupprecht, David Novotny

Updates:

[Apr 23, 2024] Release the code and model weight for VGGSfM v1.1.

Installation

We provide a simple installation script that, by default, sets up a conda environment with Python 3.10, PyTorch 2.1, and CUDA 12.1.

source install.sh

Testing on IMC

1. Download Dataset and Model

To get started, you'll need to download the IMC dataset. You can do this by running the following commands in your terminal:

wget https://www.cs.ubc.ca/research/kmyi_data/imc2021-public/imc-2021-test-gt-phototourism.tar.gz

tar -xzvf imc-2021-test-gt-phototourism.tar.gz

Once the dataset is downloaded and extracted, you'll need to specify its path in the IMC_DIR field in the ./cfgs/test.yaml configuration file or give it as an input such as python test.py IMC_DIR=YOUR/PATH.

Next, you'll need to download the model checkpoint of v1.1 for testing or v1.2 for demo.

After downloading the model checkpoint, specify its path in the resume_ckpt field in ./cfgs/test.yaml.

2. Run Testing

python test.py

When it finishes (it would take several hours to complete the testing on the whole IMC dataset), you should see something like:

----------------------------------------------------------------------------------------------------
On the IMC dataset (query_frame_num=3)
Auc_3  (%): 64.74418604651163
Auc_5  (%): 72.20720930232558
Auc_10 (%): 80.98441860465115
----------------------------------------------------------------------------------------------------

If your machine support torch.bfloat16, you are welcome to enable the use_bf16 option in the configuration file or by python test.py use_bf16=True. Our model was trained using bf16 and the testing performance is nearly identical when using bf16.

Typically, running our model on a 25-frame IMC scene takes approximately 40 seconds. If you're looking to save time, you can adjust the query_frame_num to 1. This adjustment reduces the inference time to roughly 15 seconds, while maintaining a comparable performance.

----------------------------------------------------------------------------------------------------
On the IMC dataset (query_frame_num=1)
Auc_3  (%): 61.99207579672695
Auc_5  (%): 69.78997416020671
Auc_10 (%): 78.88826873385013
----------------------------------------------------------------------------------------------------

If want to run the model on your own data, please check the run_one_scene function in test.py. We are also going to provide a demo file for it very soon. The default output cameras of run_one_scene follows the PyTorch3D convention. You can set return_in_pt3d=False to let it return in COLMAP convention.

Acknowledgement

We are highly inspired by colmap, pycolmap, posediffusion, cotracker, and kornia.

License

See the LICENSE file for details about the license under which this code is made available.

Citing VGGSfM

If you find our repository useful, please consider giving it a star ⭐ and citing our paper in your work:

@article{wang2023vggsfm,
  title={VGGSfM: Visual Geometry Grounded Deep Structure From Motion},
  author={Wang, Jianyuan and Karaev, Nikita and Rupprecht, Christian and Novotny, David},
  journal={arXiv preprint arXiv:2312.04563},
  year={2023}
}

jetd1/vggsfm