Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image
Stanislaw Szymanowicz, Eldar Insafutdinov, Chuanxia Zheng, Dylan Campbell, João F. Henriques, Christian Rupprecht, Andrea Vedaldi
arXiv 2406.04343
- 19.07.2024: Training code and data release
Flash3D has been trained and tested with the following software versions:
- Python 3.10
- PyTorch 2.2.2
- CUDA 11.8
- GCC 11.2 (or more recent)
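Before going further, it can help to compare the local environment against the versions listed above. The following is a minimal sketch, not part of the Flash3D codebase: the names `EXPECTED`, `version_report` and `mismatches` are hypothetical, and the check treats the listed versions as an exact baseline, which may be stricter than the project actually requires.

```python
import sys

# Versions listed in this README, treated here as the expected baseline (assumption:
# exact matches are wanted; newer patch releases may also work).
EXPECTED = {"python": (3, 10), "torch": "2.2.2", "cuda": "11.8"}


def version_report():
    """Collect installed versions; the torch and cuda entries stay None if torch is absent."""
    report = {"python": tuple(sys.version_info[:2]), "torch": None, "cuda": None}
    try:
        import torch

        # Strip a local build suffix such as "+cu118" from the version string.
        report["torch"] = torch.__version__.split("+")[0]
        # CUDA version PyTorch was built against (None for CPU-only builds).
        report["cuda"] = torch.version.cuda
    except ImportError:
        pass
    return report


def mismatches(report, expected=EXPECTED):
    """Return {name: (found, expected)} for every entry that differs from the baseline."""
    return {k: (report[k], v) for k, v in expected.items() if report[k] != v}


if __name__ == "__main__":
    for name, (found, wanted) in mismatches(version_report()).items():
        print(f"{name}: found {found}, expected {wanted}")
```

Running the script prints one line per mismatch and nothing when the environment matches the baseline.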
Begin by installing CUDA 11.8 and adding the directory containing the nvcc compiler to the PATH environment variable.
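To confirm that nvcc is visible on PATH and reports the expected release, a small check along these lines may be useful. It is a sketch only: `find_nvcc` and `nvcc_release` are hypothetical helper names, and the parser assumes the usual `release X.Y` wording in the output of `nvcc --version`.

```python
import re
import shutil
import subprocess


def find_nvcc():
    """Return the full path to nvcc if it is on PATH, otherwise None."""
    return shutil.which("nvcc")


def nvcc_release(version_output):
    """Extract the release number (e.g. '11.8') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", version_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    path = find_nvcc()
    if path is None:
        print("nvcc not found on PATH")
    else:
        out = subprocess.run([path, "--version"], capture_output=True, text=True).stdout
        print(f"nvcc at {path}, release {nvcc_release(out)}")
```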
Then create the Python environment either via conda:
conda create -y python=3.10 -n flash3d
conda activate flash3d
or using Python's venv module (assuming you already have access to Python 3.10 on your system):
python3.10 -m venv .venv
. .venv/bin/activate
Finally, install the required packages as follows:
pip install -r requirements-torch.txt --extra-index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
To download the RealEstate10K dataset, we base our instructions on the Behind The Scenes scripts.
First, download the video sequence metadata, including camera poses, from https://google.github.io/realestate10k/download.html and unpack it into data/ so that the folder layout is as follows:
data/RealEstate10K/train
data/RealEstate10K/test
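Since the download step that follows runs for a long time, it may be worth verifying the layout first. The sketch below only checks that the two split directories exist; `missing_splits` is a hypothetical helper, not part of the repository, and it makes no assumption about the files inside each directory.

```python
from pathlib import Path


def missing_splits(root="data/RealEstate10K", splits=("train", "test")):
    """Return the names of split directories that do not exist under root."""
    root = Path(root)
    return [name for name in splits if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_splits()
    if missing:
        print("Missing metadata directories:", ", ".join(missing))
    else:
        print("Folder layout looks as expected")
```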
Next, download the training and test sets of the dataset with the following commands:
python datasets/download_realestate10k.py -d data/RealEstate10K -o data/RealEstate10K -m train
python datasets/download_realestate10k.py -d data/RealEstate10K -o data/RealEstate10K -m test
This step will take several days to complete. Finally, download the additional data for the RealEstate10K dataset: a pre-processed COLMAP cache containing sparse point clouds, which are used to estimate the scaling factor for depth predictions. The last two commands below remove any missing video sequences from the training and test sets.
sh datasets/dowload_realestate10k_colmap.sh
python -m datasets.preprocess_realestate10k -d data/RealEstate10K -s train
python -m datasets.preprocess_realestate10k -d data/RealEstate10K -s test
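The README does not spell out how the scaling factor is computed from the sparse point clouds. One common approach, shown here purely as an illustrative assumption and not as the Flash3D implementation, is to take the median ratio between the sparse depths and the predicted depths at the same pixels. The function name `median_depth_scale` and its list-based inputs are hypothetical.

```python
from statistics import median


def median_depth_scale(predicted, sparse):
    """Estimate a scale s such that s * predicted roughly matches sparse.

    predicted: predicted depths sampled at the sparse point locations.
    sparse:    depths of the corresponding sparse points (e.g. from COLMAP).
    Pairs with a non-positive depth on either side are ignored.
    """
    ratios = [s / p for p, s in zip(predicted, sparse) if p > 0 and s > 0]
    if not ratios:
        raise ValueError("no valid depth pairs")
    # The median keeps a few badly triangulated points from skewing the estimate.
    return median(ratios)


if __name__ == "__main__":
    # Toy values: the last sparse point is an outlier and does not affect the result.
    print(median_depth_scale([1.0, 2.0, 4.0], [2.0, 4.0, 100.0]))
```

The median is used rather than the mean because sparse reconstructions typically contain some outliers; a least-squares fit would be another reasonable choice under different assumptions.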
We provide pretrained model weights that can be downloaded and evaluated on the RealEstate10K test set:
python -m misc.download_pretrained_models -o exp/re10k_v2
sh evaluate.sh exp/re10k_v2
To train the model on the RealEstate10K dataset, execute this command:
python train.py \
+experiment=layered_re10k \
model.depth.version=v1 \
train.logging=false
To train on multiple GPUs, run this command:
sh train.sh
You can modify the cluster information in configs/hydra/cluster.
@article{szymanowicz2024flash3d,
author = {Szymanowicz, Stanislaw and Insafutdinov, Eldar and Zheng, Chuanxia and Campbell, Dylan and Henriques, Joao and Rupprecht, Christian and Vedaldi, Andrea},
title = {Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image},
journal = {arxiv},
year = {2024},
}