Federico Tombari, Nassir Navab, and Benjamin Busam
```
conda create -n echoscene python=3.8
conda activate echoscene
```
We have tested it on Ubuntu 20.04 with PyTorch 1.11.0, CUDA 11.3, and PyTorch3D.
```
pip install -r requirements.txt
pip install einops omegaconf tensorboardx open3d
```
(Note: if you encounter a problem with PyYAML, please refer to this link.)
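As a quick sanity check (a minimal sketch, not part of the repository), you can verify that PyTorch, CUDA, and PyTorch3D are visible from the new environment:

```python
# Environment sanity check (sketch, not a repo script).
import torch
import pytorch3d

print("PyTorch:", torch.__version__)        # tested with 1.11.0
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)    # tested with 11.3
print("PyTorch3D:", pytorch3d.__version__)
```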
Install mmcv-det3d (optional):
```
mim install mmengine
mim install mmcv
mim install mmdet
mim install mmdet3d
```
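If you install the optional mmdetection3d stack, a quick import check (again only a sketch, not a repo script) confirms that the packages resolve:

```python
# Optional: verify the mm* packages import correctly (sketch, not a repo script).
import mmengine, mmcv, mmdet, mmdet3d

for pkg in (mmengine, mmcv, mmdet, mmdet3d):
    print(pkg.__name__, pkg.__version__)
```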
Install CLIP:
```
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
```
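Training uses CLIP features when `--with_CLIP True` is set (see the training flags below). Purely as an illustration that the install works, and not as the repo's own loading code, encoding a prompt looks like this (the ViT-B/32 backbone is chosen only for the example):

```python
# Minimal CLIP check (illustration only, not the repo's loading code).
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _preprocess = clip.load("ViT-B/32", device=device)

with torch.no_grad():
    tokens = clip.tokenize(["a bedroom with a double bed and a wardrobe"]).to(device)
    text_features = model.encode_text(tokens)

print(text_features.shape)  # torch.Size([1, 512]) for ViT-B/32
```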
I. Download 3D-FUTURE-SDF. We processed this ourselves from the 3D-FUTURE meshes using tools in SDFusion.
II. Follow this page to download the SG-FRONT dataset and access more information.
III. Optional:
   - Download the 3D-FRONT dataset from their official site.
   - Preprocess the dataset following ATISS.
IV. Create a folder named `FRONT` and copy all files to it.

The structure should look like this:
```
FRONT
|--3D-FUTURE-SDF
|--All SG-FRONT files (.json and .txt)
|--3D-FRONT (optional)
|--3D-FRONT-texture (optional)
|--3D-FUTURE-model (optional)
|--3D-FUTURE-scene (optional)
|--3D-FRONT_preprocessed (optional, by ATISS)
|--threed_front.pkl (optional, by ATISS)
```
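To catch missing pieces early, a small hypothetical helper like the one below (not a repo script; `root` is a placeholder path) can verify that the essential contents are in place:

```python
# Hypothetical check of the FRONT folder layout (not part of the repo).
from pathlib import Path

root = Path("/path/to/FRONT")  # placeholder: point this to your FRONT folder

required = ["3D-FUTURE-SDF"]
optional = ["3D-FRONT", "3D-FRONT-texture", "3D-FUTURE-model",
            "3D-FUTURE-scene", "3D-FRONT_preprocessed", "threed_front.pkl"]

for name in required:
    print(f"{name}: {'ok' if (root / name).exists() else 'MISSING (required)'}")
for name in optional:
    print(f"{name}: {'ok' if (root / name).exists() else 'missing (optional)'}")

# SG-FRONT ships as .json/.txt files placed directly inside FRONT.
sg_files = list(root.glob("*.json")) + list(root.glob("*.txt"))
print(f"SG-FRONT files found: {len(sg_files)}")
```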
Essential: Download the pretrained VQ-VAE model from here into the folder `scripts/checkpoint`.
Optional: We provide two trained models: EchoLayout (available here) and EchoScene (available here).
To train the models, run:
```
cd scripts
python train_3dfront.py --exp /path/to/exp_folder --room_type all --dataset /path/to/dataset --residual True --network_type echoscene --with_SDF True --with_CLIP True --batchSize 64 --workers 8 --loadmodel False --nepoch 10000 --large False --use_scene_rels True
```
- `--exp`: the path where trained models and logs will be stored.
- `--room_type`: rooms to train, e.g., 'livingroom', 'diningroom', 'bedroom', and 'all'. We train all rooms together in our implementation.
- `--network_type`: the network to be trained. `echoscene` is EchoScene; `echolayout` is EchoLayout (the retrieval method, with a single layout generation branch).
- `--with_SDF`: set to `True` when training EchoScene.
- `--batchSize`: the batch size for the layout branch training.
- `--large`: default is `False`; `True` means more concrete categories.
To evaluate the models, run:
```
cd scripts
python eval_3dfront.py --exp /path/to/trained_model --dataset /path/to/dataset --epoch 2050 --visualize True --room_type all --render_type echoscene --gen_shape True
```
- `--exp`: where the models are stored. If one wants to load our provided models, the path should be aligned with where the downloaded checkpoints are stored.
- `--gen_shape`: set to `True` to enable the shape branch.
This metric aims to evaluate scene-level fidelity. To evaluate FID/KID, you first need to download the top-down GT rendered images for retrieval methods and the SDF rendered images for generative methods, or collect the renderings yourself by modifying and running `collect_gt_sdf_images.py`. Note that the flag `without_lamp` is set to `True` in our experiments.
Then, the renderings of the generated scenes can be obtained inside `eval_3dfront.py`.
After obtaining both the ground-truth images and the renderings of the generated scenes, run `compute_fid_scores_3dfront.py`.
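`compute_fid_scores_3dfront.py` is the script used for the reported numbers. Purely to make the computation concrete, here is a minimal sketch under the assumption that `torchmetrics` is available and that both folders contain same-resolution RGB PNG renderings (the folder paths are placeholders):

```python
# Illustrative FID over two folders of renderings (not the repo's compute_fid_scores_3dfront.py).
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torchmetrics.image.fid import FrechetInceptionDistance

def load_images(folder):
    imgs = []
    for p in sorted(Path(folder).glob("*.png")):
        arr = np.array(Image.open(p).convert("RGB"))         # (H, W, 3) uint8
        imgs.append(torch.from_numpy(arr).permute(2, 0, 1))  # (3, H, W) uint8
    return torch.stack(imgs)

fid = FrechetInceptionDistance(feature=2048)  # expects uint8 images by default
fid.update(load_images("/path/to/gt_renderings"), real=True)          # placeholder paths
fid.update(load_images("/path/to/generated_renderings"), real=False)
print("FID:", fid.compute().item())
```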
This metric aims to evaluate object-level fidelity. To evaluate it, you first need to obtain the ground-truth object meshes from here (~5 GB).
Second, store each generated object of the generated scenes, which can be done in `eval_3dfront.py`.
After obtaining the object meshes, modify the path in `compute_mmd_cov_1nn.py` and run it to get the results.
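`compute_mmd_cov_1nn.py` produces the reported numbers. Just to make the metrics concrete, below is a hedged sketch of MMD and COV given a precomputed Chamfer-distance matrix `D` (generated shapes as rows, ground-truth shapes as columns); the matrix here is a random placeholder:

```python
# Sketch of MMD/COV from a pairwise distance matrix (not the repo's compute_mmd_cov_1nn.py).
# D[i, j] is assumed to be the Chamfer distance between generated shape i and GT shape j.
import numpy as np

def mmd_cov(D):
    # MMD: average distance from each GT shape to its closest generated shape.
    mmd = D.min(axis=0).mean()
    # COV: fraction of GT shapes that are the nearest neighbour of some generated shape.
    cov = np.unique(D.argmin(axis=1)).size / D.shape[1]
    return mmd, cov

D = np.random.rand(128, 100)  # placeholder distance matrix
mmd, cov = mmd_cov(D)
print(f"MMD: {mmd:.4f}  COV: {cov:.3f}")
```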
This metric is based on the Chamfer distance, which checks how similar the generated shapes of two identical objects are to each other. To evaluate it, download the consistency information from here, modify the paths in `consistency_check.py`, and run the script.
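`consistency_check.py` is the authoritative implementation; as a reference point only, the underlying Chamfer distance between two point clouds can be computed with the PyTorch3D installed above (the random point clouds below are placeholders for points sampled from two generated instances of the same object):

```python
# Illustration of the underlying Chamfer distance with PyTorch3D (not the repo's consistency_check.py).
import torch
from pytorch3d.loss import chamfer_distance

# Placeholders: in practice, sample these points from the two generated meshes being compared.
pts_a = torch.rand(1, 2048, 3)
pts_b = torch.rand(1, 2048, 3)

dist, _ = chamfer_distance(pts_a, pts_b)  # returns (chamfer loss, normals loss)
print("Chamfer distance:", dist.item())
```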
Relevant work: Graph-to-3D, CommonScenes, DiffuScene, InstructScene, SceneTex.
Disclaimer: This is a code repository for reference only; in case of any discrepancies, the paper shall prevail.
We thank DiffuScene's author Jiapeng Tang and InstructScene's author Chenguo Lin for providing the code and for helpful discussions, and additionally thank Mahdi Saleh for titling the paper EchoScene, which is vivid and catchy :)