Federico Tombari, Nassir Navab, and Benjamin Busam
```
conda create -n echoscene python=3.8
conda activate echoscene
```
We have tested it on Ubuntu 20.04 with PyTorch 1.11.0, CUDA 11.3, and PyTorch3D.
```
pip install -r requirements.txt
pip install einops omegaconf tensorboardx open3d
```
(Note: if you encounter a problem with PyYAML, please refer to this link.)
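As a quick sanity check (a minimal sketch, not part of the repository), you can verify that PyTorch, CUDA, and PyTorch3D are visible from the new environment:

```python
# Environment sanity check (sketch, not a repo script).
import torch
import pytorch3d

print("PyTorch:", torch.__version__)        # tested with 1.11.0
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)    # tested with 11.3
print("PyTorch3D:", pytorch3d.__version__)
```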
Install mmcv-det3d (optional):
```
mim install mmengine
mim install mmcv
mim install mmdet
mim install mmdet3d
```
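If you install the optional mmdetection3d stack, a quick import check (again only a sketch, not a repo script) confirms that the packages resolve:

```python
# Optional: verify the mm* packages import correctly (sketch, not a repo script).
import mmengine, mmcv, mmdet, mmdet3d

for pkg in (mmengine, mmcv, mmdet, mmdet3d):
    print(pkg.__name__, pkg.__version__)
```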
Install CLIP:
```
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
```
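Training uses CLIP features when `--with_CLIP True` is set (see the training flags below). Purely as an illustration that the install works, and not as the repo's own loading code, encoding a prompt looks like this (the ViT-B/32 backbone is chosen only for the example):

```python
# Minimal CLIP check (illustration only, not the repo's loading code).
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _preprocess = clip.load("ViT-B/32", device=device)

with torch.no_grad():
    tokens = clip.tokenize(["a bedroom with a double bed and a wardrobe"]).to(device)
    text_features = model.encode_text(tokens)

print(text_features.shape)  # torch.Size([1, 512]) for ViT-B/32
```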
I. Download 3D-FUTURE-SDF. We processed this ourselves from the 3D-FUTURE meshes using tools in SDFusion.
II. Follow this page to download the SG-FRONT dataset and access more information.
III. Optional:
   - Download the 3D-FRONT dataset from their official site.
   - Preprocess the dataset following ATISS.
IV. Create a folder named `FRONT` and copy all files to it.

The structure should look like this:
```
FRONT
|--3D-FUTURE-SDF
|--All SG-FRONT files (.json and .txt)
|--3D-FRONT (optional)
|--3D-FRONT-texture (optional)
|--3D-FUTURE-model (optional)
|--3D-FUTURE-scene (optional)
|--3D-FRONT_preprocessed (optional, by ATISS)
|--threed_front.pkl (optional, by ATISS)
```
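To catch missing pieces early, a small hypothetical helper like the one below (not a repo script; `root` is a placeholder path) can verify that the essential contents are in place:

```python
# Hypothetical check of the FRONT folder layout (not part of the repo).
from pathlib import Path

root = Path("/path/to/FRONT")  # placeholder: point this to your FRONT folder

required = ["3D-FUTURE-SDF"]
optional = ["3D-FRONT", "3D-FRONT-texture", "3D-FUTURE-model",
            "3D-FUTURE-scene", "3D-FRONT_preprocessed", "threed_front.pkl"]

for name in required:
    print(f"{name}: {'ok' if (root / name).exists() else 'MISSING (required)'}")
for name in optional:
    print(f"{name}: {'ok' if (root / name).exists() else 'missing (optional)'}")

# SG-FRONT ships as .json/.txt files placed directly inside FRONT.
sg_files = list(root.glob("*.json")) + list(root.glob("*.txt"))
print(f"SG-FRONT files found: {len(sg_files)}")
```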
Essential: Download the pretrained VQ-VAE model from here into the folder `scripts/checkpoint`.
Optional: We provide two trained models: EchoLayout (available here) and EchoScene (available here).
To train the models, run:
```
cd scripts
python train_3dfront.py --exp /path/to/exp_folder --room_type all --dataset /path/to/dataset --residual True --network_type echoscene --with_SDF True --with_CLIP True --batchSize 64 --workers 8 --loadmodel False --nepoch 10000 --large False --use_scene_rels True
```
- `--exp`: the path where trained models and logs will be stored.
- `--room_type`: rooms to train, e.g., 'livingroom', 'diningroom', 'bedroom', and 'all'. We train all rooms together in our implementation.
- `--network_type`: the network to be trained. `echoscene` is EchoScene; `echolayout` is EchoLayout (the retrieval method, with a single layout generation branch).
- `--with_SDF`: set to `True` when training EchoScene.
- `--batchSize`: the batch size for the layout branch training.
- `--large`: default is `False`; `True` means more concrete categories.
To evaluate the models, run:
```
cd scripts
python eval_3dfront.py --exp /path/to/trained_model --dataset /path/to/dataset --epoch 2050 --visualize True --room_type all --render_type echoscene --gen_shape True
```
- `--exp`: where the models are stored. If one wants to load our provided models, the path should be aligned with where the downloaded checkpoints are stored.
- `--gen_shape`: set to `True` to enable the shape branch.
This metric aims to evaluate scene-level fidelity. To evaluate FID/KID, you first need to download the top-down GT rendered images for retrieval methods and the SDF rendered images for generative methods, or collect the renderings yourself by modifying and running `collect_gt_sdf_images.py`. Note that the flag `without_lamp` is set to `True` in our experiments.
Then, the renderings of the generated scenes can be obtained inside `eval_3dfront.py`.
After obtaining both the ground-truth images and the renderings of the generated scenes, run `compute_fid_scores_3dfront.py`.
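`compute_fid_scores_3dfront.py` is the script used for the reported numbers. Purely to make the computation concrete, here is a minimal sketch under the assumption that `torchmetrics` is available and that both folders contain same-resolution RGB PNG renderings (the folder paths are placeholders):

```python
# Illustrative FID over two folders of renderings (not the repo's compute_fid_scores_3dfront.py).
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torchmetrics.image.fid import FrechetInceptionDistance

def load_images(folder):
    imgs = []
    for p in sorted(Path(folder).glob("*.png")):
        arr = np.array(Image.open(p).convert("RGB"))         # (H, W, 3) uint8
        imgs.append(torch.from_numpy(arr).permute(2, 0, 1))  # (3, H, W) uint8
    return torch.stack(imgs)

fid = FrechetInceptionDistance(feature=2048)  # expects uint8 images by default
fid.update(load_images("/path/to/gt_renderings"), real=True)          # placeholder paths
fid.update(load_images("/path/to/generated_renderings"), real=False)
print("FID:", fid.compute().item())
```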
This metric aims to evaluate object-level fidelity. To evaluate it, you first need to obtain the ground-truth object meshes from here (~5 GB).
Second, store each generated object of the generated scenes, which can be done in `eval_3dfront.py`.
After obtaining the object meshes, modify the path in `compute_mmd_cov_1nn.py` and run it to get the results.
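`compute_mmd_cov_1nn.py` produces the reported numbers. Just to make the metrics concrete, below is a hedged sketch of MMD and COV given a precomputed Chamfer-distance matrix `D` (generated shapes as rows, ground-truth shapes as columns); the matrix here is a random placeholder:

```python
# Sketch of MMD/COV from a pairwise distance matrix (not the repo's compute_mmd_cov_1nn.py).
# D[i, j] is assumed to be the Chamfer distance between generated shape i and GT shape j.
import numpy as np

def mmd_cov(D):
    # MMD: average distance from each GT shape to its closest generated shape.
    mmd = D.min(axis=0).mean()
    # COV: fraction of GT shapes that are the nearest neighbour of some generated shape.
    cov = np.unique(D.argmin(axis=1)).size / D.shape[1]
    return mmd, cov

D = np.random.rand(128, 100)  # placeholder distance matrix
mmd, cov = mmd_cov(D)
print(f"MMD: {mmd:.4f}  COV: {cov:.3f}")
```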
This metric is based on the Chamfer distance, which checks how similar the generated shapes of two identical objects are to each other. To evaluate it, download the consistency information from here, modify the paths in `consistency_check.py`, and run the script.
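`consistency_check.py` is the authoritative implementation; as a reference point only, the underlying Chamfer distance between two point clouds can be computed with the PyTorch3D installed above (the random point clouds below are placeholders for points sampled from two generated instances of the same object):

```python
# Illustration of the underlying Chamfer distance with PyTorch3D (not the repo's consistency_check.py).
import torch
from pytorch3d.loss import chamfer_distance

# Placeholders: in practice, sample these points from the two generated meshes being compared.
pts_a = torch.rand(1, 2048, 3)
pts_b = torch.rand(1, 2048, 3)

dist, _ = chamfer_distance(pts_a, pts_b)  # returns (chamfer loss, normals loss)
print("Chamfer distance:", dist.item())
```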
Relevant work: Graph-to-3D, CommonScenes, DiffuScene, InstructScene, SceneTex.
Disclaimer: This is a code repository for reference only; in case of any discrepancies, the paper shall prevail.
We thank DiffuScene's author Jiapeng Tang and InstructScene's author Chenguo Lin for providing the code and for helpful discussions, and additionally thank Mahdi Saleh for titling the paper EchoScene, which is vivid and catchy :)