ScenePriors

Implementation of CVPR'23: Learning 3D Scene Priors with 2D Supervision


Learning 3D Scene Priors with 2D Supervision [Project][Paper]

Yinyu Nie, Angela Dai, Xiaoguang Han, Matthias Nießner

in CVPR 2023


[Teaser figures: 3D Scene Generation | Single View Reconstruction (Input / Pred)]

Install

Our codebase is developed under Ubuntu 20.04 with PyTorch 1.12.1.

  1. We recommend using conda to deploy our environment by

    cd ScenePriors
    conda env create -f environment.yml
    conda activate sceneprior
    
  2. Install Fast Transformers by

    cd external/fast_transformers
    python setup.py build_ext --inplace
    cd ../..
    
  3. Please follow the link to install the prerequisite libraries for PyTorch3D, then install PyTorch3D from our local clone by

    cd external/pytorch3d
    pip install -e .
    cd ../..
    

    Note: After installing all prerequisite libraries from the link, please do not install the prebuilt binaries of PyTorch3D; install it from our local clone as above.
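
Optionally, you can sanity check the installation with a couple of imports (a minimal check, assuming the sceneprior environment is active):

    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
    python -c "import pytorch3d; print(pytorch3d.__version__)"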


Data Processing

3D-Front data processing (for scene generation)

  1. Apply for and download the 3D-Front dataset and link it to the local directories as follows (example symlink commands are given after this list):

    datasets/3D-Front/3D-FRONT
    datasets/3D-Front/3D-FRONT-texture
    datasets/3D-Front/3D-FUTURE-model
    
  2. Render 3D-Front scenes following our rendering pipeline and link the rendering results (in the renderings folder) to

    datasets/3D-Front/3D-FRONT_renderings_improved_mat
    

    Note: you can comment out bproc.renderer.enable_depth_output(activate_antialiasing=False) in render_dataset_improved_mat.py since we do not need depth information.

  3. Preprocess 3D-Front data by

    python utils/threed_front/1_process_viewdata.py --room_type ROOM_TYPE --n_processes NUM_THREADS
    python utils/threed_front/2_get_stats.py --room_type ROOM_TYPE
    
    • The processed data for training are saved in datasets/3D-Front/3D-FRONT_samples.
    • We also parsed and extracted the 3D-Front data for visualization into datasets/3D-Front/3D-FRONT_scenes.
    • ROOM_TYPE can be 'bed' (bedroom) or 'living' (living room).
    • You can set NUM_THREADS to your CPU core number for parallel processing.
  4. Visualize processed data for verification by (optional)

    python utils/threed_front/vis/vis_gt_sample.py --scene_json SCENE_JSON_ID --room_id ROOM_ID --n_samples N_VIEWS 
    
    • SCENE_JSON_ID is the ID of a scene, e.g., 6a0e73bc-d0c4-4a38-bfb6-e083ce05ebe9.
    • ROOM_ID is the room ID in this scene, e.g., MasterBedroom-2679.
    • N_VIEWS is the number of views to visualize, e.g., 12.

    If everything goes smoothly, five visualization windows will pop up as follows.

[Visualization windows: RGB | Semantics | Instances | 3D Box Projections | CAD Models (view #1)]

Note: X server is required for visualization.
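
The directory links in steps 1-2 can be created with symlinks, for example (a sketch; /path/to/... are placeholders for wherever you stored the downloads and renderings):

    mkdir -p datasets/3D-Front
    ln -s /path/to/3D-FRONT datasets/3D-Front/3D-FRONT
    ln -s /path/to/3D-FRONT-texture datasets/3D-Front/3D-FRONT-texture
    ln -s /path/to/3D-FUTURE-model datasets/3D-Front/3D-FUTURE-model
    ln -s /path/to/renderings datasets/3D-Front/3D-FRONT_renderings_improved_mat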

ScanNet data processing (for single-view reconstruction)

  1. Apply for and download ScanNet into datasets/ScanNet/scans. Since we need 2D data, the *.sens file should also be downloaded for each scene (a folder sanity-check sketch is given after this list).

  2. Extract *.sens files to obtain RGB/semantics/instance/camera pose frame data by

    python utils/scannet/1_unzip_sens.py
    

    Then the folder structure in each scene looks like:

    ./scene*
    |--color (folder)
    |--instance-filt (folder)
    |--intrinsic (folder)
    |--label-filt (folder)
    |--pose (folder)
    |--scene*.aggregation.json
    |--scene*.sens
    |--scene*.txt
    |--scene*_2d-instance.zip
    ...
    |--scene*_vh_clean_2.ply
    
  3. Process ScanNet data by

    python utils/scannet/2_process_viewdata.py
    

    The processed data will be saved in datasets/ScanNet/ScanNet_samples.

  4. Visualize the processed data by (optional)

    python utils/scannet/vis/vis_gt.py --scene_id SCENE_ID --n_samples N_SAMPLES
    
    • SCENE_ID is the scene ID in ScanNet, e.g., scene0000_00.
    • N_SAMPLES is the number of views to visualize, e.g., 6.

    If everything goes smoothly, five visualization windows will pop up as follows.

[Visualization windows: RGB | Semantics | Instances | 3D Box Projections | 3D Boxes (view #3)]

Note: X server is required for visualization.
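
Before and after step 2, you can verify the per-scene files with a quick check like the following (a sketch, assuming the folder layout shown above):

    # Report scenes that miss the raw *.sens file or any extracted folder.
    for d in datasets/ScanNet/scans/scene*/; do
        ls "$d"*.sens >/dev/null 2>&1 || echo "missing .sens in $d"
        for sub in color label-filt instance-filt pose intrinsic; do
            [ -d "$d$sub" ] || echo "missing $sub/ in $d"
        done
    done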


Training

Note: we use SLURM to manage multi-GPU training. For backend setting, please check slurm_jobs.
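
A minimal sbatch wrapper might look like the sketch below; the #SBATCH values are placeholders to adapt to your cluster, and the authoritative backend settings live in slurm_jobs:

    #!/bin/bash
    #SBATCH --job-name=pretrain_3dfront_bedroom
    #SBATCH --nodes=1
    #SBATCH --gres=gpu:4            # keep in sync with distributed.num_gpus
    #SBATCH --cpus-per-task=32      # keep in sync with device.num_workers

    # Assumes conda is initialised in the batch shell.
    conda activate sceneprior
    # Paste the full `python main.py ...` command from the training steps below,
    # prefixed with srun, e.g.:
    srun python main.py \
        start_deform=False \
        distributed.num_gpus=4 \
        data.dataset=3D-Front \
        data.split_type=bed \
        exp_name=pretrain_3dfront_bedroom
        # ... remaining flags as in step 1 below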

Scene Generation (with 3D-Front)

Here we use bedroom data as an example. Training on living rooms is the same.

  1. Start layout pretraining by
    python main.py \
        start_deform=False \
        resume=False \
        finetune=False \
        weight=[] \
        distributed.num_gpus=4 \
        data.dataset=3D-Front \
        data.split_type=bed \
        data.n_views=20 \
        data.aug=False \
        device.num_workers=32 \
        train.batch_size=128 \
        train.epochs=800 \
        train.freeze=[] \
        scheduler.latent_input.milestones=[400] \
        scheduler.generator.milestones=[400] \
        log.if_wandb=True \
        exp_name=pretrain_3dfront_bedroom
    
    The network weight will be saved in outputs/3D-Front/train/YEAR-MONTH-DAY/HOUR-MINUTE-SECOND/model_best.pth.
  2. Shape training - We start shape training after the layout training has converged. Please replace the weight keyword below with the pretrained weight path (a snippet to locate the newest checkpoint is given after this list).
    python main.py \
        start_deform=True \
        resume=False \
        finetune=True \
        weight=['outputs/3D-Front/train/YEAR-MONTH-DAY/HOUR-MINUTE-SECOND/model_best.pth'] \
        distributed.num_gpus=4 \
        data.dataset=3D-Front \
        data.n_views=20 \
        data.aug=False \
        data.downsample_ratio=4 \
        device.num_workers=16 \
        train.batch_size=16 \
        train.epochs=500 \
        train.freeze=[] \
        scheduler.latent_input.milestones=[300] \
        scheduler.generator.milestones=[300] \
        log.if_wandb=True \
        exp_name=train_3dfront_bedroom
    
    Still, the refined network weight will be saved in outputs/3D-Front/train/YEAR-MONTH-DAY/HOUR-MINUTE-SECOND/model_best.pth.
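
To avoid copying the YEAR-MONTH-DAY/HOUR-MINUTE-SECOND path by hand, you can locate the newest pretraining checkpoint with a snippet like this (a sketch based on the output layout above) and pass it as the weight argument in step 2:

    WEIGHT=$(ls -t outputs/3D-Front/train/*/*/model_best.pth | head -n 1)
    echo "weight=['$WEIGHT']"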

Single-view Reconstruction (with ScanNet)

  1. Start layout pretraining by

    python main.py \
        start_deform=False \
        resume=False \
        finetune=False \
        weight=[] \
        distributed.num_gpus=4 \
        data.dataset=ScanNet \
        data.split_type=all \
        data.n_views=40 \
        data.aug=True \
        device.num_workers=32 \
        train.batch_size=64 \
        train.epochs=500 \
        train.freeze=[] \
        scheduler.latent_input.milestones=[500] \
        scheduler.generator.milestones=[500] \
        log.if_wandb=True \
        exp_name=pretrain_scannet
    

    The network weight will be saved in outputs/ScanNet/train/YEAR-MONTH-DAY/HOUR-MINUTE-SECOND/model_best.pth.

  2. Shape training - We start shape training after the layout training has converged. Please replace the weight keyword below with the pretrained weight path.

   python main.py \
       start_deform=True \
       resume=False \
       finetune=True \
       weight=['outputs/ScanNet/train/YEAR-MONTH-DAY/HOUR-MINUTE-SECOND/model_best.pth'] \
       distributed.num_gpus=4 \
       data.dataset=ScanNet \
       data.split_type=all \
       data.n_views=40 \
       data.downsample_ratio=4 \
       data.aug=True \
       device.num_workers=8 \
       train.batch_size=8 \
       train.epochs=500 \
       train.freeze=[] \
       scheduler.latent_input.milestones=[300] \
       scheduler.generator.milestones=[300] \
       log.if_wandb=True \
       exp_name=train_scannet

Still, the refined network weight will be saved in outputs/ScanNet/train/YEAR-MONTH-DAY/HOUR-MINUTE-SECOND/model_best.pth.


Generation & Reconstruction

Please replace the keyword weight below with your trained weight path.

  1. Scene Generation (with 3D-Front)

    python main.py \
       mode=generation \
       start_deform=True \
       data.dataset=3D-Front \
       finetune=True \
       weight=outputs/3D-Front/train/YEAR-MONTH-DAY/HOUR-MINUTE-SECOND/model_best.pth \
       generation.room_type=bed \
       data.split_dir=splits \
       data.split_type=bed \
       generation.phase=generation
    

    The generated scenes will be saved in outputs/3D-Front/generation/YEAR-MONTH-DAY/HOUR-MINUTE-SECOND.

  2. Single view reconstruction (with ScanNet). Since this process involves test-time optimization, it is very slow. We therefore test in parallel by dividing the whole test set into batch_num batches: run this script once per batch, setting an individual batch_id for each run (batch_id=0,1,...,batch_num-1); a parallel-launch sketch is given after this list. If you do not want to run in parallel, keep the default setting below.

    python main.py \
        mode=demo \
        start_deform=True \
        finetune=True \
        data.n_views=1 \
        data.dataset=ScanNet \
        data.split_type=all \
        weight=outputs/ScanNet/train/YEAR-MONTH-DAY/HOUR-MINUTE-SECOND/model_best.pth \
        optimizer.method=RMSprop \
        optimizer.lr=0.01 \
        scheduler.latent_input.milestones=[1200] \
        scheduler.latent_input.gamma=0.1 \
        demo.epochs=2000 \
        demo.batch_id=0 \
        demo.batch_num=1 \
        log.print_step=100 \
        log.if_wandb=False
    

    The results will be saved in outputs/ScanNet/demo/output.

  3. Similarly, you can do single-view reconstruction with 3D-Front:

    python main.py \
        mode=demo \
        start_deform=True \
        finetune=True \
        data.n_views=1 \
        data.dataset=3D-Front \
        data.split_type=bed \
        weight=outputs/3D-Front/train/2022-09-06/02-37-24/model_best.pth \
        optimizer.method=RMSprop \
        optimizer.lr=0.01 \
        scheduler.latent_input.milestones=[1200] \
        scheduler.latent_input.gamma=0.1 \
        demo.epochs=2000 \
        demo.batch_id=0 \
        demo.batch_num=1 \
        log.print_step=100 \
        log.if_wandb=False
    

    The results will be saved in outputs/3D-Front/demo/output.
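
For step 2, a possible way to launch the parallel test-time optimization described above (a sketch; it assumes one GPU per batch via CUDA_VISIBLE_DEVICES and reuses the flags from step 2):

    BATCH_NUM=4
    for BATCH_ID in $(seq 0 $((BATCH_NUM - 1))); do
        CUDA_VISIBLE_DEVICES=$BATCH_ID python main.py \
            mode=demo \
            start_deform=True \
            finetune=True \
            data.n_views=1 \
            data.dataset=ScanNet \
            data.split_type=all \
            weight=outputs/ScanNet/train/YEAR-MONTH-DAY/HOUR-MINUTE-SECOND/model_best.pth \
            optimizer.method=RMSprop \
            optimizer.lr=0.01 \
            scheduler.latent_input.milestones=[1200] \
            scheduler.latent_input.gamma=0.1 \
            demo.epochs=2000 \
            demo.batch_id=$BATCH_ID \
            demo.batch_num=$BATCH_NUM \
            log.print_step=100 \
            log.if_wandb=False &
    done
    wait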


Visualization

Note: you may need an X server to display the visualization windows from VTK.

  1. Scene Generation (with 3D-Front).

    python utils/threed_front/vis/render_pred.py --pred_file outputs/3D-Front/generation/YEAR-MONTH-DAY/HOUR-MINUTE-SECOND/vis/bed/sample_X_X.npz [--use_retrieval]
    
  2. Single-view Reconstruction (with ScanNet)

    python utils/scannet/vis/vis_prediction_scannet.py --dump_dir demo/ScanNet/output --sample_name all_sceneXXXX_XX_XXXX 
    
  3. Single-view Reconstruction (with 3D-Front)

    python utils/threed_front/vis/vis_svr.py --dump_dir demo/3D-Front/output --sample_name [FILENAME IN dump_dir]
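
To render every generated sample from a scene-generation run (step 1 above), you can loop over the saved files, e.g. (a sketch; replace YEAR-MONTH-DAY/HOUR-MINUTE-SECOND with your run directory):

    for f in outputs/3D-Front/generation/YEAR-MONTH-DAY/HOUR-MINUTE-SECOND/vis/bed/sample_*.npz; do
        python utils/threed_front/vis/render_pred.py --pred_file "$f" --use_retrieval
    done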
    

Citation

If you find our work helpful, please cite:

@InProceedings{Nie_2023_CVPR,
    author    = {Nie, Yinyu and Dai, Angela and Han, Xiaoguang and Nie{\ss}ner, Matthias},
    title     = {Learning 3D Scene Priors With 2D Supervision},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {792-802}
}