3D-to-2D Distillation for Indoor Scene Parsing

Introduction

This repository is the implementation of 3D-to-2D Distillation for Indoor Scene Parsing, presented at CVPR 2021 (Oral). The code is based on PSPNet.

Usage

  1. Requirement:

    • Hardware: 4 GPUs (preferably with >=11 GB of memory each)
    • Software: PyTorch >= 1.1.0, Python 3, tensorboardX
  2. Clone the repository:

    git clone https://github.com/liuzhengzhe/3D-to-2D-Distillation-for-Indoor-Scene-Parsing
  3. Train:
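    • Set the ScanNet data and output paths in the config file, then launch training. The command below follows the convention of the PSPNet codebase that the test step also uses; treat the exact script name as an assumption and check the tool/ folder for the actual training script:

      sh tool/train.sh scannet pspnet50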

  4. Test:

    • Download the trained segmentation models from Google Drive or Baidu Yun (extraction code: 3sfq) and put them under the folder specified in the config, or modify the configured paths.

    • For full testing (to reproduce the listed performance):

      sh tool/test.sh scannet pspnet50
  5. Generate 3D features from other 3D semantic segmentation models:

    • Run a 3D semantic segmentation model and save its features in the "data/feat" folder. Each file contains the feature array for one point cloud, with shape N×d, where N is the number of points and d is the dimension of the 3D features. The order of the features must follow the point order of the corresponding ScanNet .ply file. Then project the 3D features to 2D:

      cd data/
      python proj3dto2d_1.py
      python proj3dto2d_2.py

      You can then train the model with your own custom 3D features; a sketch of the expected feature-file layout follows this list.
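
As a reference for step 5, here is a minimal sketch of saving per-point features in the expected N×d layout. The scene naming and the .npy format are assumptions for illustration; match whatever proj3dto2d_1.py actually expects:

      import numpy as np
      import torch

      # `feat` stands in for the per-point feature tensor produced by your 3D
      # model: shape (N, d), where N equals the number of points in the
      # corresponding ScanNet .ply file and rows follow that file's point order.
      feat = torch.randn(100000, 96)  # placeholder; replace with real model output

      scene_id = "scene0000_00"  # hypothetical naming; align with proj3dto2d_1.py
      np.save(f"data/feat/{scene_id}.npy", feat.cpu().numpy().astype(np.float32))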

Performance

Description: mIoU/mAcc/pAcc stand for mean IoU, mean accuracy over classes, and all-pixel accuracy, respectively; ms denotes multi-scale testing.

ScanNet-v2:

  • Setting: trained on the train set, evaluated on the val set.
  Backbone   mIoU/mAcc/pAcc (ms)
  PSPNet50   0.5822/0.7083/0.8170
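
For clarity, here is a minimal sketch of how these three metrics are conventionally computed from a confusion matrix (an illustration of the definitions above, not the repository's evaluation code):

      import numpy as np

      def metrics_from_confusion(conf):
          # conf[i, j] = number of pixels with ground-truth class i
          # that were predicted as class j.
          conf = conf.astype(np.float64)
          tp = np.diag(conf)       # correctly classified pixels per class
          gt = conf.sum(axis=1)    # ground-truth pixels per class
          pred = conf.sum(axis=0)  # predicted pixels per class
          iou = tp / np.maximum(gt + pred - tp, 1)  # per-class IoU (guard /0)
          acc = tp / np.maximum(gt, 1)              # per-class accuracy (guard /0)
          # mIoU, mAcc, pAcc
          return iou.mean(), acc.mean(), tp.sum() / conf.sum()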