xuxw98/ESAM

Issue with Visualization Demo


Hi,

Thanks for the awesome work. I am trying to visualize the output of ESAM on scene0000_00 from ScanNet using:

CUDA_VISIBLE_DEVICES=0 python vis_demo/online_demo.py --scene-idx scene0000_00 --config configs/ESAM/ESAM_sv_scannet.py --checkpoint work_dirs/ESAM_sv_scannet/epoch_128.pth

However, when my data is fed through the test_pipeline with the scannet_sv config, the pipeline expects inputs such as pts_instance_mask to be strings, whereas the info dict in online_demo.py defines pts_instance_masks as a list. So while I can tweak the code to visualize a single image, I'm having trouble getting the demo to run on an entire scene.
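To make the mismatch concrete, here is a minimal sketch of the kind of tweak I mean. It is not ESAM code: the helper name normalize_mask_paths and the two example dicts are hypothetical, and the file names are placeholders. It only shows how a single string (what the SV pipeline seems to expect) and a list (what the demo's info dict provides) can be brought to one shape.

```python
from typing import List, Sequence, Union


def normalize_mask_paths(value: Union[str, Sequence[str]]) -> List[str]:
    """Return mask paths as a list, whether one path or several were given.

    Hypothetical helper for illustration only; not part of ESAM.
    """
    if isinstance(value, str):
        # SV-style entry: a single path string.
        return [value]
    if isinstance(value, (list, tuple)):
        # Demo-style entry: one path per frame.
        return [str(v) for v in value]
    raise TypeError(f"expected str or list of str, got {type(value).__name__}")


# Placeholder dicts shaped after the two cases described above (assumed layout).
sv_style_info = {"pts_instance_mask": "frame_a.bin"}
demo_style_info = {"pts_instance_masks": ["frame_a.bin", "frame_b.bin"]}

sv_paths = normalize_mask_paths(sv_style_info["pts_instance_mask"])
demo_paths = normalize_mask_paths(demo_style_info["pts_instance_masks"])
```

This kind of workaround only papers over the type difference for one frame at a time, which is why it did not get me to a full-scene run.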

My file structure is below if this helps:
scannet-sv
├── 2D
│   └── scene0000_00
│       ├── color
│       │   ├── 0.jpg
│       │   ├── 20.jpg
│       │   └── ...
│       └── ...
├── 3D
│   └── ...
├── axis_align_matrix
│   └── scene0000_00.npy
├── instance_mask
│   ├── scene0000_00_0.bin
│   └── ...
├── load_scannet_data.py
├── load_scannet_sv_data_v2_fast.py
├── load_scannet_sv_data_v2.py
├── meta_data
│   ├── generate_sv_txt.py
│   ├── scannet_means.npz
│   ├── scannet_train.txt
│   ├── scannetv2-labels.combined.tsv
│   ├── scannetv2_sv_train.txt
│   ├── scannetv2_sv_val.txt
│   ├── scannetv2_test.txt
│   ├── scannetv2_train.txt
│   └── scannetv2_val.txt
├── points
│   ├── scene0000_00_0.bin
│   └── ...
├── pose_centered
│   └── scene0000_00
│       ├── 0.npy
│       └── ...
├── README.md
├── scannet_sv_instance_data
│   ├── scene0000_00_0_axis_align_matrix.npy
│   ├── scene0000_00_0_ins_label.npy
│   ├── scene0000_00_0_sem_label.npy
│   ├── scene0000_00_0_sp_label.npy
│   ├── scene0000_00_0_vert.npy
│   └── ...
├── scannet_sv_oneformer3d_infos_train.pkl
├── scannet_sv_oneformer3d_infos_val.pkl
├── scannet_utils.py
├── semantic_mask
│   ├── scene0000_00_0.bin
│   └── ...
└── super_points
    ├── scene0000_00_0.bin
    └── ...

Hello! Our online demo is only compatible with MV (multi-view) data. The demo processes MV data in a streaming manner, taking it in frame by frame for online processing and producing the visualization results. SV (single-view) data, by contrast, is handled as individual RGB-D frames and is used to train the SV-stage model. Please make sure you have processed the scannet-mv data correctly, and then run the following command to get the visualization:

CUDA_VISIBLE_DEVICES=0 python vis_demo/online_demo.py --scene-idx scene0000_00 --config configs/ESAM/ESAM_online_scannet.py --checkpoint work_dirs/ESAM_online_scannet/epoch_128.pth
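As a rough illustration of what "frame by frame" means here, the sketch below orders a scene's frames by their numeric index and yields them one at a time. It is not the demo's actual loader: the function names are hypothetical, and it assumes frame files are named by index (for example 0.jpg, 20.jpg), as in the color folder listed above.

```python
from pathlib import Path
from typing import Iterator, List


def frame_order(names: List[str]) -> List[str]:
    """Sort frame file names such as '0.jpg' and '20.jpg' by frame index.

    A plain string sort would place '100.jpg' before '20.jpg', which
    breaks the temporal order that streaming relies on.
    """
    return sorted(names, key=lambda name: int(Path(name).stem))


def stream_frames(names: List[str]) -> Iterator[str]:
    """Yield frames one at a time, in temporal order (hypothetical helper)."""
    for name in frame_order(names):
        yield name


frames = list(stream_frames(["100.jpg", "0.jpg", "20.jpg"]))
# frames == ["0.jpg", "20.jpg", "100.jpg"]
```

An SV-style workflow, in contrast, would treat each of these frames as an independent sample with no ordering between them.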

That makes sense. Visualization works for me now, thank you!