weiyithu/SurroundOcc

Capturing finer details in the scene

amundra15 opened this issue · 12 comments

Hi,

Thanks for releasing your amazing work. I was curious about the ability of the approach to capture finer details in the scene. Since the original resolution of 200x200x16 is quite limiting, I want to try different ways to modify it:

  1. Trying a higher spatial resolution only at inference time doesn't produce any error, but the output looks like just an enlarged version of the original one, with the same number of occupied voxels as the original result. Could you shed some light on this?
  2. I assume the correct way to capture more details would be to increase the resolution during training, and also supervise it with denser occupancy labels. Have you already released the dense mesh with semantic labels?

Best,
Akshay

Hi Akshay,
Yeah, I think only increasing the resolution during inference cannot work well. The possible reason is that the 3D convolution layer fits the low-resolution volumes and the performance will drop if you directly put it on high-resolution ones. The right way is to increase the resolution during training. However, we do not try it since our RTX 3090 cannot obtain the model with higher resolution.

Thanks for your response. Could you tell me where I can find the dense mesh with semantic labels to generate higher-resolution occupancy labels?

The mesh vertices in this README are the dense mesh GT, and you can subsample them to different resolutions.
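For example, something like this minimal subsampling sketch (assuming each released .npy stores an (N, 4) array of metric [x, y, z, semantic_label] points; verify this layout against the actual files):

```python
import numpy as np

# Load the dense semantic points (layout assumed, see note above).
points = np.load("dense_voxels_with_semantic/sample.npy")  # hypothetical path

pc_range = np.array([-50.0, -50.0, -5.0, 50.0, 50.0, 3.0])
voxel_size = 0.5  # 0.5 m -> 200x200x16 grid; 0.25 m -> 400x400x32
grid_shape = np.round((pc_range[3:] - pc_range[:3]) / voxel_size).astype(int)

# Metric coordinates -> integer voxel indices, dropping out-of-range points.
idx = np.floor((points[:, :3] - pc_range[:3]) / voxel_size).astype(int)
keep = np.all((idx >= 0) & (idx < grid_shape), axis=1)
idx, labels = idx[keep], points[keep, 3].astype(np.uint8)

# Fill the semantic occupancy volume (0 = empty).
occ = np.zeros(grid_shape, dtype=np.uint8)
occ[idx[:, 0], idx[:, 1], idx[:, 2]] = labels
```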

Great, thanks.

Can I also ask for the model weights of TPVFormer trained on the densified GT labels (i.e. TPVFormer* from the paper)? It would be interesting to see its performance for higher resolution inference.

Hi, here are the weight and config of TPVFormer*. Note that we changed the resolution to 200x200x16 and the feature dimension to 64 to fit the RTX 3090, as mentioned in the supplementary material.
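For reference, those changes amount to something like this hypothetical config excerpt (the variable names follow TPVFormer's config style but are assumptions, not the released file):

```python
# Hypothetical TPVFormer*-style settings; names and layout are assumptions.
tpv_h_ = 200  # occupancy resolution changed to 200 x 200 x 16
tpv_w_ = 200
tpv_z_ = 16
_dim_ = 64    # feature dimension reduced to 64 to fit an RTX 3090
```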

Thanks for the model!

I tried running inference with it, but I am facing runtime errors because the config variables do not match either codebase (I tried both SurroundOcc and TPVFormer). It seems the config file is meant for a codebase that is not exactly the same as either of these.

Do you have any guidance for me on how to run inference with this config file?

Hi, this may be because we used an early version of TPVFormer, and their code has since been modified into a new version. We have uploaded the old version here, but we did not clean the code. Alternatively, you can use our dataset to train their model with the official code (new version).

I was able to run this model successfully. Thanks for your support!

Hi @weiyithu ,

I have another related question. How is your processed data different from the original nuScenes occupancy labels? I noticed that using your data with the TPVFormer code doesn't raise errors, but the model learns mostly empty occupancies. One issue I found is the scale of the occupancy labels (metric in the original data, voxel-indexed in yours), and I am trying to use the code here to fix that, as sketched below. Could you highlight any other items I might need to handle to make it work with the TPVFormer code?
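Concretely, the scale fix I am attempting looks roughly like this (a sketch only; the (N, 4) [i, j, k, label] voxel-index layout for your files and the metric [x, y, z, label] layout TPVFormer expects are both my assumptions):

```python
import numpy as np

# Assumed SurroundOcc label layout: (N, 4) ints of [i, j, k, semantic_label]
# voxel indices on the 200x200x16 grid.
vox = np.load("scene_xxx/dense_voxels_with_semantic/sample.npy")  # hypothetical

pc_range = np.array([-50.0, -50.0, -5.0, 50.0, 50.0, 3.0])
voxel_size = (pc_range[3:] - pc_range[:3]) / np.array([200.0, 200.0, 16.0])

# Voxel indices -> metric voxel-center coordinates, i.e. [x, y, z, label]
# points in meters, which TPVFormer is assumed to expect.
xyz = pc_range[:3] + (vox[:, :3].astype(np.float64) + 0.5) * voxel_size
metric_labels = np.concatenate([xyz, vox[:, 3:4].astype(np.float64)], axis=1)
```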

@weiyithu @amundra15
How do you use the nuscenes_occ 200x200x16 dataset to run inference in tpv_code?
I keep hitting this error:
!!!!!!!!!!!!!!!!!!!! scene_2ed0fcbfc214478ca3b3ce013e7723ba/dense_voxels_with_semantic/cb035615437f4d428db0a3b1edb4796a.npy

I think the error comes from tpv_code/dataloader/dataset.py, lines 45-63.

But the file names in nuscenes_occ 200x200x16 look like
n008-2018-05-21-11-06-59-0400__LIDAR_TOP__1526915243047392.pcd.bin.npy

Looking forward to your answer, thank you.

Try this path: `rel_path = '{0}.npy'.format(info['lidar_path'].split('/')[-1])`
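For context, this is roughly where the change goes in the loader (the surrounding variable names are my guesses at tpv_code/dataloader/dataset.py, not the actual code):

```python
import os
import numpy as np

# Inside the dataset's __getitem__ (assumed context): `info` comes from the
# nuScenes info pkl and `data_root` points at the nuscenes_occ 200x200x16 folder.
# The original code built a scene-token path like
#   scene_<token>/dense_voxels_with_semantic/<sample_token>.npy
# which does not match files named like
#   n008-...__LIDAR_TOP__1526915243047392.pcd.bin.npy
# Keying on the lidar filename instead matches the released labels:
rel_path = '{0}.npy'.format(info['lidar_path'].split('/')[-1])
occ_labels = np.load(os.path.join(data_root, rel_path))
```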

@jichengyuan
Thank you for your reply.
It works now, but the numbers from their eval.py don't look accurate, as in this screenshot:

[screenshot of eval.py output]

But the paper reports this:

[results table from the paper]

I think the paper uses a different evaluation protocol, like https://opendrivelab.com/challenge2023/, or some other reason causes this result.
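For comparison, this is the standard per-class IoU most occupancy benchmarks compute (a sketch only; it may not match the paper's or the challenge's exact protocol):

```python
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """IoU_c = TP_c / (TP_c + FP_c + FN_c), computed over all voxels."""
    ious = []
    for c in range(1, num_classes):  # class 0 (empty) is skipped by convention
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        denom = tp + fp + fn
        ious.append(tp / denom if denom > 0 else float('nan'))
    return ious  # mIoU is the nanmean of this list
```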

Another problem is that the visualization produced by the epoch.pth I got after training looks a bit strange:

[screenshot of the visualization]

The length-to-width-to-height ratio is 64:16:6.

Looking forward to your answer, thank you.