mit-han-lab/bevfusion

Checking if bev_pool is compiled properly

Divadi opened this issue · 12 comments

Hello, thank you for releasing the code.

I was trying to use bev_pool in other projects, but I found that my compiled bev_pool doesn't seem to yield the expected results. For a toy example:

device = "cuda:4"
bev_pool(
    torch.tensor([[5.0]], device=device),
    torch.tensor([[0, 0, 0, 0]], device=device),
    1,
    torch.tensor(1, device=device),
    torch.tensor(1, device=device),
    torch.tensor(1, device=device))

the output is

tensor([[[[[0.]]]]], device='cuda:4')

when I would expect it to be 5.0.
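
For context, here is a minimal pure-PyTorch sketch of what I expect bev_pool to compute (sum-pooling feature rows into their voxels). The (x, y, z, batch) column order for coords, the integer B/D/H/W, and the (B, C, D, H, W) output layout are my assumptions based on how the op is called in the pipeline:

import torch

def bev_pool_reference(feats, coords, B, D, H, W):
    # Sketch of the expected semantics: sum-pool each row of feats
    # (N, C) into its voxel. Assumes coords columns are (x, y, z, batch)
    # and integer B, D, H, W; the real op's conventions may differ.
    C = feats.shape[1]
    out = feats.new_zeros(B, D, H, W, C)
    x, y, z, b = coords.long().unbind(-1)
    # accumulate=True sums features that land in the same voxel
    out.index_put_((b, z, x, y), feats, accumulate=True)
    # match the (B, C, D, H, W) layout the CUDA op appears to return
    return out.permute(0, 4, 1, 2, 3).contiguous()

Under this reference, the toy input above yields a (1, 1, 1, 1, 1) tensor containing 5.0.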

Please let me know if I have incorrectly used the function.
My environment is PyTorch 1.10.1, cudatoolkit 11.3.1, A6000 GPU.

Thank you!

Actually, it seems my compilation is okay; evaluating the camera-only baseline yields:

mAP: 0.3151                                                                                                                                                                                               
mATE: 0.7155
mASE: 0.2742
mAOE: 0.5419
mAVE: 0.8821
mAAE: 0.2595
NDS: 0.3902
Eval time: 92.3s

Per-class results:
Object Class    AP      ATE     ASE     AOE     AVE     AAE
car     0.498   0.570   0.161   0.127   0.989   0.241
truck   0.265   0.737   0.210   0.142   0.838   0.233
bus     0.341   0.728   0.197   0.083   1.578   0.299
trailer 0.147   0.970   0.232   0.529   0.659   0.062
construction_vehicle    0.076   0.955   0.487   1.043   0.106   0.391
pedestrian      0.348   0.748   0.304   1.388   0.863   0.755
motorcycle      0.272   0.720   0.260   0.557   1.620   0.084
bicycle 0.215   0.597   0.271   0.868   0.403   0.010
traffic_cone    0.495   0.593   0.332   nan     nan     nan
barrier 0.495   0.537   0.287   0.139   nan     nan

which is lower than expected (mAP 33.25, NDS 40.15) but still non-trivial.

Is my usage incorrect by any chance?

That's quite interesting. I actually did not test bev_pool on small toy examples; I just integrated it into our pipeline and trained the entire network, so there might be some boundary cases where I made mistakes in the implementation.

Regarding the evaluation results, may I ask how many GPUs you are using? I also think the compilation should be correct, but such an accuracy drop looks unexpected to me.

I'm evaluating with 4 GPUs.

Actually, bev_pool is behaving really strangely for me. When used as part of the pipeline, it yields reasonable results. So I tried adding

import pickle
pickle.dump([feats, coords, B, D, H, W, x], open(PICKLE_PATH, 'wb+'))
assert False

right after

x = x.permute(0, 4, 1, 2, 3).contiguous()

Then, I made another file loading the pickle results

import torch
from mmdet3d.ops import bev_pool
import pickle

def load_pickle(f):
    return pickle.load(open(f, 'rb'))

feats, coords, B, D, H, W, x = load_pickle(PICKLE_PATH)
k = bev_pool(feats, coords, B, D, H, W)

print((k != 0).sum(), (x != 0).sum())

And for some reason, the results are different!

tensor(0, device='cuda:2') tensor(4805600, device='cuda:2')

I've never had this issue with CUDA operations before, and I'm not quite sure how to debug it, since bev_pool clearly works as part of the entire pipeline but not on its own.

Another detail: when I paste the toy example

device = x.device
a = bev_pool(
    torch.tensor([[5.0]], device=device),
    torch.tensor([[0, 0, 0, 0]], device=device),
    1,
    torch.tensor(1, device=device),
    torch.tensor(1, device=device),
    torch.tensor(1, device=device))
print(a)
assert False

and run it as part of the pipeline by pasting it after this line

x = bev_pool(x, geom_feats, B, self.nx[2], self.nx[0], self.nx[1])

the correct result is printed.
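
One difference I can think of between the standalone script and the pipeline is which CUDA device is current when the op runs. A quick check of that (a sketch; I'm assuming the extension launches its kernel on the current device) would be:

import torch

# If the extension launches on the current device rather than the
# inputs' device, these two will disagree whenever x lives on cuda:4.
print(torch.cuda.current_device())  # e.g., 0
print(x.device)                     # e.g., cuda:4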

Is it possible that there's something wrong with my installation?

I'm still working on that. I'll get back to you once I've finished investigating this issue.

Hi @Divadi,

I looked into this issue recently. Would you mind trying out

CUDA_VISIBLE_DEVICES=4 python [your script].py

and modifying the device to cuda:0? Besides, I've pushed a new commit to the repo; would you mind also trying out the latest version?
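
If the root cause is what I suspect (the kernel launching on the current CUDA device rather than the inputs' device), a user-side workaround in the meantime would be to make the inputs' device current around the call. A sketch, not the committed fix:

import torch

# Hypothetical workaround: make the inputs' device the current CUDA
# device so the extension's kernel launches where the data lives.
with torch.cuda.device(feats.device):
    out = bev_pool(feats, coords, B, D, H, W)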

Best,
Haotian

By the way, for multi-gpu evaluation, would you mind also exploring these two directions?

  • First, let's see whether things work out if you use all the available GPUs on your machine. I would assume that your machine has >4 GPUs because you have cuda:4.

  • Second, let's see whether the results are correct if you evaluate with only one GPU.

Before the change, with the toy example above:

$ CUDA_VISIBLE_DEVICES=4 python tools/tmp.py 
tensor([[[[[5.]]]]], device='cuda:0')
$ python tools/tmp.py 
tensor([[[[[0.]]]]], device='cuda:4')

After the change:

$ CUDA_VISIBLE_DEVICES=4 python tools/tmp.py 
tensor([[[[[5.]]]]], device='cuda:0')
$ python tools/tmp.py 
tensor([[[[[5.]]]]], device='cuda:4')

Seems like that was the issue; really odd, but good catch!
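
To guard against regressions, a small check like the following (mirroring the toy example; it assumes bev_pool accepts scalar tensors for D, H, and W, as above) can be run across every visible GPU:

import torch
from mmdet3d.ops import bev_pool

# The pooled value should be 5.0 no matter which device the inputs live on.
for i in range(torch.cuda.device_count()):
    device = f"cuda:{i}"
    out = bev_pool(
        torch.tensor([[5.0]], device=device),
        torch.tensor([[0, 0, 0, 0]], device=device),
        1,
        torch.tensor(1, device=device),
        torch.tensor(1, device=device),
        torch.tensor(1, device=device))
    assert out.item() == 5.0, f"bev_pool returned {out.item()} on {device}"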

I'll look into the multi-GPU evaluation directions soon; I need a bit of time.

@kentang-mit

When evaluating with just one GPU or with all GPUs, the results are the same as before.

Thanks for the update. I'll investigate that.

@kentang-mit
Hi, I have tracked down the issue. The problem was that my installation had Pillow 9.2.0, while the repository requires 8.4.0 to function properly. More details can be found in HuangJunJie2017/BEVDet#41.

I think Pillow 8.4.0 should be listed as an important requirement (sorry if I missed it).
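
For anyone hitting the same problem, a fail-fast check along these lines (a sketch; the behavior change in newer Pillow is discussed in the linked issue) could save a full training or evaluation run:

import PIL

# Newer Pillow releases changed behavior this codebase relies on (see the
# linked BEVDet issue), so fail early instead of producing silently
# degraded results.
assert PIL.__version__ == "8.4.0", (
    f"Expected Pillow 8.4.0, found {PIL.__version__}")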

New results:

mAP: 0.3325
mATE: 0.6828
mASE: 0.2717
mAOE: 0.5379
mAVE: 0.9040
mAAE: 0.2505
NDS: 0.4015
Eval time: 89.4s

Per-class results:
Object Class    AP      ATE     ASE     AOE     AVE     AAE
car     0.523   0.541   0.159   0.124   0.969   0.225
truck   0.280   0.704   0.208   0.131   0.911   0.233
bus     0.353   0.681   0.191   0.084   1.559   0.296
trailer 0.167   0.985   0.233   0.504   0.660   0.052
construction_vehicle    0.082   0.859   0.481   1.056   0.121   0.364
pedestrian      0.367   0.724   0.303   1.393   0.863   0.753
motorcycle      0.296   0.721   0.256   0.547   1.768   0.073
bicycle 0.237   0.577   0.270   0.862   0.382   0.007
traffic_cone    0.517   0.524   0.332   nan     nan     nan
barrier 0.503   0.513   0.284   0.140   nan     nan

Thank you for the very important hint. I'll add that to the README immediately!