HuangJunJie2017/BEVDet

Lower results when evaluating released BEVDet checkpoint


Hello, I have tried to evaluate the released BEVDet checkpoint as-is on my setup, but I get

mAP: 0.2751                                                                                                                                                                                   
mATE: 0.7179
mASE: 0.2738
mAOE: 0.5512
mAVE: 0.8747
mAAE: 0.2205
NDS: 0.3737
Eval time: 107.4s

Per-class results:
Object Class    AP      ATE     ASE     AOE     AVE     AAE
car     0.441   0.631   0.167   0.131   1.037   0.254
truck   0.197   0.757   0.225   0.125   0.828   0.227
bus     0.283   0.680   0.185   0.139   1.895   0.350
trailer 0.132   1.053   0.224   0.463   0.547   0.068
construction_vehicle    0.066   0.795   0.484   1.174   0.095   0.358
pedestrian      0.301   0.788   0.305   1.320   0.848   0.412
motorcycle      0.235   0.704   0.262   0.612   1.437   0.090
bicycle 0.182   0.607   0.265   0.875   0.310   0.006
traffic_cone    0.445   0.616   0.333   nan     nan     nan
barrier 0.468   0.547   0.287   0.122   nan     nan

which is lower than the expected 30.8/40.4 mAP/NDS.

I am using A6000 GPUs, torch 1.10.1, and cudatoolkit 11.3. Do you know what might be the issue?

I get exactly the same numbers as @BoLang615 in #15, but I believe I am using the latest version of the code. I would appreciate any pointers on this.

Thank you!

@Divadi Did you train this with 4 GPUs, a total batch size of 8x4=32, and lr=1e-4?

This is without re-training; I just loaded & evaluated the released checkpoint.

I'm running a separate training job with 4 GPUs, a 16x4=64 batch size, and the original learning rate, but it has not completed yet.
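
As a side note on the batch-size question: when the total batch size changes, a common rule of thumb is to scale the learning rate linearly with it. A minimal sketch of that rule follows; the base values are assumptions for illustration, not BEVDet's actual settings:

```python
# Linear LR scaling rule (sketch). Base values are assumed for illustration,
# not taken from the official BEVDet config.
BASE_TOTAL_BATCH = 32   # e.g. 4 GPUs x 8 samples per GPU
BASE_LR = 1e-4          # learning rate used at that base batch size (assumed)

def scaled_lr(num_gpus: int, samples_per_gpu: int) -> float:
    """Scale the learning rate linearly with the new total batch size."""
    total_batch = num_gpus * samples_per_gpu
    return BASE_LR * total_batch / BASE_TOTAL_BATCH

print(scaled_lr(4, 16))  # 64 samples total -> 2e-4 under this rule
```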

@Divadi This seems to be a common problem; it needs a check of the numerical consistency of the intermediate results. I have stored some intermediate results for the first sample in a pickle (Python 3.7):

check.zip
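
For anyone debugging a similar mismatch, here is a rough sketch of dumping the first sample's intermediates with pickle so two environments can be diffed; the keys and hook point are illustrative, not the exact code behind check.pkl:

```python
import pickle

def dump_first_sample(points, pred_bboxes, img_metas, path='check.pkl'):
    """Store the first sample's intermediates so another environment can diff them.

    The keys mirror the ones visible in the shared check.pkl; where this gets
    called from (e.g. inside the test loop) depends on your local code.
    """
    payload = {
        'points': points,            # input point cloud, if any
        'pred_bboxes': pred_bboxes,  # predicted boxes for the sample
        'img_metas': img_metas,      # metadata such as pts_filename
    }
    with open(path, 'wb') as f:
        pickle.dump(payload, f, protocol=pickle.HIGHEST_PROTOCOL)
```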

Hmm...

When I load your pkl and compare it with mine:

>>> a = pickle.load(open("check.pkl", 'rb')); b = pickle.load(open("check_divadi.pkl", 'rb'))
>>> a.keys()
dict_keys(['points', 'pred_bboxes', 'out_dir', 'file_name', 'bbox_pts', 'img_metas'])
>>> a['file_name']
'n015-2018-07-11-11-54-16+0800__LIDAR_TOP__1531281629949213'
>>> b['img_metas'][0]['pts_filename']
'datasets/nuscenes/samples/LIDAR_TOP/n015-2018-07-11-11-54-16+0800__LIDAR_TOP__1531281439800013.pcd.bin'

The file path itself differs for the first sample, and the predictions differ as well. Is what you sent me the first sample as loaded by the pipeline?
Also, for reference, I saw that nuscenes_converter was not different from mmdetection3d's pre-coordinate-change version, so I had just used those pkl files.
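
A sketch of how the two dumps can be diffed numerically once both sides store the same sample; it assumes the stored values are array-like, which may not hold for every key:

```python
import pickle
import numpy as np

with open('check.pkl', 'rb') as f:          # reference environment
    a = pickle.load(f)
with open('check_divadi.pkl', 'rb') as f:   # local environment
    b = pickle.load(f)

# Make sure both dumps refer to the same sample before comparing numbers.
print(a['img_metas'][0]['pts_filename'])
print(b['img_metas'][0]['pts_filename'])

# Compare the numeric payloads; a large max-abs difference points at the
# stage of the pipeline where the two environments diverge.
for key in ('points', 'pred_bboxes'):
    x = np.asarray(a[key], dtype=np.float64)
    y = np.asarray(b[key], dtype=np.float64)
    if x.shape != y.shape:
        print(key, 'shape mismatch:', x.shape, y.shape)
    else:
        print(key, 'max abs diff:', np.abs(x - y).max())
```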

I set workers_per_gpu=0.
Here is the md5sum of my test pkl; you can check yours against it:
efd90b7e93c43fc18e98a0cf0ec8b1c4 /nuscenes_infos_val.pkl
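
If it is more convenient to check from Python than with the md5sum CLI, a small equivalent:

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

# Should print efd90b7e93c43fc18e98a0cf0ec8b1c4 if the info file matches.
print(md5_of('nuscenes_infos_val.pkl'))
```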

Emm, I apologize for mixing up 'test.pkl' with 'check.pkl' and 'img_feats' with 'img_metas'.
Here is the corrected pkl:
check.zip.zip

I will check the pkl & zip further when I get home.

The results of training myself are as follows:

mAP: 0.3050                                                                                                                                                                                                 
mATE: 0.6869
mASE: 0.2754
mAOE: 0.5599
mAVE: 0.8782
mAAE: 0.2481
NDS: 0.3876
Eval time: 120.7s

Per-class results:
Object Class    AP      ATE     ASE     AOE     AVE     AAE
car     0.503   0.542   0.160   0.109   0.929   0.228
truck   0.209   0.721   0.224   0.172   0.813   0.228
bus     0.300   0.731   0.188   0.093   1.747   0.440
trailer 0.170   1.048   0.242   0.385   0.617   0.112
construction_vehicle    0.055   0.894   0.485   1.118   0.106   0.392
pedestrian      0.325   0.743   0.302   1.343   0.861   0.495
motorcycle      0.262   0.678   0.259   0.670   1.680   0.075
bicycle 0.218   0.544   0.275   1.030   0.272   0.015
traffic_cone    0.503   0.501   0.332   nan     nan     nan
barrier 0.506   0.468   0.288   0.119   nan     nan

@Divadi mAVE and mAAE are a bit low. Some 'abnormal' examples can be found in issue #21 (I think people simply don't report their results when everything seems OK).


Maybe epoch 18 is better...

@HuangJunJie2017
Whew... I think I found the issue: I had Pillow 9.2.0 installed, which probably makes some of the operations in the image transforms (loading.py) slightly different from your Pillow 8.4.0. As a consequence, the difference between your loaded images and mine looked like this:
[image: difference between the loaded images across the two environments]
After downgrading to Pillow 8.4.0, the difference is nil:
[image: difference after downgrading, zero everywhere]
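
For anyone hitting the same issue, a quick way to confirm which Pillow version an environment uses and whether its resizing matches a reference; the file names and target size below are hypothetical, and this is not the exact transform in loading.py:

```python
import numpy as np
import PIL
from PIL import Image

print(PIL.__version__)  # 8.4.0 vs 9.2.0 was the culprit here

# Hypothetical check: resize one camera image and compare it against an array
# dumped from the reference environment. A non-zero diff suggests the Pillow
# resize/interpolation behaviour changed between versions.
img = Image.open('sample_cam_front.jpg').resize((704, 256), Image.BILINEAR)
mine = np.asarray(img, dtype=np.int32)
ref = np.load('reference_resized.npy').astype(np.int32)  # dump from the other env
print('max abs pixel diff:', np.abs(mine - ref).max())
```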

Updated results:

mAP: 0.3082
mATE: 0.6648
mASE: 0.2729
mAOE: 0.5330
mAVE: 0.8287
mAAE: 0.2052
NDS: 0.4036
Eval time: 98.1s

Per-class results:
Object Class    AP      ATE     ASE     AOE     AVE     AAE
car     0.508   0.535   0.159   0.127   0.947   0.232
truck   0.222   0.671   0.216   0.123   0.834   0.220
bus     0.311   0.760   0.195   0.086   1.592   0.301
trailer 0.150   0.987   0.229   0.443   0.518   0.054
construction_vehicle    0.073   0.720   0.482   1.093   0.103   0.342
pedestrian      0.336   0.738   0.301   1.326   0.861   0.409
motorcycle      0.262   0.704   0.262   0.595   1.450   0.075
bicycle 0.213   0.525   0.270   0.885   0.325   0.009
traffic_cone    0.506   0.518   0.331   nan     nan     nan
barrier 0.502   0.490   0.284   0.119   nan     nan

Thank you for your help!

@Divadi Nice job! Thank you so much for the information!