offset loss, angle loss are all zero

Question

Closed this issue 3 years ago · 14 comments

zye1996 commented 4 years ago

Hi, I am running training and all losses are zero except vote_loss and cls_loss. Is there a setting I need to tweak in the config?

Answer 1 · 2021-01-07T18:12:05.000Z

@zye1996 I have the same issue.
While some of the losses are being calculated, pmask is all zero.

Answer 2 · 2021-01-07T18:18:34.000Z

There may be a problem with pmask. The number of ones in pmask and nmask is not correct.
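One way to check this is to count the entries of each mask directly. The sketch below is only an illustration: `mask_summary` is a hypothetical helper, not something from the repo, and it assumes pmask and nmask are tensors of the same shape whose nonzero entries mark positive and negative points.

```python
import torch


def mask_summary(pmask: torch.Tensor, nmask: torch.Tensor) -> dict:
    """Count positives, negatives, and points flagged as both.

    Hypothetical debugging helper; assumes both masks have the same shape.
    """
    pmask = pmask.bool()
    nmask = nmask.bool()
    return {
        "num_pos": int(pmask.sum()),
        "num_neg": int(nmask.sum()),
        "num_both": int((pmask & nmask).sum()),
        "num_total": pmask.numel(),
    }


# Toy masks for illustration only (not real training data).
pmask = torch.tensor([0, 0, 0, 0])
nmask = torch.tensor([1, 1, 0, 1])
summary = mask_summary(pmask, nmask)
print(summary)
```

If `num_pos` is zero for every batch, the losses that are averaged over positive points have nothing to average over, which would match the symptom described in this thread.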

Answer 3 · 2021-01-07T18:20:03.000Z

@qiqihaer @zye1996
I have the same issue.
The losses cannot be calculated because pmask is all zero.

Answer 4 · 2021-01-07T22:56:53.000Z

I am not sure whether this is due to a bug in labeling or because all sampled points fall outside the objects. I am still trying to find a solution and will keep you updated. Please also let me know if you find any clues.

Answer 5 · 2021-01-10T03:23:23.000Z

Made some progress here: changing line 244 in target_assigner.py to
pmask = torch.logical_and(pmask.unsqueeze(-1), dist_mask)
recovered the losses.
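To show what that line does to the shapes, here is a minimal sketch. The shapes are an assumption, since the thread does not state them: pmask is taken to be a per-point flag of shape (num_points,) and dist_mask a per-point, per-box flag of shape (num_points, num_boxes). The values are toy data.

```python
import torch

# Assumed shapes (not confirmed by the thread):
#   pmask:     (num_points,)            per-point positive flag
#   dist_mask: (num_points, num_boxes)  per-point, per-box distance test
pmask = torch.tensor([True, False, True])
dist_mask = torch.tensor([[True, False],
                          [True, True],
                          [False, False]])

# unsqueeze(-1) turns shape (3,) into (3, 1), which broadcasts against (3, 2).
# An entry stays positive only if the point is positive AND passes the
# distance test for that box.
combined = torch.logical_and(pmask.unsqueeze(-1), dist_mask)

print(combined.shape)
print(combined)
```

Under these assumed shapes, the `unsqueeze(-1)` is what makes the two masks compatible: a (3,) tensor cannot be broadcast against a (3, 2) tensor directly, because broadcasting aligns trailing dimensions.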

After the model is trained, the evaluation results are not as expected:

Car AP@0.70, 0.70, 0.70:
bbox AP:10.6542, 10.0903, 10.0903
bev  AP:9.9624, 10.7949, 10.7949
3d   AP:3.4869, 9.0909, 9.0909
aos  AP:10.65, 10.09, 10.09
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:5.2180, 4.1630, 4.1630
bev  AP:6.5877, 4.1919, 4.1919
3d   AP:1.1803, 0.8444, 0.8444
aos  AP:5.22, 4.16, 4.16
Car AP@0.70, 0.50, 0.50:
bbox AP:10.6542, 10.0903, 10.0903
bev  AP:18.2366, 14.7968, 14.7968
3d   AP:10.8402, 10.2320, 10.2320
aos  AP:10.65, 10.09, 10.09
Car AP_R40@0.70, 0.50, 0.50:
bbox AP:5.2180, 4.1630, 4.1630
bev  AP:13.5625, 9.3519, 9.3519
3d   AP:6.4555, 5.5059, 5.5059
aos  AP:5.22, 4.16, 4.16

Answer 6 · 2021-01-11T07:55:53.000Z

@zye1996
Thank you for your reply.
I'll check the original implementation of 3DSSD and try to find out what's wrong.

Answer 7 · 2021-01-14T03:37:13.000Z

Hi, do you know the reason for these low AP values?

Answer 8 · 2021-01-14T03:53:50.000Z

There are plenty of bugs in the repo, and I am trying to find out the reason. One bug I have found so far is here, where the last function should be

int furthest_point_sampling_with_dist_wrapper(int b, int n, int m, 
    at::Tensor points_tensor, at::Tensor temp_tensor, at::Tensor idx_tensor) {

    const float *points = points_tensor.data<float>();
    float *temp = temp_tensor.data<float>();
    int *idx = idx_tensor.data<int>();

    cudaStream_t stream = THCState_getCurrentStream(state);
    furthest_point_sampling_kernel_with_dist_launcher(b, n, m, points, temp, idx, stream);
    return 1;
}

My understanding is that the repo is pretty much unfinished, so be careful.

Answer 9 · 2021-01-14T04:02:46.000Z

OK, thanks. I'll check the original implementation of 3DSSD and try to find out what's wrong.

Answer 10 · 2021-01-14T04:05:12.000Z

Many thanks if you can let me know what's going on with the rest. After I fixed the bug I mentioned above, the loss is unstable and fluctuating.

Answer 11 · 2021-01-21T17:03:18.000Z

Made some progress and I will post all updates here, since a pull request may not make sense if the author is not responding.

Answer 12 · 2021-01-21T19:48:23.000Z

I reimplemented 3DSSD with TensorFlow v2 here.

It might be easier to debug than TensorFlow v1.

Answer 13 · 2021-01-21T19:52:38.000Z

I cannot thank @shuto-keio enough😂

Answer 14 · 2021-01-23T04:59:37.000Z

I am getting this right now and still trying to fix some small problems:

Car AP@0.70, 0.70, 0.70:
bbox AP:88.3067, 87.4015, 87.4015
bev  AP:87.1555, 83.6703, 83.6703
3d   AP:81.8364, 75.6352, 75.6352
aos  AP:88.31, 87.40, 87.40
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:92.6265, 89.2507, 89.2507
bev  AP:89.0312, 84.9429, 84.9429
3d   AP:83.2703, 74.8559, 74.8559
aos  AP:92.63, 89.25, 89.25
Car AP@0.70, 0.50, 0.50:
bbox AP:88.3067, 87.4015, 87.4015
bev  AP:88.4671, 87.8447, 87.8447
3d   AP:88.4409, 87.7542, 87.7542
aos  AP:88.31, 87.40, 87.40
Car AP_R40@0.70, 0.50, 0.50:
bbox AP:92.6265, 89.2507, 89.2507
bev  AP:92.9291, 91.6493, 91.6493
3d   AP:92.8714, 91.3892, 91.3892
aos  AP:92.63, 89.25, 89.25