qiqihaer/3DSSD-pytorch

offset loss, angle loss are all zero

Closed this issue · 14 comments

Hi, when I run training, every loss is zero except vote_loss and cls_loss. Is there any setting I need to tweak in the config?

@zye1996 I have the same issue.
While some of the losses are being calculated, pmask is all zero.

There may be a problem with pmask. The total number of ones in pmask and nmask is not correct.

@qiqihaer @zye1996
I have the same issue.
The losses cannot be calculated because pmask is all zero.

I am not sure whether this is due to a bug in labeling or because all sampled points fall outside the objects. I am still looking for a solution and will keep you updated. Please also let me know if you find any clues.

Some progress here: changing line 244 in target_assigner.py to
pmask = torch.logical_and(pmask.unsqueeze(-1), dist_mask)
recovered the losses.
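
To illustrate what the unsqueeze does in that line, here is a minimal sketch of the broadcasting involved. The shapes and the meaning of each mask are assumptions made for illustration (a per-point pmask of shape (N,) and a per-point, per-box dist_mask of shape (N, M)); the tensors in target_assigner.py may be laid out differently.

```python
import torch

# Hypothetical data: 4 sampled points, 2 ground-truth boxes.
# Assumed meaning: pmask[i] is True when point i is a candidate positive.
pmask = torch.tensor([True, True, False, True])

# Assumed meaning: dist_mask[i, j] is True when point i is close enough to box j.
dist_mask = torch.tensor([
    [True, False],
    [False, False],
    [True, True],
    [False, True],
])

# unsqueeze(-1) turns shape (4,) into (4, 1), which broadcasts against (4, 2).
combined = torch.logical_and(pmask.unsqueeze(-1), dist_mask)

# Quick sanity check: if this is 0, every loss that is masked by it will be 0 too.
num_positive = int(combined.sum())
print(combined.shape, num_positive)
```

Printing the number of positives per batch like this is a cheap way to tell a masking bug apart from a batch that really contains no points inside any object.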

After the model is trained, the evaluation results are not as expected:

Car AP@0.70, 0.70, 0.70:
bbox AP:10.6542, 10.0903, 10.0903
bev  AP:9.9624, 10.7949, 10.7949
3d   AP:3.4869, 9.0909, 9.0909
aos  AP:10.65, 10.09, 10.09
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:5.2180, 4.1630, 4.1630
bev  AP:6.5877, 4.1919, 4.1919
3d   AP:1.1803, 0.8444, 0.8444
aos  AP:5.22, 4.16, 4.16
Car AP@0.70, 0.50, 0.50:
bbox AP:10.6542, 10.0903, 10.0903
bev  AP:18.2366, 14.7968, 14.7968
3d   AP:10.8402, 10.2320, 10.2320
aos  AP:10.65, 10.09, 10.09
Car AP_R40@0.70, 0.50, 0.50:
bbox AP:5.2180, 4.1630, 4.1630
bev  AP:13.5625, 9.3519, 9.3519
3d   AP:6.4555, 5.5059, 5.5059
aos  AP:5.22, 4.16, 4.16
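
For anyone reading these logs: as far as I understand the KITTI evaluation, "AP" is the older 11-point interpolated average precision and "AP_R40" samples 40 recall positions instead, which is why the two blocks differ for the same detections. The sketch below shows the difference on made-up precision/recall values; the function name and the data are hypothetical and this is not the evaluation code used here.

```python
def interpolated_ap(recalls, precisions, sample_points):
    """Average the interpolated precision over the given recall positions."""
    total = 0.0
    for r in sample_points:
        # Interpolated precision: best precision at any recall >= r (0 if none).
        candidates = [p for rec, p in zip(recalls, precisions) if rec >= r]
        total += max(candidates, default=0.0)
    return total / len(sample_points)


# 11 recall positions: 0.0, 0.1, ..., 1.0
R11 = [i / 10 for i in range(11)]
# 40 recall positions: 1/40, 2/40, ..., 1.0 (recall 0 is excluded)
R40 = [i / 40 for i in range(1, 41)]

# Made-up precision/recall curve, for illustration only.
recalls = [0.2, 0.5, 1.0]
precisions = [1.0, 0.8, 0.5]

ap_r11 = interpolated_ap(recalls, precisions, R11)
ap_r40 = interpolated_ap(recalls, precisions, R40)
print(f"AP_R11={ap_r11:.4f}  AP_R40={ap_r40:.4f}")
```

The 11-point version includes recall 0, where the interpolated precision is at its highest, so it tends to score a weak detector higher than the 40-point version does.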

@zye1996
Thank you for your reply.
I'll check the original implementation of 3DSSD and try to find out what's wrong.

Hi~Do you know the reason for this low AP value?

There are plenty of bugs in the repo, and I am still trying to find the cause of the low AP. One bug I found so far is here, where the last function should be:

// Wrapper for the distance-based sampling variant.
int furthest_point_sampling_with_dist_wrapper(int b, int n, int m,
    at::Tensor points_tensor, at::Tensor temp_tensor, at::Tensor idx_tensor) {

    // Raw pointers into the tensor storage.
    const float *points = points_tensor.data<float>();
    float *temp = temp_tensor.data<float>();
    int *idx = idx_tensor.data<int>();

    // Launch the distance-based kernel on the current CUDA stream.
    cudaStream_t stream = THCState_getCurrentStream(state);
    furthest_point_sampling_kernel_with_dist_launcher(b, n, m, points, temp, idx, stream);
    return 1;
}
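
For checking the kernel output on small inputs, here is a plain-Python sketch of farthest point sampling over a precomputed distance matrix. That the with_dist variant consumes pairwise distances rather than coordinates is my assumption from the function name; the Python function name and the test data are hypothetical, and starting from index 0 is a common convention rather than something confirmed from this repo.

```python
def farthest_point_sampling_with_dist(dist, m):
    """Pick m indices from an n x n distance matrix, greedily maximising spread."""
    n = len(dist)
    if not 0 < m <= n:
        raise ValueError("m must be between 1 and the number of points")

    selected = [0]  # assumed convention: always start from point 0
    # min_dist[i] = distance from point i to its nearest already-selected point
    min_dist = list(dist[0])

    for _ in range(m - 1):
        # The next sample is the point farthest from everything selected so far.
        nxt = max(range(n), key=lambda i: min_dist[i])
        selected.append(nxt)
        # Update the nearest-selected distance for every point.
        for i in range(n):
            min_dist[i] = min(min_dist[i], dist[nxt][i])
    return selected


# Made-up 1-D points; the distance matrix holds absolute differences.
points = [0.0, 1.0, 2.0, 10.0]
dist = [[abs(a - b) for b in points] for a in points]
print(farthest_point_sampling_with_dist(dist, 3))
```

Comparing the indices returned by the CUDA wrapper against a slow reference like this on a handful of points is one way to confirm that the wrapper dispatches to the intended kernel.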

My understanding is that the repo is pretty much unfinished so be careful.

OK, thanks, I'll check the original implementation of 3DSSD and try to find out what's wrong.

Many thanks if you can let me know what's going on with the rest. After I fixed the bug I mentioned above, the loss became unstable and keeps fluctuating.

Made some progress here, and I will post all updates here, since a pull request may not make sense if the author is not responding.

I reimplemented 3DSSD with TensorFlow v2 here.

It might be easier to debug than the TensorFlow v1 version.

I cannot thank @shuto-keio enough😂

I am getting this right now and still trying to fix some small problems:

Car AP@0.70, 0.70, 0.70:
bbox AP:88.3067, 87.4015, 87.4015
bev  AP:87.1555, 83.6703, 83.6703
3d   AP:81.8364, 75.6352, 75.6352
aos  AP:88.31, 87.40, 87.40
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:92.6265, 89.2507, 89.2507
bev  AP:89.0312, 84.9429, 84.9429
3d   AP:83.2703, 74.8559, 74.8559
aos  AP:92.63, 89.25, 89.25
Car AP@0.70, 0.50, 0.50:
bbox AP:88.3067, 87.4015, 87.4015
bev  AP:88.4671, 87.8447, 87.8447
3d   AP:88.4409, 87.7542, 87.7542
aos  AP:88.31, 87.40, 87.40
Car AP_R40@0.70, 0.50, 0.50:
bbox AP:92.6265, 89.2507, 89.2507
bev  AP:92.9291, 91.6493, 91.6493
3d   AP:92.8714, 91.3892, 91.3892
aos  AP:92.63, 89.25, 89.25