AssertionError: wx and wy should be equal
eyildiz-ugoe opened this issue · 18 comments
During the training session it throws an error as such:
Epoch[0] Batch [140] Speed: 5.32 samples/sec Train-Flow_L2Loss=0.000000, Flow_CurLoss=0.000000, PointMatchingLoss=15.519974, MaskLoss=1.911539,
Epoch[0] Batch [160] Speed: 5.50 samples/sec Train-Flow_L2Loss=0.000000, Flow_CurLoss=0.000000, PointMatchingLoss=15.912843, MaskLoss=1.730901,
Epoch[0] Batch [180] Speed: 5.55 samples/sec Train-Flow_L2Loss=0.000000, Flow_CurLoss=0.000000, PointMatchingLoss=16.262645, MaskLoss=1.588497,
batch 200: lr: 0.0001
Epoch[0] Batch [200] Speed: 5.54 samples/sec Train-Flow_L2Loss=0.000000, Flow_CurLoss=0.000000, PointMatchingLoss=16.708019, MaskLoss=1.470834,
Epoch[0] Batch [220] Speed: 5.55 samples/sec Train-Flow_L2Loss=0.000000, Flow_CurLoss=0.000000, PointMatchingLoss=19.187362, MaskLoss=1.543989,
Error in CustomOp.forward: Traceback (most recent call last):
File "/home/username/.local/lib/python2.7/site-packages/mxnet/operator.py", line 987, in forward_entry
aux=tensors[4])
File "experiments/deepim/../../deepim/operator_py/zoom_flow.py", line 60, in forward
assert wx == wy, 'wx and wy should be equal'
AssertionError: wx and wy should be equal
terminate called after throwing an instance of 'dmlc::Error'
what(): [18:43:36] src/operator/custom/custom.cc:347: Check failed: reinterpret_cast<CustomOpFBFunc>( params.info->callbacks[kCustomOpForward])( ptrs.size(), const_cast<void**>(ptrs.data()), const_cast<int*>(tags.data()), reinterpret_cast<const int*>(req.data()), static_cast<int>(ctx.is_train), params.info->contexts[kCustomOpForward])
Stack trace returned 8 entries:
[bt] (0) /home/username/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x37b172) [0x7f1d99d47172]
[bt] (1) /home/username/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x37b738) [0x7f1d99d47738]
[bt] (2) /home/username/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x56dcd1) [0x7f1d99f39cd1]
[bt] (3) /home/username/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x5885c1) [0x7f1d99f545c1]
[bt] (4) /home/username/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x56ebc6) [0x7f1d99f3abc6]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbd8f0) [0x7f1e225cc8f0]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f1e26b606db]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f1e26e9988f]
Aborted (core dumped)
Any idea what's going on?
I'm facing the same problem when training the model. Were you able to fix the problem @eyildiz-ugoe @liyi14 @wangg12 ?
I think it is because the learning rate you set is too high, or you removed the flow loss, which makes the rendered pose so wrong that the object ends up outside the current image. To validate this, you can add an assert here
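Something along these lines would work as the check (just a minimal sketch; the helper name and where you call it are illustrative, not the exact line the link points to):

```python
import numpy as np

def assert_pose_in_image(center_3d, R, t, K, im_h, im_w):
    """center_3d: (3,) object center; R: (3, 3) rotation; t: (3,) translation; K: (3, 3) intrinsics."""
    cam = R.dot(center_3d) + t                   # object center in the camera frame
    u = K[0, 0] * cam[0] / cam[2] + K[0, 2]      # pinhole projection to pixel coords
    v = K[1, 1] * cam[1] / cam[2] + K[1, 2]
    assert 0 <= u < im_w and 0 <= v < im_h, \
        'estimated pose projects the object center outside the image: (u, v)=({:.1f}, {:.1f})'.format(u, v)
```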
Thanks a lot for the reply.
The weird thing is that I did not change anything in the reference implementation yet.
For some reason the flow loss does not decrease during training. Should I lower the pre-set learning rate?
@Cryptiex Have you checked your loaded data, e.g. via visualization?
Did you solve the problem yet?
You mean that if the predicted pose is bad enough, the rendered object falls outside the image?
But,
- how should we constrain the rendering result under an arbitrary pose so that it stays within the image coordinate range?
- if it is inevitable that the rendered object sometimes falls outside the image coordinate range, how should we deal with it?
You can try to compute the 2D object bounding box without clipping it within the image size.
I tried decreasing the learning rate from 1e-4 to 5e-5, but it did not help.
Thanks for replying.
Could you please give more details about how to solve it? Forgive me, I am not yet familiar with the implementation details of DeepIM, but I am really interested in it.
I guess you can check the loaded data through visualization.
To get the flow to work, you also need to check whether the gpu_flow calculator is working as expected (test the flow); if not, you can simply disable it.
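For the visualization, a rough sketch like the following is enough (the batch keys 'image_observed', 'image_rendered' and 'mask_gt' are placeholders and may not match the actual data loader):

```python
import matplotlib.pyplot as plt
import numpy as np

def show_sample(batch):
    """Display one loaded sample: observed image, rendered image, and GT mask."""
    img_obs = np.transpose(batch['image_observed'], (1, 2, 0))   # CHW -> HWC
    img_ren = np.transpose(batch['image_rendered'], (1, 2, 0))
    mask = np.squeeze(batch['mask_gt'])

    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    axes[0].imshow(img_obs.astype(np.uint8)); axes[0].set_title('observed')
    axes[1].imshow(img_ren.astype(np.uint8)); axes[1].set_title('rendered')
    axes[2].imshow(mask, cmap='gray');        axes[2].set_title('gt mask')
    for ax in axes:
        ax.axis('off')
    plt.show()
```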
Yes. Then you might need to debug into the data loader and zooming operations.
Hi, the AssertionError: wx and wy should be equal might result from a wrong pose rendering, i.e. the rendered object might fall outside the image coordinate range.
My question is: if rendering outside the image is inevitable during training, what should I do?
As I said, you can compute the 2D object bbox without clipping it within the image size.
In this implementation, the box is obtained from the mask, so it is always clipped and may end up as an empty box.
However, you can directly get the box through projection.
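A minimal sketch of what I mean, assuming you have the model points, the pose (R, t) and the camera intrinsics K available (the function name is illustrative):

```python
import numpy as np

def bbox_from_projection(points_3d, R, t, K):
    """points_3d: (N, 3) model points; returns (x1, y1, x2, y2) WITHOUT clamping to the image."""
    cam_pts = points_3d.dot(R.T) + t        # transform model points into the camera frame
    uv = cam_pts.dot(K.T)                   # pinhole projection
    uv = uv[:, :2] / uv[:, 2:3]             # perspective divide -> pixel coordinates
    x1, y1 = uv.min(axis=0)
    x2, y2 = uv.max(axis=0)
    return x1, y1, x2, y2                   # may lie partly, or fully, outside the image
```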
Hi.
It seems that we have found a solution.
As you said before, the rendered object was so wrong that it fell outside the current image. To avoid this, we changed the training iteration number from 4 to 2 to make training easier and more stable, and obtained an initial pre-trained model. Based on this pre-trained model, we fine-tuned with the iteration number set back to 4.
Glad you have solved it. BTW, a warmup strategy for iterations could also do the trick.
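For example, something like this (the schedule is only an illustration of the idea):

```python
def iters_for_epoch(epoch, max_iters=4, warmup_epochs=3):
    """Ramp the number of refinement iterations from 1 up to max_iters over the first epochs."""
    if epoch >= warmup_epochs:
        return max_iters
    # linear ramp: 1 at epoch 0, reaching max_iters once the warmup is over
    return max(1, int(round(1 + (max_iters - 1) * float(epoch) / warmup_epochs)))

# with the defaults: epoch 0 -> 1, epoch 1 -> 2, epoch 2 -> 3, epoch 3+ -> 4
```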