NordicSemiconductor/Android-nRF-Mesh-Library

Remaining problems with LowerTransportLayer.sendBlockAck

daretobeorjan opened this issue · 2 comments

It seems the previous attempts to resolve this issue didn't completely help.

We've now seen multiple crashes when the mUpperTransportLayerCallbacks.getNode call returns null, causing an unhandled NullPointerException (NPE) when incrementSequenceNumber is called on the result.
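To make the failure mode concrete, here is a minimal, self-contained model of it. `MeshNode`, `NodeLookup` and `sendBlockAckUnguarded` are hypothetical stand-ins, not the library's real classes, and which address the real ack path looks up is not stated in this thread. The only assumption is that the ack path fetches a node by address and then increments its sequence number without a null check, as described above.

```java
import java.util.HashMap;
import java.util.Map;

public class BlockAckCrashModel {

    // Hypothetical stand-in for a provisioned node that owns a sequence number.
    static final class MeshNode {
        private int sequenceNumber;

        int incrementSequenceNumber() {
            return ++sequenceNumber;
        }
    }

    // Hypothetical stand-in for the getNode(...) callback: a plain address -> node map.
    static final class NodeLookup {
        private final Map<Integer, MeshNode> nodes = new HashMap<>();

        void add(int unicastAddress, MeshNode node) {
            nodes.put(unicastAddress, node);
        }

        // Returns null when the address is unknown, like the failing getNode call.
        MeshNode getNode(int unicastAddress) {
            return nodes.get(unicastAddress);
        }
    }

    // Unguarded ack path: dereferences the lookup result directly.
    static int sendBlockAckUnguarded(NodeLookup lookup, int address) {
        MeshNode node = lookup.getNode(address);
        return node.incrementSequenceNumber(); // throws NullPointerException when node == null
    }

    public static void main(String[] args) {
        NodeLookup lookup = new NodeLookup();
        lookup.add(0x0001, new MeshNode());
        System.out.println(sendBlockAckUnguarded(lookup, 0x0001)); // known node: prints 1
        try {
            sendBlockAckUnguarded(lookup, 0x0002); // unknown node
        } catch (NullPointerException e) {
            System.out.println("NullPointerException for unknown node");
        }
    }
}
```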

Unfortunately, I still can't produce a small sample that reproduces the problem. I think it is triggered because our meshes are usually rather crowded, with lots of messages flying back and forth.

However, would it be possible to add some kind of workaround that at least prevents the crash? Since the code runs on a separate thread, there is no way for us to catch the exception, so our app just crashes. The easiest fix would be a try/catch that skips sending the ack, but I'm not sure what repercussions that would have on the functionality of the mesh.
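The proposed workaround can be sketched as follows, assuming the ack is sent from a worker thread the app does not control. The sketch uses an explicit null check rather than a blanket try/catch, and `GuardedBlockAck`, `sendBlockAckGuarded` and the `Map`-based lookup are hypothetical. It also shows why the app cannot help on its own side: a try/catch around `Thread.start()` never sees an exception thrown inside the thread's `run()`, so the guard has to live in the code that runs on that thread, which here is library code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class GuardedBlockAck {

    // Hypothetical stand-in for getNode(...): address -> that node's sequence counter.
    static final Map<Integer, AtomicInteger> NODES = new ConcurrentHashMap<>();

    // Guarded ack path: returns false and skips the ack instead of dereferencing null.
    static boolean sendBlockAckGuarded(int address) {
        AtomicInteger sequence = NODES.get(address);
        if (sequence == null) {
            // Unknown node: no ack is sent, so the sender will eventually retransmit.
            return false;
        }
        sequence.incrementAndGet(); // a real implementation would now build and send the ack
        return true;
    }

    // Runs the unguarded lookup on a worker thread and returns what escaped from it.
    static Throwable runUnguardedOnWorker(int address) {
        AtomicReference<Throwable> escaped = new AtomicReference<>();
        Thread worker = new Thread(() -> NODES.get(address).incrementAndGet());
        // Only this handler sees the failure; it can record it but not undo it.
        worker.setUncaughtExceptionHandler((thread, error) -> escaped.set(error));
        worker.start(); // returns normally: a try/catch around this line never sees the NPE
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return escaped.get();
    }

    public static void main(String[] args) {
        NODES.put(0x0001, new AtomicInteger());
        System.out.println(sendBlockAckGuarded(0x0001)); // true: ack sent
        System.out.println(sendBlockAckGuarded(0x0002)); // false: ack skipped, no crash
        System.out.println(runUnguardedOnWorker(0x0002) instanceof NullPointerException); // true
    }
}
```

On Android the default uncaught-exception handler terminates the app, which is consistent with the crash described here; the per-thread handler in the sketch is only there to observe the exception.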

@daretobeorjan I'm currently on paternity leave, but I'll try to help when I have some time.

I remember this edge case being reported some time ago. What does your network setup look like? Do you have more than one provisioner? If so, are all provisioners aware of all the nodes in the network?

Edit:


Not sending an ack in time would cause the original message to be retransmitted a number of times, depending on the mesh application layer implementation. This would create unnecessary traffic.
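The cost of a skipped ack can be estimated with a deliberately simple model: a sender that resends its segments each round until a block ack arrives or its retry limit is used up. `RetransmissionCost`, the segment count and `retransmitLimit` are hypothetical; the real retry count depends on the implementation, as noted above, and the model ignores timers and partial acks (a block ack is assumed to cover every segment).

```java
public class RetransmissionCost {

    /**
     * Counts segment transmissions for one segmented message.
     *
     * @param segments        number of segments in the message (hypothetical)
     * @param retransmitLimit retries the sender makes after the first round (hypothetical)
     * @param ackAfterRound   round after which a block ack arrives, or -1 if it never does
     * @return total segments put on the air
     */
    static int transmissions(int segments, int retransmitLimit, int ackAfterRound) {
        int sent = 0;
        // Round 0 is the initial transmission; rounds 1..retransmitLimit are retries.
        for (int round = 0; round <= retransmitLimit; round++) {
            sent += segments; // no ack yet, so every segment goes out again
            if (round == ackAfterRound) {
                break; // block ack received: the sender stops retransmitting
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        System.out.println(transmissions(4, 3, 0));  // acked immediately: 4
        System.out.println(transmissions(4, 3, -1)); // never acked: 16
    }
}
```

With these assumed numbers, a dropped ack quadruples the airtime of a single four-segment message, which is the "unnecessary traffic" referred to above.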


We do have multiple provisioners most of the time. The most common trigger for us is having two phones online at the same time: when a new device is provisioned on one phone, the other almost always crashes. But I have also reproduced this with only one phone and one provisioner, so it is not strictly limited to that situation.
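One plausible reading of the two-phone trigger, assuming each phone keeps its own copy of the network's node list and only learns about a new node when that copy is synced, is sketched below. `StaleNodeList`, `NetworkCopy` and `syncFrom` are hypothetical; the thread does not confirm this is the cause, and the single-phone reproduction suggests it is not the only one.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleNodeList {

    // Hypothetical: one phone's local copy of the mesh network's node list.
    static final class NetworkCopy {
        private final Map<Integer, String> nodesByAddress = new HashMap<>();

        void addNode(int unicastAddress, String name) {
            nodesByAddress.put(unicastAddress, name);
        }

        // Like getNode(...): null when this copy has never heard of the address.
        String getNode(int unicastAddress) {
            return nodesByAddress.get(unicastAddress);
        }

        // Hypothetical sync step, e.g. importing the other phone's network data.
        void syncFrom(NetworkCopy other) {
            nodesByAddress.putAll(other.nodesByAddress);
        }
    }

    public static void main(String[] args) {
        NetworkCopy phoneA = new NetworkCopy();
        NetworkCopy phoneB = new NetworkCopy();

        // Phone A provisions a new device; only A's copy is updated.
        phoneA.addNode(0x0005, "new-device");

        // Before any sync, phone B cannot resolve the new address, so an
        // unguarded ack path that needs it would hit a null node.
        System.out.println(phoneB.getNode(0x0005)); // null

        phoneB.syncFrom(phoneA);
        System.out.println(phoneB.getNode(0x0005)); // new-device
    }
}
```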


Well, yes, but an app that crashes completely, with no way to catch the exception, isn't much better. :)