mas-bandwidth/yojimbo

Intermittent Issue Parsing Packets After Reconnect, Suggestions for Troubleshooting?

leoleblanc opened this issue · 10 comments

Hello, we are having an issue with Client-Client communication and I was hoping you might be able to point me in the right direction for debugging the problem.

With our implementation of Yojimbo, our server just serves to forward messages between clients in a game. Client A sends a message to the Server which then forwards that message to Client B. Always just two clients, and always the ReliableOrderedChannel.

Our issue is that Client B occasionally fails to parse packets coming from Client A via the Server. The failure happens in the SerializeCheck function inside ReadPacket, and that is because ReadBits was returning a non-zero value, presumably indicating that it could not properly read all of the bits of the packet.

I initially suspected the cause was Client B disconnecting and reconnecting mid-message: it would receive part of a message, disconnect and discard that part, and after reconnecting receive the rest of the message without the missing piece. That, however, would only be a problem while Client B is reconstructing the full message, and the failure appears to happen before that process begins. From what I've observed, it only occurs when a message is en route from Client A to the Server while Client B is disconnected, and in no other cases.

I can't determine what exactly is going wrong here though, other than the packet is failing to be read. Do you have any suggestions for how to go about debugging this?

This is almost certainly caused by a serialization desync in one of your message types. Try adding serialize_checks around each message type you serialize, and you should be able to find the message type that causes the desync, since it will now trigger a serialize check in its own serialize function instead of elsewhere in the packet.

Once you've identified the message type that has a bad serialize function, if you can't spot the read/write desync in that message type by eyeballing it, you can use serialize_checks to binary search the serialize function, e.g. by adding more checks in the middle of it until you isolate the line that is causing the desync.
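To illustrate the idea of narrowing down a desync with checks, here is a minimal standalone sketch. It does not use yojimbo: `ToyStream`, `ToyMessage`, `SerializeBuggy` and `RoundTrip` are hypothetical names invented for this example, and the `Check` method only imitates what a serialize check does (write a known marker, verify it on read). The serialize function is deliberately buggy so that the first check after the bad field is the one that fails.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a read/write stream. This is NOT yojimbo's API.
struct ToyStream {
    std::vector<uint8_t> bytes;
    std::size_t readPos = 0;
    bool writing = true;

    // Writes v when writing, otherwise reads the next byte into v.
    bool Byte(uint8_t &v) {
        if (writing) { bytes.push_back(v); return true; }
        if (readPos >= bytes.size()) return false;
        v = bytes[readPos++];
        return true;
    }

    // Writes a known marker, or verifies that the marker comes next on read.
    bool Check() {
        uint8_t marker = 0xA5;
        return Byte(marker) && marker == 0xA5;
    }
};

struct ToyMessage {
    uint8_t hasExtra = 0;
    uint8_t extra = 0;
    uint8_t value = 0;
};

// Deliberately buggy: the write path emits `extra` only when hasExtra is set,
// but the read path always consumes it. Returns 0 on success, the number of
// the first check that failed, or -1 if a field read ran out of data.
int SerializeBuggy(ToyStream &s, ToyMessage &m) {
    if (!s.Byte(m.hasExtra)) return -1;
    if (!s.writing || m.hasExtra) {          // read and write disagree here
        if (!s.Byte(m.extra)) return -1;
    }
    if (!s.Check()) return 1;                // check 1: guards hasExtra/extra
    if (!s.Byte(m.value)) return -1;
    if (!s.Check()) return 2;                // check 2: guards value
    return 0;
}

// Writes the message, then reads it back from the same bytes.
int RoundTrip(ToyMessage in) {
    ToyStream s;
    int written = SerializeBuggy(s, in);
    assert(written == 0);                    // writing never fails in this toy stream
    (void)written;
    s.writing = false;
    ToyMessage out;
    return SerializeBuggy(s, out);
}
```

With `hasExtra` left at 0, `RoundTrip` reports that check 1 failed, which points at the fields serialized before it. With `hasExtra` set, the round trip succeeds, which matches the "under a certain circumstance" nature of this kind of bug.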

Will do, thank you! And after the problem function/line has been identified, the next step would then be to fix this serialization function so that the read/writes will then pass (ideally solving the aforementioned issues)? Sorry if this is a question with an obvious answer, but I am very new to this framework and intrigued about working with it!

Yes, I believe so. I think you will have a message serialize function that has a slight desync between read and write under a certain circumstance, and once you identify which message it is, you should be able to fix that serialize function and this problem will go away.

cheers

  • Glenn

I wonder if this desync you are talking about causes the same issue on the server when the client reconnects.

0 :[server endpoint] sending packet 0
[server endpoint] sending packet 0 as 3 fragments
assert failed: ( packet_bytes <= NETCODE_MAX_PACKET_SIZE ), function netcode_server_send_packet, file C:\development\c...\yojimbo\netcode.io\netcode.c, line 4617

Assert failure occurs every time I disconnect client 0 before client 1 and then attempt to reconnect client 0.

It could also be that you have a configuration issue. What config do you have that is different from the default (e.g. packet sizes, fragment sizes and all that)?

I followed the USAGE doc. The configuration is all default except for numChannels, which I've set to 2, with channel[0] reliable and channel[1] unreliable. I am going to make a sample where it's reproducible... maybe I'll run into a solution...

Thanks!

I've figured out my issue: it was a glaring mistake on my part in how I looped over connected clients on the server side.

int connectedClients = server->GetNumConnectedClients();
for (int i = 0; i < connectedClients; i++) { ...

Since I then used i as the client index for receiving, it becomes clear that once one of the two connected clients has disconnected, the ReceiveMessage call can land on the wrong clientIndex. And things get even worse when the first client attempts to reconnect.

I now loop over the maximum number of clients the server handles and operate only on each clientIndex for which IsClientConnected returns true.

Everything works awesomely now,
Thank you Glenn

AWESOME!
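For anyone landing on this thread later, the difference between the two loops can be sketched without the library. `FakeServer` below is a hypothetical stand-in, not yojimbo's server class; its method names are modeled on the calls mentioned above, with `GetMaxClients` assumed as the name for "maximum clients the server handles". The two free functions are also invented for illustration and only record which client indices each loop would touch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the server. This is NOT yojimbo's class.
struct FakeServer {
    std::vector<bool> connected;  // one entry per client slot

    int GetMaxClients() const { return static_cast<int>(connected.size()); }

    int GetNumConnectedClients() const {
        int n = 0;
        for (bool c : connected) if (c) ++n;
        return n;
    }

    bool IsClientConnected(int clientIndex) const {
        assert(clientIndex >= 0 && clientIndex < GetMaxClients());
        return connected[clientIndex];
    }
};

// The pattern from the thread: treats the connected *count* as an index range,
// so it can visit a disconnected slot and miss a connected one.
std::vector<int> SlotsVisitedByCount(const FakeServer &server) {
    std::vector<int> visited;
    int connectedClients = server.GetNumConnectedClients();
    for (int i = 0; i < connectedClients; i++) {
        visited.push_back(i);
    }
    return visited;
}

// The fix described above: walk every slot and skip the disconnected ones.
std::vector<int> SlotsVisitedBySlot(const FakeServer &server) {
    std::vector<int> visited;
    for (int i = 0; i < server.GetMaxClients(); i++) {
        if (!server.IsClientConnected(i)) continue;
        visited.push_back(i);
    }
    return visited;
}
```

With two slots where slot 0 has disconnected and slot 1 is still connected, the count-based loop visits only index 0, while the slot-based loop visits only index 1.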