mas-bandwidth/yojimbo

Assert in ReliableOrderedChannel::AddFragmentPacketEntry after some time

BillEnvrmnt opened this issue · 10 comments

Hi. I am running into an assert within ReliableOrderedChannel::AddFragmentPacketEntry because the sequence value passed to it is too old to be inserted into m_sentPackets. After trying to work out what could be wrong, I'm not sure what I'm missing. This is my understanding:

  • ReliableOrderedChannel
    -- Owns m_sentPackets, a SequenceBuffer tracking the sent packets
    -- m_sentPackets relies on an internal m_sequence value to know which sequence values are valid for insertion
  • Client::SendPackets
    -- This function uses reliable_endpoint_next_packet_sequence to get the next packet sequence value from m_endpoint (reliable_endpoint_t).
    -- However, there seems to be no direct relationship between the sequence returned from that function and the m_sequence value stored in m_sentPackets

As such, my situation leads to a crash. The config has just one Unreliable Unordered channel and one Reliable Ordered channel. The connection in question sends data on the unreliable channel very often, but only sends data on the reliable channel once every few minutes or so. The first messages on the reliable channel are fine, but then a few minutes go by, it tries to send a message on the reliable channel again, and it runs into this assertion.

Now, the application in question is a little odd in that it holds a yojimbo connection to a server while simultaneously running a yojimbo server itself, for various reasons. I don't see how that would cause an issue, but I wanted to mention it.

What am I missing here?

Are you using the latest head revision or the latest release?

Currently the latest head revisions of yojimbo, netcode, and reliable, as of just today. Previously I was using the 1.1 release, but after running into this issue I tried upgrading to head to see if I was missing a fix, and I'm seeing the same error.

Can you please create a small test program that reproduces this issue, ideally self contained in a single executable? Should be an easy fix then.

Sure. I've modified test.cpp to add a new test function that demonstrates the issue. The modified version is attached to this message in a zip file (I couldn't attach the file itself).
test.zip (EDIT: New version, fixing a typo)

The new test is 'test_reliable_sequence_value_outdated'. It creates a client and a server with two channels, sends some unreliable and reliable data, then sends only unreliable data for quite some time before finally trying to send reliable data again. It's just using block messages for this test.

Let me know if that is the kind of test program you are looking for and what further information I can provide. Thanks!

For reference, when the crash occurs in my actual application, m_sequence in SequenceBuffer is around 5000 and the new sequence value is around 55000. On closer inspection it seems like sequence_greater_than would need to handle that case but can't, due to the wrapping check. The test program above ends up with m_sequence being 38 and the new sequence value being 35021.

sfla commented

Did you ever figure this out? We're seeing similar results with ReliableOrderedChannel::AddMessagePacketEntry.

sfla commented

We might've figured it out, but we have no idea what the original intent here was. Following the various examples in client.cpp and USAGE.md, we just called client.SendPackets() on every single client tick, i.e. even when we don't have anything to send. By debugging what happened here, we could see that SendPackets() calls
uint16_t packetSequence = reliable_endpoint_next_packet_sequence( GetEndpoint() );
This number increments every time the function is called, regardless of whether there is actually anything to send.
So let's say you send a packet every tick for the first 20 ticks. Then at tick T+21 you stop sending, but SendPackets() is still called, giving you a still-increasing packetSequence number.
Then, once enough ticks have passed, your packetSequence will have drifted more than 32768 (half the 16-bit range) ahead of m_sequence. There is logic at yojimbo.h:1409 checking whether the requested packetSequence is newer or older than m_sequence, and because the gap exceeds half the range, the wrap-aware comparison judges the new packetSequence to be older, seeing as nothing has been sent for a long time(?).
I can only assume that since it has wrapped around, the idea is that Insert() should return NULL because "there are too many sequence numbers ahead of yours that must be sent before you, so we discard you", or something along those lines. But those packets will never be sent, because they aren't real messages.

Anyway, by adding a simple check for whether we actually have anything to send before calling SendPackets(), we never hit the assert again.

if ( client->HasMessagesToSend( NetworkChannels::RELIABLE ) ) {
    client->SendPackets();
}

Please let me know if this is not how the library is intended to be used, and whether there were other reasons this happened to us.

Edit:
Actually, at this point we're only using ordered reliable messages. We will implement unreliable/unordered messages later, so this "solution" is not really a solution. Even if we check HasMessagesToSend for both the reliable and the unreliable channel, sending enough unreliable messages between reliable ones would still advance the packet sequence number relative to the reliable channel's window. So I suspect we might hit this issue again later, since the ordering logic only tracks reliable sequence numbers while the actual packet sequence number increments across both reliable and unreliable packets.

I ran into the same problem as the user above, but even after their suggested client-side fix, I began getting the asserts on the server side. Since the server function for sending packets is a generic one that sends to all clients, and all the needed variables and functions are private, I cannot check each client and send only to those that have pending packets. So this probably needs to be handled inside yojimbo, in both the Client and Server SendPackets functions.

As noted by the user above, I don't know what will happen once I begin using unreliable channels, or separate reliable channels for different things (I might have to merge them all into one reliable channel and not use unreliable messages at all).

Other than that, and some learning curve setting up the library, everything works okay. But if this isn't fixed (or, if we are all using the library wrong, an example of how it is meant to be used would help), I don't know whether I'll be forced to switch to another library for my project, since reaching this assert under normal conditions takes less than an hour and renders the connection unusable.

sfla commented

@Alejandro131 it's been a while, but check if this discussion helps you out:
#146 (comment)

Thank you for the fast response! It didn't occur to me to look through the closed issues, since this one is open and seemed to match mine.

After looking through the discussion and examining my code, I "fixed" it so that it no longer triggers the assert. In my situation the client and server communicated through 3 channels: 1 unreliable and 2 reliable (1 for game-related actions and 1 for chat messages). Later on, the 1st channel became reliable as well, and that was the one through which I sent ping messages from the client to the server, among other things.

Since the server doesn't respond to the ping request, but rather uses it to decide when to disconnect the client (or as an AFK detection method), after the client-side fix the server was throwing the assert. I added a ping response from the server to the client so that the sequence stays in sync, but since it was on a different channel from the one for game actions, the 2nd channel then threw the asserts. So in the end I decided to merge all 3 channels into 1, and since I now have a ping packet from both the client and the server, the sequences can't drift far enough apart to trigger the assert.

Given that you and I (and possibly other people) have run into this scenario, it might be a good idea to add some information about it to the readme/examples, so that people know what to do from the beginning and don't go down the debugging rabbit hole later in production. Something along the lines of: on each channel you need to send a reliable message every now and then (or maybe even give a specific bound, like every second, or at least once every 10000 packets).