local_frame_advantage bigger than i8::MAX
cscorley opened this issue · 5 comments
Describe the bug
Calculations for quality reports use i8 and can be too small during stress testing.
To Reproduce
Steps to reproduce the behavior:
- Using https://github.com/cscorley/bevy_ggrs_rapier_example
- Compile using release build
- Set up a game and introduce aggressive network lag such that 127+ "Skipping a frame: PredictionThreshold" messages are produced on the clients
- Observe panic
```
thread 'main' panicked at 'local_frame_advantage bigger than i8::MAX: TryFromIntError(())', C:\Users\cscorley\scoop\persist\rustup\.cargo\git\checkouts\ggrs-151f8bd9eb75dc5d\9e4a20a\src\network\protocol.rs:512:18
```
Expected behavior
No panic; just keep skipping frames.
Screenshots
I was testing a game with clumsy on Windows, using this configuration.
Desktop (please complete the following information):
- OS: Windows 10
- Version: main branch, 9e4a20a
Additional context
Thanks for your report!
While this would be easy to fix by going from i8 to i32 or comparable, I am very curious how this issue can arise. Usually, when advancing the gamestate without remote input, a PredictionThreshold error occurs to halt the user from advancing too much (the error is thrown here). This should mean that local_frame_advantage should never be higher than the max prediction threshold, which by default is 8. In your example, you left that threshold at 8.
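To make the expected invariant concrete, here is a minimal sketch of the check described above (illustrative names only, not ggrs's actual code): the session refuses to advance once the local client is more than the prediction threshold ahead, so local_frame_advantage should stay small.

```rust
// Hypothetical sketch of the PredictionThreshold invariant; not ggrs's code.
const MAX_PREDICTION: i32 = 8; // the default threshold mentioned above

fn advance_frame(local_frame: i32, last_confirmed_remote_frame: i32) -> Result<(), &'static str> {
    let advantage = local_frame - last_confirmed_remote_frame;
    if advantage > MAX_PREDICTION {
        // corresponds to the "Skipping a frame: PredictionThreshold" message
        return Err("PredictionThreshold");
    }
    Ok(())
}

fn main() {
    // Within the threshold: the session advances.
    assert!(advance_frame(5, 0).is_ok());
    // 9 frames ahead of the last confirmed remote frame: skip instead.
    assert_eq!(advance_frame(9, 0), Err("PredictionThreshold"));
}
```

Under this model, advantage is bounded by the threshold (8) and should never come near i8::MAX, which is what makes the reported panic surprising.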
At this point, I can only hypothesize, but maybe I made a mistake somewhere when casting the numerical types from unsigned to signed, and there is a silent overflow due to going negative. I will check it out later this week, probably on or shortly before the weekend.
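For illustration, here is the kind of bug hypothesized above (the values are made up): subtracting two unsigned frame counters wraps around instead of going negative, and the later i8 conversion then fails exactly like the reported panic.

```rust
// Hypothetical illustration of a silent unsigned underflow; not ggrs's code.
fn main() {
    let local_frame: u32 = 3;
    let remote_frame: u32 = 5; // remote is ahead of us

    // wrapping_sub makes the wraparound explicit: instead of -2, this
    // produces a huge unsigned value (4294967294).
    let advantage = local_frame.wrapping_sub(remote_frame);
    assert_eq!(advantage, u32::MAX - 1);

    // i8::try_from then fails with TryFromIntError(()), as in the report.
    assert!(i8::try_from(advantage).is_err());

    // Doing the subtraction in signed arithmetic gives the intended -2.
    let signed = local_frame as i64 - remote_frame as i64;
    assert_eq!(signed, -2);
}
```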
PS: Sounds great! 😄
Another thought: Since the clumsy settings aren't too extreme (I tried worse both with real connections and in clumsy), could it maybe be the case that one of the clients cannot keep up with the framerate? This could especially be the case with rapier and big physics things...
You may be right about it being related to actual framerate and not network. I have also tested this by just laying down a `sleep 3s` (i.e., a dumb cap approximating 180 frames) in one of the client systems, and it seems to just "break through" the PredictionThreshold (after 111 of them) and allow one client to continue rendering, all before the other even completes its first frame. (I do not have any framerate limiting in bevy itself, just the FPS limit set by bevy_ggrs.)
In one test, I was able to get it to produce some 'fun errors' on the clients (but I don't know which was actually sleeping, forgot to log that info)
client 1
```
thread 'main' panicked at 'view entity should exist: QueryDoesNotMatch(3v0)', C:\Users\cscorley\scoop\persist\rustup\.cargo\registry\src\github.com-1ecc6299db9ec823\bevy_core_pipeline-0.7.0\src\main_pass_2d.rs:45:14
```
client 2
```
thread 'main' panicked at 'assertion failed: frame_to_load != NULL_FRAME && frame_to_load < self.current_frame &&\n frame_to_load >= self.current_frame - self.max_prediction as i32', C:\Users\cscorley\scoop\persist\rustup\.cargo\git\checkouts\ggrs-151f8bd9eb75dc5d\9e4a20a\src\sync_layer.rs:138:9
```
I have reproduced a couple of these while stepping through a ggrs test in the debugger (so wall-clock delays are much larger than normal). This does not play nicely with the code in ggrs that uses wall-clock time, and can cause issues.
I most commonly hit
```
thread 'main' panicked at 'assertion failed: frame_to_load != NULL_FRAME && frame_to_load < self.current_frame &&\n frame_to_load >= self.current_frame - self.max_prediction as i32',
```
I believe this is due to a rollback being triggered for disconnect_frame 0 (disconnect frame chosen as first_incorrect), when current_frame is also 0.
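Plugging the hypothesized values into that assertion shows why it fires: with both frame_to_load and current_frame at 0, the strict `frame_to_load < current_frame` clause is false. A minimal sketch (NULL_FRAME and max_prediction follow the conventions visible in the panic message; the function name is illustrative):

```rust
// Sketch of the sync_layer.rs assertion, evaluated with the values above.
const NULL_FRAME: i32 = -1;

fn load_frame_is_valid(frame_to_load: i32, current_frame: i32, max_prediction: u32) -> bool {
    frame_to_load != NULL_FRAME
        && frame_to_load < current_frame
        && frame_to_load >= current_frame - max_prediction as i32
}

fn main() {
    // Rolling back to frame 0 while current_frame is also 0: the middle
    // clause fails, so the assert in ggrs would panic.
    assert!(!load_frame_is_valid(0, 0, 8));

    // A normal rollback (load frame 5 while at frame 8) passes.
    assert!(load_frame_is_valid(5, 8, 8));
}
```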
When debugging, I commented out the time check that triggers network interrupt events, and after that I reproduced the assert in the original post. The remote frame in the local frame advantage computation is based on an estimate from ping + framerate. It does not use the true remote frame, so when debugging or in very poor network conditions, this estimate can get very large and trigger the assert.
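To sketch that mechanism (names and formula are illustrative, not ggrs's actual identifiers): the remote frame is guessed as the last received frame plus however many frames should have elapsed during half the round-trip time, so when wall-clock time balloons under a debugger, the resulting frame advantage falls far outside the i8 range.

```rust
// Illustrative estimate of the remote frame from ping + framerate.
fn estimated_remote_frame(last_received_frame: i64, ping_ms: i64, fps: i64) -> i64 {
    last_received_frame + (ping_ms / 2) * fps / 1000
}

fn main() {
    // Normal conditions: ~50 ms ping at 60 fps adds only a frame.
    assert_eq!(estimated_remote_frame(100, 50, 60), 101);

    // Stepping in a debugger for ~30 s of wall-clock time makes the
    // estimate enormous, and the "advantage" blows past the i8 range.
    let est = estimated_remote_frame(100, 30_000, 60);
    let local_frame_advantage = 100 - est;
    assert!(local_frame_advantage.abs() > i8::MAX as i64);
}
```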
(Not sure if it's sound for this failure to occur considering I commented out disconnect code - but just sharing notes on that.)
> You may be right about it being related to actual framerate and not network. I also have tested this by just laying down a `sleep 3s` (e.g. dumb cap to approximate 180 frames) in one of the client systems and it seems to just "break through" the PredictionThreshold (after 111 of them) and allow one client to continue rendering, all before the other even completes its first frame. (I do not have any framerate limiting in bevy itself, just the FPS limit set by bevy_ggrs)
I would recommend confirming that disconnect events are handled; I suspect that may be the cause. Once a client is disconnected, GGRS will report this event and continue to advance.
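A hedged sketch of that recommendation, using an illustrative event type rather than ggrs's real one (the exact variant names and session API differ by ggrs version): each frame, drain the session's events and react to disconnects instead of ignoring them, since the session keeps advancing after it reports a disconnect.

```rust
// Illustrative event handling; SessionEvent stands in for ggrs's event enum.
#[derive(Debug, PartialEq)]
enum SessionEvent {
    Synchronized { player: usize },
    Disconnected { player: usize },
}

fn handle_events(events: Vec<SessionEvent>, active_players: &mut Vec<usize>) {
    for event in events {
        match event {
            SessionEvent::Disconnected { player } => {
                // e.g. drop the player, show UI, or end the match -- the
                // point is to not silently keep simulating as if they exist.
                active_players.retain(|&p| p != player);
            }
            SessionEvent::Synchronized { .. } => {}
        }
    }
}

fn main() {
    let mut active = vec![0, 1];
    handle_events(vec![SessionEvent::Disconnected { player: 1 }], &mut active);
    assert_eq!(active, vec![0]);
}
```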