asik/FixedMath.Net

Debugging Determinism? (discussion)

Closed this issue · 6 comments

This isn't an issue with FixedMath.Net, but I'm running out of places to ask about this.

My game has been built to use deterministic lockstep for networked multiplayer. Relatively recently I finally got online games to last more than five minutes before desyncing. The game is now at a point where things desync inconsistently, in a way that is difficult to reproduce reliably. Until now, I have been using checksums, dumping readable gameplay data to file, and other tedious methods of debugging (sometimes resulting in 3+ MB text files).
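Roughly, the checksum approach looks like this; a minimal sketch, where the type and field names are illustrative rather than from my actual project. Each logic frame, hash the raw values of whatever state is authoritative and exchange only the hash:

```csharp
using System.IO;
using System.Security.Cryptography;

public static class StateChecksum
{
    // unitPositionsRaw: raw fixed-point values (e.g. Fix64.RawValue), so the hash is exact.
    public static byte[] Compute(int frame, long[] unitPositionsRaw, long rngState)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(frame);
            foreach (var raw in unitPositionsRaw)
                writer.Write(raw);
            writer.Write(rngState);
            writer.Flush();
            using (var sha = SHA256.Create())
                return sha.ComputeHash(stream.ToArray());
        }
    }
}
```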

My question is... are there tools or methods to more easily debug this sort of thing?

Someone suggested I create a bot to automatically do the same action over and over until a desync occurs. This would help rule out certain weapons/actions, but it still has issues and is tedious. I thought about maybe some kind of memory dump? I really have no idea how I would do that though, and even then it would pose its own problems...

This game has been quite a trek for me. Any insights much appreciated. :)

asik commented

Well, asking here you're probably only going to get my opinion, as someone who's actually never worked on a deterministic lockstep game :p

I guess full state dumps + diff tools + script could help quickly identify where the states diverged and what exactly diverged.

I would also take an analytical look at the code and identify all possible sources of indeterminism. I believe Unity games are composed of scripts attached to different game objects, correct? If so, is it guaranteed that these are invoked in the same order? Can multiple objects get updated at the same time?

Shared state is the biggie. What state is shared and updated by several different objects or processes? What guarantees that updates to shared state are done in the same sequence on every machine?

Any use of random numbers? What guarantees the rng gets used in the same order on each machine the game runs on?
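For example, here's a minimal sketch (names are illustrative) of keeping one seeded, integer-only RNG inside the simulation, so every peer draws the same sequence in the same order and nothing depends on the runtime's System.Random implementation:

```csharp
public sealed class DeterministicRng
{
    private ulong state;

    public DeterministicRng(ulong sharedSeed)
    {
        // The seed is agreed on by all peers, e.g. sent by the host at match start.
        state = sharedSeed == 0 ? 0x9E3779B97F4A7C15UL : sharedSeed;
    }

    // xorshift64* — integer math only, so results are identical on every platform.
    public ulong NextUInt64()
    {
        state ^= state >> 12;
        state ^= state << 25;
        state ^= state >> 27;
        return state * 0x2545F4914F6CDD1DUL;
    }

    // Range helper for gameplay code; UI/VFX should use a separate, non-synced RNG.
    public int NextInt(int minInclusive, int maxExclusive)
    {
        ulong span = (ulong)(maxExclusive - minInclusive);
        return minInclusive + (int)(NextUInt64() % span);
    }
}
```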

I hope your problem doesn't actually have to do with floating point rounding issues, because that would be harder to fix.

Bamboy commented

I've basically built my own mini-engine on top of Unity. Order of execution is carefully controlled manually. I'm using a deterministic seed for my random numbers. I'm pretty sure I've isolated all floating point values.

How could I go about "state dumps" and "diff tools" in this context?

asik commented

I mean writing the whole state of the game into a file on each frame, and then using some kind of text or binary comparison tool (or writing one yourself) to compare the states from each machine, frame by frame, so you can see exactly which frame the desync happened on and exactly which part of the state differed.
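Something along these lines; a minimal sketch, where the snapshot struct is a hypothetical stand-in for whatever state you consider authoritative (positions, health, RNG state, ...):

```csharp
using System.Collections.Generic;
using System.IO;

public struct UnitSnapshot
{
    public int Id;
    public long RawX;   // raw fixed-point value, e.g. Fix64.RawValue
    public long RawY;
    public int Health;
}

public static class StateDumper
{
    // Appends one logic frame to a per-machine dump file; run the same build on
    // two machines and diff the resulting files to find the first divergent frame.
    public static void DumpFrame(string path, int frame, IEnumerable<UnitSnapshot> units)
    {
        using (var writer = new StreamWriter(path, append: true))
        {
            writer.WriteLine($"frame {frame}");
            foreach (var u in units)
                writer.WriteLine($"  unit {u.Id} x={u.RawX} y={u.RawY} hp={u.Health}");
        }
    }
}
```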

Bamboy commented

I can already detect what frame the game desyncs on. I write a large number of game vars to a CSV file; so many that the game's FPS is severely impacted. The only way I know how to do this is to manually write code that dumps script vars to file, which can get pretty tedious, as I said before.

My game has logic "frames" that are separate from render frames, so I can easily do a Ctrl+F for the frame number in which a desync occurs.
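One thing I've been considering to cut the FPS cost, though I haven't actually implemented it: keep the last N frames of serialized state in memory and only write them to disk when a desync is actually detected, so the per-frame cost is a copy instead of file I/O. Just a sketch:

```csharp
using System.Collections.Generic;
using System.IO;

public sealed class DesyncRecorder
{
    private readonly Queue<byte[]> recentFrames = new Queue<byte[]>();
    private readonly int capacity;

    public DesyncRecorder(int capacity) { this.capacity = capacity; }

    // Called once per logic frame with the serialized state for that frame.
    public void Record(byte[] serializedFrame)
    {
        recentFrames.Enqueue(serializedFrame);
        if (recentFrames.Count > capacity)
            recentFrames.Dequeue();
    }

    // Called by the networking layer when checksums mismatch.
    public void FlushToDisk(string path)
    {
        using (var stream = File.Create(path))
        {
            foreach (var frame in recentFrames)
                stream.Write(frame, 0, frame.Length);
        }
    }
}
```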

Gibibit commented

Kind of off-topic for this repo, but I stumbled upon this so here's my 2 cents anyway. A bit late, but maybe it helps anyone passing by.

We are having a similar problem with a separate "game loop backend" built in .NET and attached to the Unity engine. The game loop runs both on a server and in the client, replaying client commands to confirm the validity of their actions. What we found is that despite running the game loop from the same DLL everywhere, there is still drift in the floating point values.

Our realization was that Unity runs on the Mono runtime (its own fork of Mono, even). The server, meanwhile, was running the same game logic on the .NET Core runtime. What this means is that System.Math can differ in its implementation details. Contrary to theoretical math, in floating point operations the order of operations matters a lot. So if the Mono implementation of System.Math has the slightest difference compared to .NET Core, you're screwed.
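A toy illustration of the ordering point (not our production code): floating point addition is not associative, so any difference in evaluation order between runtimes can change the result bit-for-bit.

```csharp
double a = 1e20, b = -1e20, c = 1.0;
double left  = (a + b) + c;   // 1.0
double right = a + (b + c);   // 0.0, because (b + c) rounds back to -1e20
System.Console.WriteLine(left == right);   // False
```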

Oh, and additionally, some of our runs were on ARMv7, which brings a whole slew of its own problems regarding floating point determinism.

We had to figure all this stuff out during production with a bunch of juniors, so take it with a grain of salt and do your own research. It might at least point you in the right direction. On the surface it sounds like @Bamboy would be best off replacing all floating point math with fixed-point math, just like we are planning to do now. Just be careful choosing between 32 and 64 bit if you are going to target mobile devices in any capacity, since the 32-bit ARMv7 architecture still appears to be widespread.
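To give an idea of the direction we're planning, here's a minimal sketch using the Fix64 type from this library; treat the conversions and exact API shown here as approximate and check the README.

```csharp
using FixMath.NET;

public static class DeterministicDistance
{
    // Distance computed entirely in Fix64 (Q31.32 fixed point), so every runtime
    // performs the same integer operations and gets the same result.
    public static Fix64 Between(Fix64 x1, Fix64 y1, Fix64 x2, Fix64 y2)
    {
        Fix64 dx = x2 - x1;
        Fix64 dy = y2 - y1;
        return Fix64.Sqrt(dx * dx + dy * dy);
    }
}
```

The idea is to convert to and from fixed point only at the simulation boundary (loading design data, rendering), never inside the game logic itself.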

Bamboy commented

@Gibibit I still have mystery desyncs in my project, but I do not have an "authoritative" server; it just relays messages between clients, along with checksums.
Most of my deterministic math uses FixedMath.Net, but in a few places I use System.Math/Mathf when dealing with integers. Games hosted on the same network can be stable for a few minutes.
Any additional insight would still be very helpful. :)