game-state mismatch [bug]/[possible enhancement]
Closed this issue · 7 comments
If a user refreshes the betago web page but not the server, the game does not actually reset. As a result, an error is thrown by the server any time you try to play in an area that was previously played on. The javascript does not have proper error handling for this, so it stops playing as black, and starts playing and processing as white (locally) for the next move but sends the coordinates to the server as if it were a black move.
The js error always looks like this:
And, the server side callback always looks like this:
Traceback (most recent call last):
File "/home/ubuntu/workspace/.betago/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
return self.wsgi_app(environ, start_response)
File "/home/ubuntu/workspace/.betago/lib/python2.7/site-packages/flask/app.py", line 1820, in wsgi_app
response = self.make_response(self.handle_exception(e))
File "/home/ubuntu/workspace/.betago/lib/python2.7/site-packages/flask_cors/extension.py", line 188, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "/home/ubuntu/workspace/.betago/lib/python2.7/site-packages/flask/app.py", line 1403, in handle_exception
reraise(exc_type, exc_value, tb)
File "/home/ubuntu/workspace/.betago/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
response = self.full_dispatch_request()
File "/home/ubuntu/workspace/.betago/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/ubuntu/workspace/.betago/lib/python2.7/site-packages/flask_cors/extension.py", line 188, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "/home/ubuntu/workspace/.betago/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/home/ubuntu/workspace/.betago/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
rv = self.dispatch_request()
File "/home/ubuntu/workspace/.betago/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/ubuntu/workspace/betago/betago/model.py", line 79, in next_move
self.bot.apply_move('b', (row, col))
File "/home/ubuntu/workspace/betago/betago/model.py", line 133, in apply_move
self.go_board.apply_move(color, move)
File "/home/ubuntu/workspace/betago/betago/dataloader/goboard.py", line 157, in apply_move
raise ValueError('Move ' + str(pos) + 'is already on board.')
ValueError: Move (3, 9)is already on board.
The problem is resolved if the server is restarted but if the web page is not reloaded, too, it can lead to more confusion. The web page doesn't know to reset, so the old (and wrong) game data still shows and moves from the server can coincide with illegal moves.
A possible solution to this is to create a randomly generated session number or hash that is sent to the server with each move, as well as what move number the client thinks it is at.
If the server gets a message with a new session number, it can:
- Start a new instance of betago and keep the old one running or
- Close anything that is currently running and start a new game or
- Restart the server
If the client is getting mismatched data, it can display an alert, clear the board, or abort.
Another solution (and possible feature enhancement) could be for the server to keep track of the board and send a copy of the current game state as an array upon request of the client (i.e. when the web page is loaded or an error is suspected) or even as an additional piece of data embedded in the response (significantly enhances reliability but hurts speed and bandwidth). This would be very easy to implement client side.
Ideal for single instance server:
When UI client loads, it pulls latest game data from the server and continues from there as if it was open the whole time. The user can start a new game at any time by pressing a button.
This is a very similar to issue to #26 and the manifestation of the bugs are practically identical.
Hi @rocketinventor, thanks for digging deep on this! I like the idea of using a hash to track the current game state.
I don't think the server should proactively reset if it gets a mismatched token; better to let the client make the decision to either a) start a new game or b) request a sync of the full game state.
So I guess the steps to complete are:
- Implement a game-state token and verify it when accepting a move from the client
- Implement new game call
- Implement sync game state call
- Implement error handling on client
- Add new game button to client
After reducing the frequency of this issue, I was able to confirm that #27 is indeed the root cause of #26 and not bad server logic.
With this fix, I can confirm that BetaGo will recognize areas that it captured as playable. However, until #27 is fixed, there is no way for me to know if this issue is causing #26 or if #26 is its own problem or not (I'm still getting errors playing in areas that I captured).
@rocketinventor Indeed betago only is designed to serve a single game per launch of the server. That's consistent with the GTP protocol, but it raises issues like this one, and unnecessarily prevents you from running one web server to play simultaneous games across the internet.
A simpler approach than implementing state tokens would be to have the server maintain no state at all, and to have the client to send the complete move history with each request. Then the bot would just do a loop of apply_move() starting from an empty board before selecting its move.
Well this raises two separate questions:
- Do we want to make a production-quality web server for the bot? This was never a goal I had, but if people are interested in it it's certainly open for discussion.
- Should bots be stateless? I don't think this is practical. For the current bot it's fine, but a bot that does any kind of tree search will want to preserve information in between moves. I think we should design for stateful bots to keep this option open.
@cdmalon As @macfergus said, a stateless server would be interesting but the implementation would be impractical and waste a lot of data and CPU cycles for what we need. Plus, parsing the game data for the server and implementing GTK over HTTP would add additional complexities. At this point, I think it is fine just to mirror one game across multiple clients.
A similar system to what I proposed is being used right now by OGS, which is a popular game server uses a similar system to what I described and runs quite well. Session/device tokens are as simple as adding a few cookies to the flask server. The web browser handles the rest (automatically).
The biggest problem right now is that the client sometimes accepts an invalid move or applies moves as the wrong color (or doesn't apply them at all). I have made some improvements to the client code that address this issue via improvements to #30. By making a few simple changes, I was able to reduce the rate of error so much that I had to artificially create new ones in order to test it. I will have the pull request out as soon as I get a chance.
We don't need a production quality web server but, it needs to be reliable so that the web server/client aren't causing problems that inhibit the regular use of BetaGo.
I just submitted a pull request (#38) that addresses this issue, please merge.
I would like to implement game sync but I don't know where the data is stored (see #45). Please help!
Thanks!
@maxpumperla @macfergus