A few questions about the Splendor AI
Closed this issue · 1 comment
jsomers commented
Hi, this looks like a fantastic implementation of AlphaZero for Splendor—thanks for making it. I had a few questions:
- Splendor has a few mechanics that chess, Go, and Shogi don't seem to have. How do you handle them? In particular I'm thinking of:
a. Hidden information: you can take a face-down card from the pile and it remains hidden until you play it.
b. Chance: the cards are shuffled.
c. Multiplayer: you can have more than two players.
Is there any new theory involved, or does the same old MCTS + NNs work just fine? If the latter, is there anything special you have to do to handle these different gameplay elements?
- How good is the best bot you've trained? And how do you know how good it is?
Thank you!
cestpasphoto commented
Sorry @jsomers for the late reply; somehow I wasn't notified of your question.
- Hidden information and chance: I keep a list of the cards that are still available, and whenever a card has to be revealed I draw one at random from that list. I do this both in simulated games and in the real game, so the simulation never knows which card will actually be drawn. Please also check what I called "repeatable randomness" in this comment; it also brings some improvement, though a minor one.
- Multiplayer: this generalizes quite well. In the initial version, the "value" is a scalar giving a "kind of probability" that the current player will win (between -1 and 1 rather than between 0 and 1); when switching from one player to the other, you just return the opposite value. In my multiplayer version, I simply encode that information in an array (array[0] is the value for the current player, array[1] is the value for the next player, ...); when switching players, I just have to roll the array.
- The best bot is really, really good :-D See my other repo, which has an in-browser version you can try yourself, and see also some results vs. other engines. The network is trained with 400 MCTS iterations: I manage to beat the engine sometimes when it runs in Easy (25 iterations), I have beaten it once in Medium (100 iterations), and never in Native (400 iterations).
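
To make the multiplayer answer above concrete, here is a minimal sketch of the per-player value array and the "roll" on a player switch. It is only an illustration, not code from the repo: the function name `shift_perspective`, the use of plain Python lists, and the roll direction (index 0 always means "the player about to move") are assumptions.

```python
from typing import List


def shift_perspective(values: List[float], steps: int = 1) -> List[float]:
    """Re-index a per-player value list after the turn passes `steps` players on.

    values[0] is the value for the current player, values[1] for the next
    player, and so on. After the turn passes, the old "next player" becomes
    the current one, so every entry moves one slot to the left and the old
    current player wraps around to the end.
    (Hypothetical helper; the direction of the roll is an assumption.)
    """
    k = steps % len(values)  # also handles negative steps (rolling back)
    return values[k:] + values[:k]


# Three-player example: values seen from player A (order A, B, C).
seen_by_a = [0.6, -0.2, -0.4]
seen_by_b = shift_perspective(seen_by_a)   # order is now B, C, A
print(seen_by_b)

# Two-player special case: [v, -v] rolled once is [-v, v], which matches
# "return the opposite value" in the scalar version.
print(shift_perspective([0.5, -0.5]))
```

With NumPy arrays, `np.roll(values, -1)` performs the same left shift. Rolling with `steps=-1` undoes the shift, which is the direction you would need when passing a value from a child node back to its parent.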
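
The hidden-information answer above (keep a list of available cards, draw at random whenever one must be revealed) can be sketched roughly as follows. This is a sketch under stated assumptions, not the repo's implementation: the `Deck` class, its method names, and the idea of giving each simulation its own copy and random generator are all hypothetical.

```python
import random
from typing import List, Optional


class Deck:
    """Hypothetical pile of not-yet-revealed cards, kept as a plain list."""

    def __init__(self, cards: List[str], rng: Optional[random.Random] = None):
        self.cards = list(cards)
        # Each deck owns its random generator (assumption for this sketch).
        self.rng = rng if rng is not None else random.Random()

    def draw(self) -> Optional[str]:
        """Remove and return a uniformly random remaining card (None if empty)."""
        if not self.cards:
            return None
        index = self.rng.randrange(len(self.cards))
        return self.cards.pop(index)

    def copy(self, rng: Optional[random.Random] = None) -> "Deck":
        """Independent copy for a simulated game; the real deck is untouched."""
        return Deck(self.cards, rng)


real_deck = Deck(["card_a", "card_b", "card_c"], rng=random.Random(0))

# A simulated game draws from its own copy, so whatever it sees says
# nothing about which card the real game will reveal later.
sim_deck = real_deck.copy(rng=random.Random(1))
sim_card = sim_deck.draw()

# The real game draws only when the card actually has to be revealed.
real_card = real_deck.draw()
```

Because the card is sampled only at the moment it is needed, the search treats the draw as unknown in both simulated and real games.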