cestpasphoto/alpha-zero-general

A few questions about the Splendor AI

Closed this issue · 1 comments

Hi, this looks like a fantastic implementation of AlphaZero for Splendor—thanks for making it. I had a few questions:

  1. Splendor has a few mechanics that chess, Go, and Shogi don't seem to have. How do you handle them? In particular I'm thinking of:
    a. Hidden information: you can take a face-down card from the pile and it remains hidden until you play it.
    b. Chance: the cards are shuffled.
    c. Multiplayer: you can have more than two players.

Is there any new theory involved, or does the same old MCTS + NNs approach work just fine? If the latter, is there anything special you have to do to handle these different gameplay elements?

  2. How good is the best bot you've trained? And how do you know how good it is?

Thank you!

Sorry @jsomers for the late reply; somehow I wasn't notified of your question.

  • Hidden information and chance: I keep a list of the available cards, and a card is drawn at random only at the moment it is needed (in both simulated games and real games), so the simulation never knows in advance which card will be drawn. Please also check what I called "repeatable randomness" in this comment; it brings some improvement too, though a minor one.
  • Multiplayer: that generalizes quite well. In the original version, the "value" is a scalar giving a kind of probability that the current player will win (not between 0 and 1 but between -1 and 1); when switching from one player to another, you just return the opposite value. In my multiplayer version, I simply encode that information in an array (array[0] is the value for the current player, array[1] for the next player, ...); when switching players, I just roll the array.
  • The best bot is really, really good :-D See my other repo, where you have an in-browser version to test for yourself, along with some results against other engines. The network is trained with 400 MCTS iterations: I manage to beat the engine sometimes on Easy (25 iterations), once on Medium (100 iterations), and never on Native (400 iterations).
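The "draw on demand" idea from the first bullet can be sketched like this (a minimal illustration, not the repo's actual code; the `draw_card` helper and deck representation are my own assumptions):

```python
import random

def draw_card(deck, rng=random):
    """Remove and return a uniformly random card from `deck`.

    The deck is just a list of the remaining face-down cards; since the
    card is only chosen at the moment it is needed, MCTS simulations
    cannot peek at a pre-shuffled draw order.
    """
    i = rng.randrange(len(deck))
    # Swap-remove: O(1), and the order of the remaining deck is irrelevant.
    deck[i], deck[-1] = deck[-1], deck[i]
    return deck.pop()

deck = ["card_%d" % n for n in range(5)]
card = draw_card(deck)
assert card not in deck and len(deck) == 4
```

The same function is used in both self-play simulations and real games, so the two settings see identical randomness behavior.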
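The value-array rolling from the multiplayer bullet can be sketched as follows (a hypothetical illustration of the idea; the function name and layout are my own, not the repo's API):

```python
import numpy as np

def to_next_player(values):
    """Re-express a value vector from the next player's perspective.

    values[i] is the predicted outcome for the player who moves i seats
    after the current player. When the turn passes, everyone shifts one
    slot toward index 0, which is exactly a roll of the array.
    """
    return np.roll(values, -1)

# Two-player special case: rolling a 2-vector swaps its entries, which
# matches the classic "negate the scalar value" trick when the vector
# is [v, -v].
v = np.array([0.8, -0.8])
print(to_next_player(v))  # [-0.8  0.8]
```

With this representation, nothing in the search needs to know the player count: the same roll works for 2, 3, or 4 players.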