OpenAI Gym for a variation on the zero-sum game "spoof", otherwise known as "The 3-coin game"
Games with Imperfect Information: Theory and Algorithms (Laurent Doyen and Jean-Fran ̧cois Raskin)
Spoof, otherwise known as "The 3-coin game" is a multi-agent, imperfect-information, zero-sum game.
Three coins are arranged on a table, either head or tail up.
E.g: (Head, Head, Tail)
Player 0 only knows the number of heads which are up, Player 0 does not get to see the coins directly.
E.g: 2
Player 1 knows the full state of all the coins.
E.g. (Head, Head, Tail)
Player 0's objective is to get the coins into a configuration where they are all "heads-up" while Player 1's objective is getting all coins into a "tails-up" configuration.
Spoof is a two-player zero-sum game, therefore, a win for Player 0 is equivalent to a loss for Player 1
E.g. (Head, Head, Head)
Player 0 wins & Player 1 loses
E.g. (Tail, Tail, Tail)
Player 1 wins & Player 0 loses
- Player 1 chooses a configuration of the 3 coins, consisting of 2 heads and 1 tail.
E.g.(Head, Tail, Head)
- Player 0 chooses 1 coin in the set, without knowing whether it's heads or tails, and asks Player 1 to toggle that coin.
E.g. Player 0 chooses to toggle the first coin:
(Tail, Tail, Head)
- Player 1 can, but is not required, to exchange the positions of the two coins which were not flipped.
E.g. Player 1 chooses to swap the positions:
(Tail, Head, Tail)
- Player 1 tells Player 0 how many heads are up.
E.g.1
- Steps 2-4 are repeated until all coins are "heads-up", in which case Player 0 wins, or until all coins are "tails-up" and Player 1 wins.
Installs like any other custom Gym environment.
git clone https://github.com/MouseAndKeyboard/gym-spoof.git
cd gym-spoof
pip install -e .
import gym
environment = gym.make('gym_spoof:spoof-v0')
# Multi-agent environments return multiple observation states in a tuple.
(player_0_observation, player_1_observation) = environment.reset(shuffle_initial=True) #Initial configuration of coins is randomised
# Which player is going to make their move
active_player = 0
Multi-agent environments make use of the "Tuple" gym space type for actions and observations.
The custom "Categorical" action space type is a One-Hot vector, where the hot element refers to the coin Player 0 wants to flip
self.action_space = spaces.Tuple(
(
categorical_space.Categorical(self.coin_count),
spaces.Discrete(2)
)
)
done = False
player_0_action = None
player_1_action = None
while not done:
if active_player == 0:
# choose to flip the second coin
player_0_action = [0, 1, 0]
# choose to flip the third coin
#player_0_action = [0, 0, 1]
else:
# Swap the coins not flipped by Player 0
player_1_action = True
# Don't swap the coins not flipped by Player 0
# player_1_action = False
world_state, rewards, done, info = environment.step((player_0_action, player_1_action))
player_0_observation, player_1_observation = world_state
player_0_reward, player_1_reward = rewards
# who's turn is it to act? Player 0 or Player 1
active_player = info['next_turn']
./examples/
provides some sample implementations of the gym.