Counterfactual Regret Minimization (usually abbreviated as "CFR") is a technique for solving imperfect-information games, which are games in which some information is hidden from players. For example, in most card games, players are dealt cards that they keep hidden from the other players. Solving such a game means finding a Nash equilibrium, which is a strategy that cannot be improved further by incremental changes.
Cfrm is a .NET library for applying CFR to a game of two or more players. To solve such a game, you create a concrete class that inherits from an abstract base class called GameState<TAction>
that defines the state of the game from the current player's point of view (this state is known as an "information set" in game theory). The type parameter TAction
defines the type of actions available in the game (e.g. playing a card, making a bet, etc.). This can be an enum
or any other type (such as a Card
class).
If you are working in C#, the methods of GameState
that you must override are:
int CurrentPlayerIdx
This is the 0-based index of the current player. For example, if there are four players in a game, their indexes are0
,1
,2
, and3
. Players do not necessarily play in that order, though. (E.g. In trick-taking games, the player who takes a trick typically leads on the next trick.)float[] TerminalValues
If the game is in a terminal state (i.e. the game is over), this member answers an array of "payoff" values for each player. For example, in a two-player game, if player 0 wins, she might receive 1 point, while player 1 would receive -1 point for losing, resulting in a payoff array of[ 1, -1]
for that outcome. If the sum of the payoffs in a terminal state is always 0, then the game is "zero-sum". If the game is not over, then this member answersnull
.TAction[] LegalActions
If the game is not in a terminal state, this member answers an array of legal actions for the player whose turn it is in the current state. For example, in a poker game, these actions might beBet
orFold
, while in Bridge, the legal actions would be specific bids or cards.GameState<TAction> AddAction(TAction action)
This method advances the game to the next state by taking the given action on behalf of the current player.string Key
This member answers a string that uniquely describes the state of the game from the point of view of the current player. For example, in a card game, the unique key might contain the history of the game to this point (i.e. all the cards played so far, by all players) plus the hidden cards remaining in the current player's hand.
You can think of GameState
as representing all the possible nodes in a game's "move tree". Key
is a node's unique ID, LegalActions
are the branches leading to the node's child nodes, and AddAction
moves from a node to one of its children.
Once your concrete GameState
class is ready, you can run CFR by invoking the static member CounterFactualRegret.Minimize
, which takes the following arguments:
numIterations
: The number of CFR iterations to run. Each iteration corresponds to a single play-through of the game.numPlayers
: The number of players in the game.getInitialState
: This is a callback function that initializes a new game. This initialization can be random (corresponding to shuffling the deck in a card game), or it can move sequentially through all possible initial states in order.
The Minimize
function returns a tuple containing two values:
float[] expectedGameValue
: An array containing the expected payoffs for each player at the Nash equilibrium. If the game is zero-sum, these payoffs will sum to zero.StrategyProfile strategyProfile
: A collection of strategies for the game states visited while running CFR. To access these strategies from C#, use theIDictionary<string, float[]> ToDict
member. The keys of this dictionary correspond toGameState.Key
, while the values are arrays representing the probability of taking each of theGameState.LegalActions
at that game state. This profile can be used to play the game according to the strategy found by CFR.
After minimizing regret, the resulting strategy profile can be saved to disk via its Save
method. In order to play a game with a saved profile, first load it from disk using the static StrategyProfile.Load
method. To play a mixed strategy (e.g. in poker), call the Sample
method with a key that corresponds to the state of the game from the current player's point of view. To play a pure strategy, call the Best
method, which always chooses the action with the highest probability in a given situation.
Cfrm is written in F# and supports F# implementations smoothly. The important differences from working in C# are:
- When inheriting from
GameState
, you can overrideTerminalValuesOpt
instead ofTerminalValues
in order to avoid returningnull
. - You can call function
CounterFactualRegret.minimize
instead of methodCounterFactualRegret.Minimize
. This allows you to pass in an F# closure for thegetInitialState
callback instead of the wrappedFunc
that is passed toMinimize
.
There are working examples of Kuhn poker for both C# and F# in the unit test projects.
- Vanilla Counterfactual Regret Minimization for Engineers: Walkthrough of a Python implementation of 2-player CFR
- An Introduction to Counterfactual Regret Minimization: Detailed overview of CFR with a Java implementation
- Multiplayer CFR: Multiplayer support in Python.