Gym is a toolkit for developing and comparing reinforcement learning algorithms.
Gym provides an environment framework for testing reinforcement learning and makes no assumptions about the agent.
You can also refer to DeepMind's Texas Hold'em environment, DeepMind Poker.
This version is adapted from wenkesj/holdem, mainly adding the following features:
- Fixed a crash caused by the spec change in openai/gym commit #836
- Added a `cycle` attribute (one deal of the cards is a round; one full play-through is a cycle)
- Added an interface that connects to the Trend Micro server
- Added an agent template (it must provide two methods for the controller to call; the controller is the bridge between the environment and the agent)
- Fixed a crash caused by all players holding
- Capped the number of raises by the same player in a round at 4 (configurable via a parameter); further raises are automatically converted to CALL
- Changed `to_call` to the absolute amount for this round

Also adapted ihendley/treys, a repo rewritten from */deuce that provides poker-related computation and management:
- Fixed f-string usage, which is not supported before Python 3.6
- Fixed an encoding issue in setup.py on the Windows platform (cp950 decode error)
- Episode (局): An episode is a game in which the 10 participating players join a table with equal initial stacks and play until no more than half of the players remain as winners, the rest having been eliminated by running out of chips. An episode consists of multiple 圈.
- 圈: A 圈 is one full trip of the dealer button around the table, landing once on every player not yet eliminated; a 圈 consists of multiple cycles. The small and big blinds stay the same within a 圈 and double at the start of the next 圈.
- Cycle (CYCLE): A cycle is one hand: the dealer deals new community and hole cards, and the players make decisions round by round until all but one player has folded or all 5 community cards are revealed; the winner of the cycle is then determined and the chips are settled. A cycle consists of multiple rounds.
- Round (ROUND): A round is one pass in which all players take an action in turn; a round contains multiple actions, one from each player.
- Action (STEP): An action is one player, on their turn, choosing one of call/raise/check/fold/bet/allin through the AI client; completing one such decision counts as one action.
```bash
# better to run under a virtualenv
git clone https://github.com/chuchuhao/holdem.git
pip install gym
pip install websocket-client
pip install git+https://github.com/chuchuhao/treys  # on non-Windows platforms you can simply pip install treys
```
- local_example: the environment is gym
- web_example: the environment is the Trend Micro server
An agent must be a class that provides the following two methods for the controller to call:
- `takeAction(self, state, playerid)`: returns an ACTION (a namedtuple)
- `getReload(self, state)`: returns True/False
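For reference, a minimal agent sketch. The `ACTION` namedtuple and `action_table` constants below are illustrative stand-ins for the ones shipped with this repo (described further down in this README); their numeric values are assumptions:

```python
from collections import namedtuple

# Illustrative stand-ins for the ACTION namedtuple and action_table
# described later in this README; the real ones ship with this repo.
ACTION = namedtuple('ACTION', ['action', 'amount'])

class action_table:
    CHECK, CALL, RAISE, FOLD = 0, 1, 2, 3  # numeric values assumed

class CallingStationAgent:
    """Agent template: always calls, never reloads."""

    def takeAction(self, state, playerid):
        # state is the namedtuple described below; playerid is this
        # agent's seat id. Always answer with a CALL (amount ignored).
        return ACTION(action=action_table.CALL, amount=0)

    def getReload(self, state):
        # Return True to reload chips when busted, False otherwise.
        return False
```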
- Take the state tuple described below, extract a vectorized state, and feed it to a model
- Use the env with a specified policy to generate training data for batch learning (see the sketch after this list)
- Use the env together with an RL algorithm for online training
- Write your own expert rules
- Use the env to observe a model's behavior
- Connect to the TM server for testing
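For example, batch data generation might look like the sketch below, using `holdem.safe_actions` as the fixed policy and the env API shown in the example at the end of this README; the transition layout recorded here is only an assumption:

```python
import gym
import holdem

def collect_transitions(n_hands=100):
    """Roll out hands with a fixed safe policy and record transitions."""
    env = gym.make('TexasHoldem-v1')
    env.add_player(0, stack=2000)
    env.add_player(1, stack=2000)
    data = []
    for _ in range(n_hands):
        obs = env.reset()
        terminal = False
        while not terminal:
            (player_states, (community_infos, community_cards)) = obs
            actions = holdem.safe_actions(community_infos, n_seats=env.n_seats)
            obs, rews, terminal, info = env.step(actions)
            data.append((player_states, actions, rews))
    return data
```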
The agent receives a state packed as a namedtuple, containing the following three items:
- `player_states`: a tuple whose length is the number of seats at this table; each item is a player_state with the following fields:
  - `emptyplayer`, (boolean), 0 seat is empty, 1 is not; indicates whether a player is registered at this seat
  - `seat`, (number), the player's seat number; the seat is also the player's initial order and does not change during play
  - `stack`, (number), the player's remaining chips
  - `playing_hand`, (boolean), whether the player is currently playing this cycle
  - `handrank`, (number), from `treys.Evaluator.evaluate(hand, community)`, recomputed at the end of every round
  - `playedthisround`, (boolean), whether the player has already played this round (1 cycle has 4 rounds)
  - `betting`, (number), the amount the player has bet in this cycle
  - `isallin`, (boolean), 0 not all in, 1 all in
  - `lastsidepot`, (number), resolved when someone goes all in; the sidepot-related features are currently unused
  - `reloadCount`, (number), the reload feature is not used in openai/gym
  - `hand`, (list(2)), a list of length 2; must be interpreted with the API provided by TREYS
- `community_state`: the ids mentioned here are seat numbers
  - `button`, (number), the seat id of the dealer button (acting order: dealer > small blind > big blind)
  - `smallblind`, (number), the current small blind amount
  - `bigblind`, (number), the current big blind amount
  - `totalpot`, (number), the current total amount in the community pot
  - `lastraise`, (number), the last posted raise amount (the chips the last raiser added)
  - `call_price`, (number), the absolute number of chips to call this round
  - `to_call`, (number), the relative number of chips to call this round (the absolute amount minus the chips already posted this round)
  - `current_player`, (id), the id of the player currently making a decision
- `community_cards`: a list of length 5, where each item is one card
  - a card value of -1 means the card has not been revealed; otherwise the value is a number that must be interpreted with TREYS
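For example, a sketch of reading this state, using the field names above and treys to decode the card integers:

```python
from treys import Card

def show_state(state):
    player_states, community_state, community_cards = state
    for p in player_states:
        if p.emptyplayer == 0:  # per the table above, 0 means the seat is empty
            continue
        hand = " ".join(Card.int_to_pretty_str(c) for c in p.hand if c != -1)
        print("seat {}: stack={} hand={}".format(p.seat, p.stack, hand))
    board = " ".join(Card.int_to_pretty_str(c) for c in community_cards if c != -1)
    print("board: {}  to_call: {}".format(board, community_state.to_call))
```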
When the agent takes an action, it must return an ACTION namedtuple:
- `action`: provided by the `action_table()` class; this differs slightly from the interface TM provides, but is converted automatically
  - `action_table.CHECK`: check (no bet)
  - `action_table.CALL`: call (bet without specifying an amount)
  - `action_table.RAISE`: raise (bet a specified amount)
  - `action_table.FOLD`: fold (give up without betting)
- `amount`: consulted only when the chosen action is RAISE
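Continuing the agent template sketch above, `takeAction` could pick its move like this (`strong_hand` is a hypothetical helper; `amount` is only consulted for RAISE):

```python
def takeAction(self, state, playerid):
    # raise 100 chips with a strong hand, otherwise just call
    if self.strong_hand(state, playerid):  # hypothetical helper
        return ACTION(action=action_table.RAISE, amount=100)
    return ACTION(action=action_table.CALL, amount=0)  # amount is ignored for CALL
```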
pip install holdem
Afaik, this is the first OpenAI Gym No-Limit Texas Hold'em* (NLTH) environment written in Python. It's an experiment to build a Gym environment that is synchronous and can support any number of players but also appeal to the general public that wants to learn how to "solve" NLTH.
*Python 3 supports arbitrary length integers 💸
Right now, this is a work in progress, but I believe the API is mature enough for some preliminary experiments. Join me in making some interesting progress on multi-agent Gym environments.
There is limited documentation at the moment. I'll try to make this less painful to understand.
Creates a gym environment representing a NLTH table from the parameters:
- `n_seats` - number of available players for the current table. No players are initially allocated to the table. You must call `env.add_player(seat_id, ...)` to populate the table.
- `max_limit` - max_limit is used to define the `gym.spaces` API for the class. It does not actually determine any NLTH limits; it is in support of `gym.spaces.Discrete`.
- `debug` - adds debug statements to play; will probably be removed in the future.
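A sketch of constructing the table directly with these parameters; the `max_limit` default shown here is an assumption:

```python
import holdem

# 4 seats, none occupied yet; max_limit only sizes the gym.spaces ranges
env = holdem.TexasHoldemEnv(n_seats=4, max_limit=100000, debug=False)
```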
Adds a player to the table according to the specified seat (`seat_id`) and the initial amount of chips allocated to the player's `stack`. If the table does not have enough seats according to the `n_seats` used by the constructor, a `gym.error.Error` will be raised.
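For example (a sketch; the stack size is borrowed from the example at the end of this README):

```python
import holdem

env = holdem.TexasHoldemEnv(n_seats=2)
env.add_player(0, stack=2000)
env.add_player(1, stack=2000)
env.add_player(2, stack=2000)  # gym.error.Error: no seat 2 on a 2-seat table
```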
Calling `env.reset` resets the NLTH table to a new hand state. It does not reset any of the players' stacks or any of the blinds. New behavior is reserved for a special, future portion of the API that is yet another feature that is not standard in Gym environments and is a work in progress.
The observation returned is a `tuple` of the following by index:

1. `player_states` - a `tuple` where each entry is `tuple(player_info, player_hand)`; this feature can be used to gather all states and hands by `(player_infos, player_hands) = zip(*player_states)`.
   1. `player_infos` - is a `list` of `int` features describing the individual player. It contains the following by index:
      0. `[0, 1]` - `0` - seat is empty, `1` - seat is not empty.
      1. `[0, n_seats - 1]` - player's id, where they are sitting.
      2. `[0, inf]` - player's current stack.
      3. `[0, 1]` - player is playing the current hand.
      4. `[0, inf]` - the player's current handrank according to `treys.Evaluator.evaluate(hand, community)`.
      5. `[0, 1]` - `0` - player has not played this round, `1` - player has played this round.
      6. `[0, 1]` - `0` - player is currently not betting, `1` - player is betting.
      7. `[0, 1]` - `0` - player is currently not all-in, `1` - player is all-in.
      8. `[0, inf]` - player's last sidepot.
   2. `player_hands` - is a `list` of `int` features describing the cards in the player's pocket. The values are encoded based on the `treys.Card` integer representation.
2. `community_states` - a `tuple(community_infos, community_cards)` where:
   1. `community_infos` - a `list` by index:
      0. `[0, n_seats - 1]` - location of the dealer button, where the big blind is posted.
      1. `[0, inf]` - the current small blind amount.
      2. `[0, inf]` - the current big blind amount.
      3. `[0, inf]` - the current total amount in the community pot.
      4. `[0, inf]` - the last posted raise amount.
      5. `[0, inf]` - minimum required raise amount, if above 0.
      6. `[0, inf]` - the amount required to call.
      7. `[0, n_seats - 1]` - the current player required to take an action.
   2. `community_cards` - is a `list` of `int` features describing the cards in the community. The values are encoded based on the `treys.Card` integer representation. There are 5 `int`s in the list, where `-1` represents that there is no card present.
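Before the full example below, a small sketch of pulling features out of this observation (indices as listed above; treys decodes the card integers):

```python
from treys import Card

def summarize(observation):
    player_states, (community_infos, community_cards) = observation
    player_infos, player_hands = zip(*player_states)
    # index 0 of each player_info is the seat-occupied flag, index 4 the handrank
    handranks = [info[4] for info in player_infos if info[0] == 1]
    board = [Card.int_to_pretty_str(c) for c in community_cards if c != -1]
    return handranks, board
```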
```python
import gym
import holdem

def play_out_hand(env, n_seats):
    # reset environment, gather relevant observations
    (player_states, (community_infos, community_cards)) = env.reset()
    (player_infos, player_hands) = zip(*player_states)

    # display the table, cards and all
    env.render(mode='human')

    terminal = False
    while not terminal:
        # play safe actions, check when no one else has raised, call when raised.
        actions = holdem.safe_actions(community_infos, n_seats=n_seats)
        (player_states, (community_infos, community_cards)), rews, terminal, info = env.step(actions)
        env.render(mode='human')

env = gym.make('TexasHoldem-v1') # holdem.TexasHoldemEnv(2)

# start with 2 players
env.add_player(0, stack=2000) # add a player to seat 0 with 2000 "chips"
env.add_player(1, stack=2000) # add another player to seat 1 with 2000 "chips"

# play out a hand
play_out_hand(env, env.n_seats)

# add one more player
env.add_player(2, stack=2000) # add another player to seat 2 with 2000 "chips"

# play out another hand
play_out_hand(env, env.n_seats)
```