I created a custom player class to create an agent that can complete a game against itself and the agents provided as a baseline by the assignment. In the score function, I use a combination of techniques covered by the lectures to predict the best move for the agent to play next. Each method returns a value used to rank all possible moves against each other. With experimentation, I was able to find which methods worked well and which methods didn't work well at all.
own_loc = state.locs[self.player_id]
own_liberties = set(state.liberties(own_loc))
own_x, own_y = self.ind2xy(own_loc)
opp_loc = state.locs[1 - self.player_id]
opp_liberties = set(state.liberties(opp_loc))
shared_actions = own_liberties & opp_liberties
actions = own_liberties - (opp_liberties - own_liberties)
steal_liberties = own_liberties & opp_liberties
if self.player_id == 0:
if own_loc == 57 and opp_loc == None:
return float('inf')
if opp_loc and self.player_id == 0:
mirrors = self.get_mirrors(opp_loc)
if own_loc in mirrors:
return float('inf')
if len(shared_actions) > 0:
score = score + len(shared_actions)
else:
score = score + len(actions)
return score
Return the move along a branch of the game tree that has the best possible value. A move is a pair of coordinates in (column, row) order corresponding to a legal move for the searching player.
def alpha_beta_search(self, state):
alpha = float("-inf")
beta = float("inf")
best_score = float("-inf")
best_move = None
for a in state.actions():
v = self.min_value(state.result(a), alpha, beta)
alpha = max(alpha, v)
if v > best_score:
best_score = v
best_move = a
return best_move
Return the value for a win (+1) if the game is over, otherwise return the minimum value over all legal child nodes.
def min_value(self, state, alpha, beta):
if state.terminal_test():
return state.utility(0)
v = float("inf")
for a in state.actions():
v = min(v, self.max_value(state.result(a), alpha, beta))
if v <= alpha:
return v
beta = min(beta, v)
return v
Return the value for a loss (-1) if the game is over, otherwise return the maximum value over all legal child nodes.
def max_value(self, state, alpha, beta):
if state.terminal_test():
return state.utility(0)
v = float("-inf")
for a in state.actions():
v = max(v, self.min_value(state.result(a), alpha, beta))
if v >= beta:
return v
alpha = max(alpha, v)
return v
Part of my assignment was to test the effectiveness of my search techniques against the provided agents. Each agent uses a strategy covered in class to try and win the game. My task was to find additional strategies that would ultimately win the game, or determine why those strategies are ineffective.
Below is a table showing the average % win after 1000 games played, once as Player 1 and once as Player 2. My agent was particularly effective when playing against the GREEDY algorithm by choosing moves that reduce the amount of moves available to the opponent. My solution also did well playing against the agent using random moves, winning 82.6% as Player 1 and 79.3% as Player 2.
Player 1 | Player 2 | |
---|---|---|
MINIMAX | 39.5% | 24.1% |
GREEDY | 100% | 100% |
SELF | 50% | 50% |
RANDOM | 82.6% | 79.3% |