One policy is [-inf, nan, nan, inf] when using vertex_enumeration
I just came across a game matrix that produces inf and nan results when solved.
The game matrix is:
[[ -4.74849287 -1.41955836 -1.41955836 -2.46551608]
[-100. -0.80877032 -0.80877032 -1.42923881]
[ -3.99960399 -1.41955836 -1.41955836 -1.42923881]
[ -2.46551608 -0.80877032 -0.80877032 -2.46551608]]
This cannot be solved using support_enumeration, and vertex_enumeration returns
one solution whose policy is [-inf, nan, nan, inf] with a reward of nan. Could you tell me why?
I will investigate and get back to you. Thanks for posting the issue.
@alexalvis the game is degenerate: the column player has two best responses to the first row. The algorithms implemented are not guaranteed to work for degenerate games.
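For anyone hitting the same thing, here is a minimal sketch of one way to spot this kind of degeneracy before calling the solver. It uses plain Python and looks for pure strategies that have more than one best response. It assumes the game is zero-sum, so the column player's payoffs are the negation of the posted matrix; the thread only gives one matrix, so that is an assumption. The helper name `best_response_ties` is hypothetical and is not part of any library.

```python
def best_response_ties(payoffs):
    """For each opponent pure strategy (one row of `payoffs`), return the
    indices of the tied best responses when there is more than one."""
    ties = {}
    for i, row in enumerate(payoffs):
        best = max(row)
        winners = [j for j, value in enumerate(row) if value == best]
        if len(winners) > 1:
            ties[i] = winners
    return ties


# Game matrix from the issue (row player's payoffs).
A = [
    [-4.74849287, -1.41955836, -1.41955836, -2.46551608],
    [-100.0, -0.80877032, -0.80877032, -1.42923881],
    [-3.99960399, -1.41955836, -1.41955836, -1.42923881],
    [-2.46551608, -0.80877032, -0.80877032, -2.46551608],
]

# Assumption: zero-sum game, so the column player's payoffs are -A.
B = [[-value for value in row] for row in A]

# Column player's tied best responses, keyed by row index.
col_ties = best_response_ties(B)

# Row player's tied best responses, keyed by column index
# (transpose A so each "row" is one column of the game).
row_ties = best_response_ties([list(col) for col in zip(*A)])

print(col_ties)  # {3: [0, 3]}
print(row_ties)  # {1: [1, 3], 2: [1, 3], 3: [1, 2]}
```

Indices are zero-based. Which rows or columns get flagged depends on the payoff convention, but any pure strategy with two or more best responses is enough to make the game degenerate. Note also that the second and third columns of the matrix are identical.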