One policy is [-inf, nan, nan, inf] when using vertex_enumeration

Question

Closed this issue 5 years ago · 2 comments

I just came across a game matrix that produces inf and nan values when I try to solve it.
The game matrix is:
[[ -4.74849287 -1.41955836 -1.41955836 -2.46551608]
[-100. -0.80877032 -0.80877032 -1.42923881]
[ -3.99960399 -1.41955836 -1.41955836 -1.42923881]
[ -2.46551608 -0.80877032 -0.80877032 -2.46551608]]
This game cannot be solved using support_enumeration, and vertex_enumeration returns
one solution whose policy is [-inf, nan, nan, inf] and whose reward is nan. Could you tell me why?

Answer 1 · 2019-10-21T05:52:38.000Z

I will investigate and get back to you. Thanks for posting the issue.

Answer 2 · 2019-11-18T07:44:34.000Z

@alexalvis the game is degenerate: the column player has two best responses to the first row. The algorithms implemented are not guaranteed to work for degenerate games.
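
For anyone who wants to check this kind of degeneracy by hand, here is a minimal sketch in plain Python (no game theory library involved). A two-player game is degenerate if some mixed strategy with support of size k has more than k pure best responses; a pure strategy (k = 1) with two best responses is the simplest case. The helper `tied_best_responses` is a hypothetical name, not part of any library. The thread does not say how the game object was built, so the sketch tries two assumptions about the column player's payoffs: that they equal the printed matrix, and that the game is zero-sum (payoffs are the negated matrix).

```python
import math

# Payoff matrix exactly as printed in the question.
A = [
    [-4.74849287, -1.41955836, -1.41955836, -2.46551608],
    [-100.0, -0.80877032, -0.80877032, -1.42923881],
    [-3.99960399, -1.41955836, -1.41955836, -1.42923881],
    [-2.46551608, -0.80877032, -0.80877032, -2.46551608],
]


def tied_best_responses(column_payoffs):
    """Hypothetical helper: map each row index to the columns that tie for
    the column player's highest payoff against that row, keeping only rows
    with more than one such column."""
    ties = {}
    for i, row in enumerate(column_payoffs):
        best = max(row)
        columns = [
            j for j, value in enumerate(row)
            if math.isclose(value, best, rel_tol=1e-9)
        ]
        if len(columns) > 1:
            ties[i] = columns
    return ties


# Assumption 1: the printed matrix is the column player's own payoff matrix.
ties_if_same = tied_best_responses(A)

# Assumption 2: the game is zero-sum, so the column player receives -A.
ties_if_zero_sum = tied_best_responses([[-value for value in row] for row in A])

print(ties_if_same)
print(ties_if_zero_sum)
```

Under the first assumption, the first row (index 0) is reported with columns 1 and 2 tied, which matches the description above. Under the zero-sum assumption the tie shows up in the last row instead (columns 0 and 3). Either way a pure strategy has more than one best response. Note that this only checks pure strategies, so an empty result would not prove that a game is non-degenerate.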