Possible bug in the calculation of the state space
It seems that you assume that the state space has `8 * 8 * 2 = 128` states. We have a `2 * 4 = 8` grid, so one might think that there are 8 ways to place A, 8 ways to place B, and 2 ways to assign the ball. However, this approach assumes that A and B can be placed in the same cell. In Littman's original paper (minimax Q-learning), this is not the case: A and B must always be in different cells, so the correct number of states is `8 * 7 * 2 = 112`.
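For concreteness, here is a minimal sketch that enumerates both counts (the cell indexing 0..7 and the name `GRID_CELLS` are my own, not taken from your code):

```python
from itertools import product

GRID_CELLS = 8  # 2 x 4 grid, cells indexed 0..7

# Every joint configuration, allowing A and B to share a cell.
all_states = list(product(range(GRID_CELLS), range(GRID_CELLS), range(2)))

# Valid configurations per Littman's paper: A and B in different cells.
valid_states = [(a, b, ball) for a, b, ball in all_states if a != b]

print(len(all_states))    # 128 = 8 * 8 * 2
print(len(valid_states))  # 112 = 8 * 7 * 2
```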
I'm trying to understand why you defined the state space as `self.state_space = (8, 8, 2)`, which seems to suggest that either

- you calculated the state space incorrectly,
- you allow players to be in the same cell, or
- I am misinterpreting what the variable `self.state_space` is supposed to represent.
You write in the comments `self.state_space: <num of variable1, num of variable2, num of variable3>`, but this is unclear to me. You use `state_space` to define the Q-functions in the agent. Clearly, these should be represented as multi-dimensional arrays, such that each entry corresponds to a tuple `(a1, a2, state)`, so that part makes sense; see the sketch below for the shapes I have in mind.
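To make the shape question concrete, here is what I imagine the two options look like. This is only a sketch: the action count of 5 comes from Littman's soccer game (up, down, left, right, stand), not from your code, and the axis ordering is an assumption.

```python
import numpy as np

NUM_ACTIONS = 5  # assumed: up, down, left, right, stand, as in Littman's paper

# With state_space = (8, 8, 2), the Q-table presumably has one entry per
# (own action a1, opponent action a2, A position, B position, ball owner):
Q = np.zeros((NUM_ACTIONS, NUM_ACTIONS, 8, 8, 2))
print(Q.size)  # 5 * 5 * 128 = 3200 entries

# With the tighter 8 * 7 * 2 = 112 state count and a flat state index:
Q_flat = np.zeros((NUM_ACTIONS, NUM_ACTIONS, 112))
print(Q_flat.size)  # 5 * 5 * 112 = 2800 entries
```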
Could you please clarify what your approach is to defining the state space, and how that affects e.g. the definition of the Q-function and its shape?
After looking more at your code, it really seems that you assumed there are `8 * 8 * 2 = 128` states when defining the arrays for the value functions. However, you also never allow players to be in the same position (when actions are taken). So, effectively, there aren't `8 * 8 * 2 = 128` reachable states, but `8 * 7 * 2 = 112`, as I said above. I don't think this affects learning, but it does affect memory: your arrays are larger than they need to be.
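If you did want the compact representation, a bijective state index is straightforward to build. The helper below is hypothetical (not from your code), just to show the 112-entry layout is achievable:

```python
def state_index(a: int, b: int, ball: int) -> int:
    """Map a valid state (a != b, ball in {0, 1}) to a unique index in [0, 112)."""
    assert a != b
    # Re-index B's position among the 7 cells that are not A's cell.
    b_rel = b if b < a else b - 1
    return (a * 7 + b_rel) * 2 + ball

# Sanity check: the mapping covers exactly the 112 valid states.
indices = {state_index(a, b, ball)
           for a in range(8) for b in range(8) if a != b
           for ball in range(2)}
assert indices == set(range(112))
```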