Possible bug in the calculation of the state space
It seems that you assume that the state space has `8 * 8 * 2 = 128` states. We have a `2 * 4 = 8` grid, so one might think that there are 8 ways to place A, 8 ways to place B, and 2 ways to assign the ball. However, this approach assumes that A and B can be placed in the same cell. In Littman's original paper (minimax Q-learning), this is not the case: A and B must always be in different cells, so the correct number of states is `8 * 7 * 2 = 112`.
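For concreteness, here is a minimal sketch that enumerates both counts (the cell indexing 0..7 and the name `GRID_CELLS` are my own, not taken from your code):

```python
from itertools import product

GRID_CELLS = 8  # 2 x 4 grid, cells indexed 0..7

# Every joint configuration, allowing A and B to share a cell.
all_states = list(product(range(GRID_CELLS), range(GRID_CELLS), range(2)))

# Valid configurations per Littman's paper: A and B in different cells.
valid_states = [(a, b, ball) for a, b, ball in all_states if a != b]

print(len(all_states))    # 128 = 8 * 8 * 2
print(len(valid_states))  # 112 = 8 * 7 * 2
```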
I'm trying to understand why you defined the state space as `self.state_space = (8, 8, 2)`, which seems to suggest that either

- you calculated the state space incorrectly,
- you allow players to be in the same cell, or
- I am misinterpreting what the variable `self.state_space` is supposed to represent.
You write in the comments `self.state_space: <num of variable1, num of variable2, num of variable3>`, but this is unclear to me. You use `state_space` to define the Q-functions in the agent. Clearly, these should be represented as multi-dimensional arrays, such that each entry corresponds to a tuple `(a1, a2, state)`, so that part makes sense; see the sketch below for the shapes I have in mind.
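To make the shape question concrete, here is what I imagine the two options look like. This is only a sketch: the action count of 5 comes from Littman's soccer game (up, down, left, right, stand), not from your code, and the axis ordering is an assumption.

```python
import numpy as np

NUM_ACTIONS = 5  # assumed: up, down, left, right, stand, as in Littman's paper

# With state_space = (8, 8, 2), the Q-table presumably has one entry per
# (own action a1, opponent action a2, A position, B position, ball owner):
Q = np.zeros((NUM_ACTIONS, NUM_ACTIONS, 8, 8, 2))
print(Q.size)  # 5 * 5 * 128 = 3200 entries

# With the tighter 8 * 7 * 2 = 112 state count and a flat state index:
Q_flat = np.zeros((NUM_ACTIONS, NUM_ACTIONS, 112))
print(Q_flat.size)  # 5 * 5 * 112 = 2800 entries
```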
Could you please clarify what your approach is to defining the state space, and how that affects e.g. the definition of the Q-function and its shape?
After looking more at your code, it really seems that you assumed there are `8 * 8 * 2 = 128` states when defining the arrays for the value functions. However, you also never allow players to be in the same position (when actions are taken). So, effectively, there aren't `8 * 8 * 2 = 128` reachable states, but `8 * 7 * 2 = 112`, as I said above. I don't think this affects learning, but it does affect memory: your arrays are larger than they need to be.
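If you did want the compact representation, a bijective state index is straightforward to build. The helper below is hypothetical (not from your code), just to show the 112-entry layout is achievable:

```python
def state_index(a: int, b: int, ball: int) -> int:
    """Map a valid state (a != b, ball in {0, 1}) to a unique index in [0, 112)."""
    assert a != b
    # Re-index B's position among the 7 cells that are not A's cell.
    b_rel = b if b < a else b - 1
    return (a * 7 + b_rel) * 2 + ball

# Sanity check: the mapping covers exactly the 112 valid states.
indices = {state_index(a, b, ball)
           for a in range(8) for b in range(8) if a != b
           for ball in range(2)}
assert indices == set(range(112))
```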