Wrong array dimension/index in tdZeroApprox.py
vdouet opened this issue
Hi,
I have a question about the TD(0) approximation code 'tdZeroApprox.py' in Unit-8-The-Mountaincar.
On line 19, where the NumPy array 'tiledState' is created:
tiledState = np.zeros(nTiles*nTiles*nTiles)
Shouldn't it be:
tiledState = np.zeros((nTiles - 1)*(nTiles - 1)*nLayers)
If the number of layers differs from the number of tiles, the "tiledState" array will not have the right size.
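A quick arithmetic check of the sizing argument, keeping nTiles = 8 as in the original code and trying a few layer counts. The helper names `required_size` and `allocated_size` are hypothetical; the sketch only assumes that each layer owns its own block of nTiles² entries, as the `row * nTiles**2` offset in the index formula implies.

```python
def required_size(n_tiles: int, n_layers: int) -> int:
    # Each layer (row) owns a contiguous block of n_tiles**2 entries,
    # so the feature vector must hold n_tiles**2 * n_layers values.
    return n_tiles * n_tiles * n_layers


def allocated_size(n_tiles: int) -> int:
    # Size allocated on line 19: np.zeros(nTiles*nTiles*nTiles)
    return n_tiles * n_tiles * n_tiles


for n_layers in (4, 8, 10):
    need = required_size(8, n_layers)
    have = allocated_size(8)
    print(f"n_layers={n_layers}: need {need}, allocated {have}")
```

The two sizes only agree when nLayers == nTiles. With 10 layers the last rows would index past the end of the array, and with 4 layers half of it is never touched.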
Also, here nTiles = nBins = 8. With 8 bin edges per axis there are only 7 actual tiles per axis, and np.digitize will return a number between 1 and 7 (because the if conditions use strict less-than/greater-than comparisons), giving a total of 7 * 7 = 49 tiles per layer.
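The range claim is easy to check directly. This sketch assumes 8 evenly spaced edges over the MountainCar position range [-1.2, 0.6] (an illustrative choice; the course code shifts its edges per layer) and feeds np.digitize only points strictly between the first and last edge, as the if conditions do:

```python
import numpy as np

n_bins = 8
edges = np.linspace(-1.2, 0.6, n_bins)  # 8 edges -> 7 tiles along this axis

# Points strictly inside (edges[0], edges[-1]), mirroring the strict comparisons
samples = np.linspace(-1.2, 0.6, 1001)[1:-1]
indices = np.digitize(samples, edges)

print(indices.min(), indices.max())  # 1 7
print(len(np.unique(indices)))       # 7 distinct values -> 7 * 7 = 49 tiles per layer
```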
Furthermore, I think idx should be:
idx = x * y + row * (nTiles - 1)**2 - 1
because np.digitize returns values in [1, 7], not [0, 7]. The index ranges for successive rows would then be [0, 48], [49, 97], [98, 146], etc. With the original formula:
idx = (x + 1) * (y + 1) + row * nTiles**2 - 1
some indices of tiledState can never be used; for example, the smallest reachable index is 3, since (1 + 1) * (1 + 1) + 0 - 1 = 3.
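Enumerating every (x, y) pair that np.digitize can return here (1 to 7 on each axis) makes the gaps concrete. This is a sketch for row 0 only, and the helper `reachable` is hypothetical. It also shows that neither formula is one-to-one: because both are products, different tiles share an index (for example, x * y gives 6 for both (2, 3) and (3, 2)), so 49 tiles map to only 25 distinct indices.

```python
from itertools import product

n_tiles = 7  # actual tiles per axis with 8 bin edges


def reachable(index_fn):
    # Every index produced for row 0 (row offset = 0) when x and y range
    # over the values np.digitize can return here, 1..7
    return {index_fn(x, y) for x, y in product(range(1, n_tiles + 1), repeat=2)}


original = reachable(lambda x, y: (x + 1) * (y + 1) - 1)
proposed = reachable(lambda x, y: x * y - 1)

print(min(original), max(original), len(original))  # 3 63 25
print(min(proposed), max(proposed), len(proposed))  # 0 48 25
```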
I think this comes from the slide that introduces the index equation in the chapter "Linear methods and tiling": there, the first possible value of (x, y) is (0, 0), whereas the first value np.digitize can return is (1, 1). If we want the index equation to still hold, we could write:
x = np.digitize(position, pos_bins[row]) - 1
y = np.digitize(velocity, vel_bins[row]) - 1
and then we can write:
idx = (x + 1) * (y + 1) + (row * n_tiles**2) - 1
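Writing x_d, y_d for the raw np.digitize outputs (symbols introduced here only for this derivation), the shifted version is algebraically the same index as the x * y formula proposed above, not a different one:

$$
x = x_d - 1,\quad y = y_d - 1 \;\Longrightarrow\; (x + 1)(y + 1) + \text{row}\cdot n_{\text{tiles}}^2 - 1 = x_d\, y_d + \text{row}\cdot n_{\text{tiles}}^2 - 1
$$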
Here is the modified code I used:
import numpy as np

def tile_state(pos_bins, vel_bins, obs, n_bins=8, n_layers=8):
    position, velocity = obs
    # The number of tiles per axis is the number of bins per axis - 1
    n_tiles = n_bins - 1
    tiled_state = np.zeros(n_tiles * n_tiles * n_layers)
    for row in range(n_layers):
        if position > pos_bins[row][0] and \
                position < pos_bins[row][n_bins - 1]:
            if velocity > vel_bins[row][0] and \
                    velocity < vel_bins[row][n_bins - 1]:
                # Strictly inside the edges, so x and y are in [1, n_tiles]
                x = np.digitize(position, pos_bins[row])
                y = np.digitize(velocity, vel_bins[row])
                # Note: x * y is not unique per tile (e.g. 2 * 3 == 3 * 2)
                idx = (x * y) + (row * n_tiles**2) - 1
                tiled_state[idx] = 1.0
            else:
                break
        else:
            break
    return tiled_state
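For comparison, here is a sketch of a collision-free variant. It is an assumption on my part, not the course's formula: product indices such as x * y send different tiles to the same slot (2 * 3 == 3 * 2), so this version uses a row-major index, (x - 1) * n_tiles + (y - 1), which gives each of the 7 × 7 tiles in a layer its own slot in [0, 48], offset by row * n_tiles**2. The control flow (including the early `break`) mirrors tile_state above. The names `tile_state_row_major` and `make_bins`, and the shifted, evenly spaced tilings over the MountainCar bounds used in the usage example, are hypothetical.

```python
import numpy as np


def tile_state_row_major(pos_bins, vel_bins, obs, n_bins=8, n_layers=8):
    """Hypothetical variant of tile_state using a row-major tile index."""
    position, velocity = obs
    n_tiles = n_bins - 1  # n_bins edges -> n_tiles tiles per axis
    tiled_state = np.zeros(n_tiles * n_tiles * n_layers)
    for row in range(n_layers):
        inside_pos = pos_bins[row][0] < position < pos_bins[row][n_bins - 1]
        inside_vel = vel_bins[row][0] < velocity < vel_bins[row][n_bins - 1]
        if not (inside_pos and inside_vel):
            break  # same early exit as the original tile_state
        x = np.digitize(position, pos_bins[row])  # in [1, n_tiles]
        y = np.digitize(velocity, vel_bins[row])  # in [1, n_tiles]
        # Row-major index: unique in [0, n_tiles**2 - 1] within each layer
        idx = (x - 1) * n_tiles + (y - 1) + row * n_tiles**2
        tiled_state[idx] = 1.0
    return tiled_state


def make_bins(low, high, n_bins=8, n_layers=8):
    # Hypothetical tilings: evenly spaced edges, each layer shifted by a fraction of a tile
    width = (high - low) / (n_bins - 1)
    base = np.linspace(low, high, n_bins)
    return [base - row * width / n_layers for row in range(n_layers)]


# Assumed MountainCar bounds: position in [-1.2, 0.6], velocity in [-0.07, 0.07]
pos_bins = make_bins(-1.2, 0.6)
vel_bins = make_bins(-0.07, 0.07)
state = tile_state_row_major(pos_bins, vel_bins, (-0.5, 0.0))
print(state.shape, state.sum())  # (392,) 8.0
```

Because the index is one-to-one within a layer, each layer sets exactly one entry in its own 49-entry block.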
And I find the following result:
But maybe there is something I did not understand?
Best regards,
Victor Douet