Agent fails with sequential data
mmansky-3 opened this issue · 0 comments
The Agent implementation fails for data of indeterminate length, such as temporal data. An environment that outputs observations of shape (None, data_dim) fails when paired with a model whose first layer is a matching LSTM.
It appears that either the Agent or the standard Processor adds an extra dimension to the observation, causing a shape mismatch between the Environment output and the Model input and raising a ValueError. For an Environment that outputs (None, 10), the error is:
ValueError: Input 0 of layer lstm is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [1, 1, None, 10]
The first "1" refers to the batch dimension and is to be expected. As an immediate workaround, one can add a squeeze layer to the model, something along the lines of Input>Squeeze>LSTM>Output.
- Check that you are up-to-date with the master branch of Keras-RL. You can update with:
  pip install git+git://github.com/wau/keras-rl2.git --upgrade --no-deps
- Check that you are up-to-date with the master branch of Keras. You can update with:
  pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps
- Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short). If you report an error, please include the error message and the backtrace.
Example Code:
import rl.memory
import rl.agents
import rl.core
import tensorflow as tf
import numpy as np

BATCH_SIZE = 1
DATA_DIM = 10


class Environment(rl.core.Env):
    def __init__(self, data_dim=10, game_length=50):
        self.reward_counter = 0
        self.data_dim = data_dim
        self.game_length = game_length
        self.reward = 0.1
        self.observation = [[0] * self.data_dim]
        self.observation[0][0] = 1
        self.done = False

    def step(self, action):
        action_number = np.argmax(action)
        if not self.reward_counter + action_number % self.data_dim or np.random.rand() < 0.05:
            self.reward *= 1.1
        self.observation.append([0] * self.data_dim)
        self.observation[-1][self.reward_counter % self.data_dim] = 1
        self.reward_counter += 1
        reward = self.reward
        observation = np.array(self.observation)
        if len(self.observation) > self.game_length and np.random.rand() < 0.05:
            self.done = True
        done = self.done
        info = {}
        return observation, reward, done, info

    def reset(self):
        self.done = False
        self.reward_counter = 0
        self.reward = 0.1
        self.observation = [[0] * self.data_dim]
        self.observation[0][0] = 1
        observation = np.array(self.observation)
        return observation

    def close(self):
        self.__del__()
if __name__ == '__main__':
    lstm_input = tf.keras.Input(batch_shape=(BATCH_SIZE, 1, None, DATA_DIM))
    # lstm_input = tf.keras.backend.squeeze(lstm_input, 1)  # uncomment squeeze layer to fix model
    x = tf.keras.layers.LSTM(20)(lstm_input)
    x = tf.keras.layers.Dense(10, activation='softmax')(x)  # output size doesn't actually matter here
    model = tf.keras.Model(inputs=[lstm_input], outputs=[x])

    memory = rl.memory.SequentialMemory(50000, window_length=BATCH_SIZE)
    processor = rl.core.Processor()
    agent = rl.agents.DQNAgent(model, memory=memory, processor=processor, nb_actions=10, batch_size=BATCH_SIZE)
    agent.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.01))

    env = Environment(data_dim=DATA_DIM)
    agent.fit(env, nb_steps=int(5e5), log_interval=1000)