tambetm/simple_dqn

What's the reason to add StateBuffer?

mw66 opened this issue · 9 comments

mw66 commented

I just saw this in the new code,

https://github.com/tambetm/simple_dqn/blob/master/src/agent.py#L12

and I wonder why StateBuffer was added.

It duplicates data (the screens) that is already stored in replay_memory, maybe only to make getCurrentState() easier?

I think that's too small a reason to introduce a new class and store the same data twice in different structures.

tambetm commented

The main reason was that when ReplayMemory was used to keep track of the current state, the states visited during testing also made it into replay memory, and I don't think that is the correct thing to do: basically we were allowing test data to leak into training. I'm not sure how much it actually affects the results though. I'm assuming the previous approach made results slightly better, because during testing we use an exploration rate of 0.05, which might produce slightly better trajectories in replay memory.
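
For reference, a minimal sketch of what such a buffer amounts to (the names and details here are illustrative, not the exact code in the repository):

import numpy as np

class StateBuffer:
    # Keep only the last history_length screens of the current episode,
    # so the current state can be assembled for action selection without
    # writing anything to replay memory.
    def __init__(self, history_length, screen_dims):
        self.buffer = np.zeros((history_length,) + screen_dims, dtype=np.uint8)

    def add(self, screen):
        self.buffer[:-1] = self.buffer[1:]  # shift older screens out
        self.buffer[-1] = screen            # append the newest screen

    def getState(self):
        return self.buffer

    def reset(self):
        self.buffer.fill(0)

During testing the agent updates only this buffer, so test trajectories never reach ReplayMemory.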

mw66 commented

Thanks for the explanation. Maybe you could add this as a comment on the StateBuffer class, to clarify in case other people have the same confusion as me.

BTW, I have another question. I plan to modify the original DQN for an experiment, like this:

path1 = original_DQN
path2 = some_extra_info

layers = [MergeMultistream(layers=[path1, path2], merge="stack"),
          Affine(nout=num_actions, init=init_norm)]

self.model = Model(layers=layers)

My question is: how do I prepare the input data in self._setInput(states)?

Currently, states.shape == (self.batch_size, self.history_length, screen_height, screen_width).

I think that after my modification the input data will be a tuple (screen, extra_info), where screen.shape == self.screen_dim, and extra_info.shape is something I haven't decided yet (maybe as simple as a scalar) but will be unrelated to screen.shape.

  1. How do I build a numpy array of shape (self.batch_size, self.history_length) whose elements are tuples (i.e. (screen, extra_info))? See the sketch after this list.
  2. If I pass such a state to self._setInput, will Neon work with such an array of tuples?
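
For question 1: a numpy array of tuples would be an object array, which the GPU backend can't consume directly, so a common alternative is to keep two parallel arrays indexed in lockstep. A sketch, with illustrative dimensions and the undecided extra_info kept as one scalar per step:

import numpy as np

batch_size, history_length = 32, 4
screen_height, screen_width = 84, 84  # illustrative screen dimensions

# one array per input stream instead of one array of tuples
screens = np.zeros((batch_size, history_length, screen_height, screen_width),
                   dtype=np.uint8)
extras = np.zeros((batch_size, history_length), dtype=np.float32)

# sample i of the minibatch is then the pair (screens[i], extras[i])
minibatch = (screens, extras)

This sidesteps question 2 entirely: each stream keeps its own dtype and shape, and Neon never sees an array of tuples.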

Thanks!

mw66 commented

How should I change this:

https://github.com/tambetm/simple_dqn/blob/master/src/deepqnetwork.py#L37

self.screen_input_shape = (self.history_length,) + self.screen_dim + (self.batch_size,)
self.extra_input_shape = (self.history_length,) + (N,) + (self.batch_size,)
self.input = (self.be.empty(self.screen_input_shape), self.be.empty(self.extra_input_shape))
self.input.lshape = (self.screen_input_shape,  self.extra_input_shape) # HACK: needed for convolutional networks

The input will be a tuple, and self.input.lshape would be a tuple of tuples; will this work? (One catch: a plain Python tuple does not allow attribute assignment, so the lshape line above would raise an AttributeError as written.)

mw66 commented

Ah, I realized that if I separate the two inputs into two variables, it may work:

self.screen_input = ...
self.extra_input = ...

Comments?

mw66 commented

And then call fprop with the tuple:

self.model.fprop((self.screen_input, self.extra_input))

as in

https://github.com/NervanaSystems/neon/blob/master/neon/data/imagecaption.py#L217
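
Putting those pieces together, a sketch of how the allocation and forward pass might look inside DeepQNetwork (N is a placeholder width for extra_info, and passing the inputs as a tuple is an assumption based on the imagecaption example above):

# one device buffer per stream, reusing the shapes proposed earlier
screen_shape = (self.history_length,) + self.screen_dim + (self.batch_size,)
extra_shape = (self.history_length,) + (N,) + (self.batch_size,)

self.screen_input = self.be.empty(screen_shape)
self.extra_input = self.be.empty(extra_shape)
# the lshape HACK is only needed on the convolutional (screen) stream
self.screen_input.lshape = screen_shape

# forward pass with one input per stream
output = self.model.fprop((self.screen_input, self.extra_input))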

mw66 commented

https://github.com/tambetm/simple_dqn/blob/master/src/deepqnetwork.py#L49

How do I change this then:

self.model.initialize(self.input_shape[:-1], self.cost)

I read the doc:

http://neon.nervanasys.com/docs/latest/_modules/neon/models/model.html#Model.initialize

def initialize(self, dataset, cost=None):

It is supposed to get a dataset; is this a bug?

mw66 commented

Should it be:

self.model.initialize(self.input[:-1], self.cost)

If this is a bug, can I, for my experiment, change it to:

self.model.initialize((self.screen_input[:-1], self.extra_input[:-1]), self.cost)

i.e. pass a tuple as the dataset; will it work as expected?
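
For what it's worth, the existing single-stream call already passes a plain shape tuple (self.input_shape[:-1]) rather than a dataset object, and that works, so initialize evidently accepts shape information as well. A sketch of the two-stream variant (untested; it assumes Neon's containers accept one shape per stream):

# drop the trailing batch dimension from each stream's shape,
# mirroring what input_shape[:-1] does in the single-stream code
self.model.initialize((screen_shape[:-1], extra_shape[:-1]), self.cost)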

tambetm commented

I added a comment for StateBuffer. For the other issues I think you should refer to the Neon project.