tambetm/simple_dqn

What's the reason to add StateBuffer?

mw66 opened this issue · 9 comments

mw66 commented

I just saw this in the new code,

https://github.com/tambetm/simple_dqn/blob/master/src/agent.py#L12

and I wonder why StateBuffer was added.

It duplicates data (the screens) that is already stored in replay_memory, maybe only to make getCurrentState() easier?

I think that's too small a reason to introduce a new class and store the same data twice in different structures.

tambetm commented

The main reason was that when ReplayMemory was used to keep track of the current state, the states visited during testing also made it into replay memory, and I don't think that is the correct thing to do: basically we were allowing test data to leak into training. I'm not sure how much it actually affects the results though. I'm assuming the previous approach made results slightly better, because during testing we use an exploration rate of 0.05, which might produce slightly better trajectories in replay memory.
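
For reference, a minimal sketch of what such a buffer amounts to (the names and details here are illustrative, not the exact code in the repository):

import numpy as np

class StateBuffer:
    # Keep only the last history_length screens of the current episode,
    # so the current state can be assembled for action selection without
    # writing anything to replay memory.
    def __init__(self, history_length, screen_dims):
        self.buffer = np.zeros((history_length,) + screen_dims, dtype=np.uint8)

    def add(self, screen):
        self.buffer[:-1] = self.buffer[1:]  # shift older screens out
        self.buffer[-1] = screen            # append the newest screen

    def getState(self):
        return self.buffer

    def reset(self):
        self.buffer.fill(0)

During testing the agent updates only this buffer, so test trajectories never reach ReplayMemory.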

mw66 commented

Thanks for the explanation. Maybe you could add this as a comment on the StateBuffer class, to clarify in case other people have the same confusion as me.

BTW, I have another question. I plan to modify the original DQN for an experiment, like this:

path1 = original_DQN
path2 = some_extra_info

layers = [MergeMultistream(layers=[path1, path2], merge="stack"),
          Affine(nout=num_actions, init=init_norm)]

self.model = Model(layers=layers)

My question is: how do I prepare the input data in self._setInput(states)?

Currently, states.shape == (self.batch_size, self.history_length, screen_height, screen_width).

I think that after my modification the input data will be a tuple (screen, extra_info), where screen.shape == self.screen_dim, and extra_info.shape is something I haven't decided yet (maybe as simple as a scalar) but will be unrelated to screen.shape.

  1. How do I build a numpy array of shape (self.batch_size, self.history_length) whose elements are tuples (i.e. (screen, extra_info))? See the sketch after this list.
  2. If I pass such a state to self._setInput, will Neon work with such an array of tuples?
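
For question 1: a numpy array of tuples would be an object array, which the GPU backend can't consume directly, so a common alternative is to keep two parallel arrays indexed in lockstep. A sketch, with illustrative dimensions and the undecided extra_info kept as one scalar per step:

import numpy as np

batch_size, history_length = 32, 4
screen_height, screen_width = 84, 84  # illustrative screen dimensions

# one array per input stream instead of one array of tuples
screens = np.zeros((batch_size, history_length, screen_height, screen_width),
                   dtype=np.uint8)
extras = np.zeros((batch_size, history_length), dtype=np.float32)

# sample i of the minibatch is then the pair (screens[i], extras[i])
minibatch = (screens, extras)

This sidesteps question 2 entirely: each stream keeps its own dtype and shape, and Neon never sees an array of tuples.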

Thanks!

mw66 commented

How should I change this:

https://github.com/tambetm/simple_dqn/blob/master/src/deepqnetwork.py#L37

self.screen_input_shape = (self.history_length,) + self.screen_dim + (self.batch_size,)
self.extra_input_shape = (self.history_length,) + (N,) + (self.batch_size,)
self.input = (self.be.empty(self.screen_input_shape), self.be.empty(self.extra_input_shape))
self.input.lshape = (self.screen_input_shape,  self.extra_input_shape) # HACK: needed for convolutional networks

The input will be a tuple, and self.input.lshape would be a tuple of tuples; will this work? (One catch: a plain Python tuple does not allow attribute assignment, so the lshape line above would raise an AttributeError as written.)

mw66 commented

Ah, I realized that if I separate the two inputs into two variables, it may work:

self.screen_input = ...
self.extra_input = ...

Comments?

mw66 commented

And then call fprop with the tuple:

self.model.fprop((self.screen_input, self.extra_input))

as in

https://github.com/NervanaSystems/neon/blob/master/neon/data/imagecaption.py#L217
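
Putting those pieces together, a sketch of how the allocation and forward pass might look inside DeepQNetwork (N is a placeholder width for extra_info, and passing the inputs as a tuple is an assumption based on the imagecaption example above):

# one device buffer per stream, reusing the shapes proposed earlier
screen_shape = (self.history_length,) + self.screen_dim + (self.batch_size,)
extra_shape = (self.history_length,) + (N,) + (self.batch_size,)

self.screen_input = self.be.empty(screen_shape)
self.extra_input = self.be.empty(extra_shape)
# the lshape HACK is only needed on the convolutional (screen) stream
self.screen_input.lshape = screen_shape

# forward pass with one input per stream
output = self.model.fprop((self.screen_input, self.extra_input))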

mw66 commented

https://github.com/tambetm/simple_dqn/blob/master/src/deepqnetwork.py#L49

How do I change this then:

self.model.initialize(self.input_shape[:-1], self.cost)

I read the doc:

http://neon.nervanasys.com/docs/latest/_modules/neon/models/model.html#Model.initialize

def initialize(self, dataset, cost=None):

It is supposed to get a dataset; is this a bug?

mw66 commented

Should it be:

self.model.initialize(self.input[:-1], self.cost)

If this is a bug, can I, for my experiment, change it to:

self.model.initialize((self.screen_input[:-1], self.extra_input[:-1]), self.cost)

i.e. pass a tuple as the dataset; will it work as expected?
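
For what it's worth, the existing single-stream call already passes a plain shape tuple (self.input_shape[:-1]) rather than a dataset object, and that works, so initialize evidently accepts shape information as well. A sketch of the two-stream variant (untested; it assumes Neon's containers accept one shape per stream):

# drop the trailing batch dimension from each stream's shape,
# mirroring what input_shape[:-1] does in the single-stream code
self.model.initialize((screen_shape[:-1], extra_shape[:-1]), self.cost)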

tambetm commented

I added a comment for StateBuffer. For the other issues I think you should refer to the Neon project.