qfettes/DeepRL-Tutorials

Ask a few questions.

N-Kingsley opened this issue · 4 comments

Hi,
I would like to ask a few questions, as follows:

In 'compute_loss()' in DRQN.ipynb:

First, diff = (expected_q_values - current_q_values):
Why does the error need to be calculated at every step of the GRU rather than only at the last step?

Second, why do 'loss = self.huber(diff)'?

Third, why mask the first half of the losses?

Thanks,
Ni

Why are there 6 actions in the ‘PongNoFrameSkip-v4’ environment?

Hi,

I'll try to address your questions in order:

  1. I'm not sure I understand this one. Both expected_q_values and current_q_values have dimensionality [batch_size x sequence_length], where expected_q_values[:, -1] and current_q_values[:, -1] would correspond to the most recent timestep (the shapes are sketched in the code after this list).

  2. Good question. I didn't replicate the original paper exactly: the paper uses mean squared error, whereas I used the Huber loss function (https://en.wikipedia.org/wiki/Huber_loss).

  3. Another good question. Once again, I did not replicate the original paper exactly. Masking the first half of the losses is a trick introduced in (https://arxiv.org/abs/1609.05521). In short, the hidden state used to compute the first few q-values of the sequence is unlikely to be accurate, since the hidden state is initialized to zero at the start of each sequence; empirically, masking those early losses improved performance.
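To make points 1–3 concrete, here is a minimal sketch of the loss computation being discussed. It assumes PyTorch and toy tensors of shape [batch_size x sequence_length]; the huber helper and the masking are written out explicitly here and only approximate what the notebook actually does.

```python
import torch

def huber(diff, delta=1.0):
    # Huber loss: quadratic for |diff| <= delta, linear beyond it,
    # so large TD errors are penalized less harshly than with MSE.
    abs_diff = diff.abs()
    quadratic = 0.5 * diff.pow(2)
    linear = delta * (abs_diff - 0.5 * delta)
    return torch.where(abs_diff <= delta, quadratic, linear)

batch_size, sequence_length = 32, 10

# Stand-ins for the notebook's tensors: one Q-value per timestep of the
# unrolled GRU, i.e. shape [batch_size x sequence_length].
current_q_values = torch.randn(batch_size, sequence_length)   # Q(s_t, a_t)
expected_q_values = torch.randn(batch_size, sequence_length)  # r_t + gamma * max_a Q(s_{t+1}, a)

# (1) The TD error is formed at every timestep of the sequence, not just the last.
diff = expected_q_values - current_q_values

# (2) Huber loss instead of the paper's MSE.
loss = huber(diff)

# (3) Zero out the first half of each sequence, where the zero-initialized
#     hidden state has not yet "warmed up".
mask = torch.zeros_like(loss)
mask[:, sequence_length // 2:] = 1.0
loss = (loss * mask).sum() / mask.sum()
print(loss)
```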

To address your follow-up question, see https://github.com/openai/gym/wiki/Table-of-environments for more info on the OpenAI Gym environments.
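For the Pong question specifically, you can inspect the action space directly. Assuming a gym install with the Atari extras, something like the following shows the six-action set (note the registered id is 'PongNoFrameskip-v4'):

```python
import gym

env = gym.make('PongNoFrameskip-v4')
print(env.action_space.n)                   # 6
print(env.unwrapped.get_action_meanings())  # ['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']
```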

Thank you for your help, I've got it now.