qfettes/DeepRL-Tutorials

Ask a few questions.

N-Kingsley opened this issue · 4 comments

Hi,
I would like to ask a few questions, as follows:

In 'compute_loss()' in DRQN.ipynb:

First, diff = (expected_q_values - current_q_values):
Why does the error need to be calculated at every step of the GRU rather than only at the last step?

Second, why do 'loss = self.huber(diff)'?

Third, why mask the first half of the losses?

Thanks,
Ni

Why are there 6 actions in the ‘PongNoFrameSkip-v4’ environment?

Hi,

I'll try to address your questions in order:

  1. I'm not sure I understand this one. Both expected_q_values and current_q_values have dimensionality [batch_size x sequence_length], where expected_q_values[:, -1] and current_q_values[:, -1] would correspond to the most recent timestep (the shapes are sketched in the code after this list).

  2. Good question. I didn't replicate the original paper exactly: the paper uses mean squared error, whereas I used the Huber loss function (https://en.wikipedia.org/wiki/Huber_loss).

  3. Another good question. Once again, I did not replicate the original paper exactly. Masking the first half of the losses is a trick introduced in (https://arxiv.org/abs/1609.05521). In short, the hidden state used to compute the first few q-values of the sequence is unlikely to be accurate, since the hidden state is initialized to zero at the start of each sequence; empirically, masking those early losses improved performance.
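To make points 1–3 concrete, here is a minimal sketch of the loss computation being discussed. It assumes PyTorch and toy tensors of shape [batch_size x sequence_length]; the huber helper and the masking are written out explicitly here and only approximate what the notebook actually does.

```python
import torch

def huber(diff, delta=1.0):
    # Huber loss: quadratic for |diff| <= delta, linear beyond it,
    # so large TD errors are penalized less harshly than with MSE.
    abs_diff = diff.abs()
    quadratic = 0.5 * diff.pow(2)
    linear = delta * (abs_diff - 0.5 * delta)
    return torch.where(abs_diff <= delta, quadratic, linear)

batch_size, sequence_length = 32, 10

# Stand-ins for the notebook's tensors: one Q-value per timestep of the
# unrolled GRU, i.e. shape [batch_size x sequence_length].
current_q_values = torch.randn(batch_size, sequence_length)   # Q(s_t, a_t)
expected_q_values = torch.randn(batch_size, sequence_length)  # r_t + gamma * max_a Q(s_{t+1}, a)

# (1) The TD error is formed at every timestep of the sequence, not just the last.
diff = expected_q_values - current_q_values

# (2) Huber loss instead of the paper's MSE.
loss = huber(diff)

# (3) Zero out the first half of each sequence, where the zero-initialized
#     hidden state has not yet "warmed up".
mask = torch.zeros_like(loss)
mask[:, sequence_length // 2:] = 1.0
loss = (loss * mask).sum() / mask.sum()
print(loss)
```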

To address your follow-up question, see https://github.com/openai/gym/wiki/Table-of-environments for more info on the OpenAI Gym environments.
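For the Pong question specifically, you can inspect the action space directly. Assuming a gym install with the Atari extras, something like the following shows the six-action set (note the registered id is 'PongNoFrameskip-v4'):

```python
import gym

env = gym.make('PongNoFrameskip-v4')
print(env.action_space.n)                   # 6
print(env.unwrapped.get_action_meanings())  # ['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']
```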

Thank you for your help, I've got it now.