Ask a few questions.
N-Kingsley opened this issue · 4 comments
Hi,
I would like to ask a few questions about 'compute_loss()' in DRQN.ipynb:
First, for diff = (expected_q_values - current_q_values):
Why is the error calculated at every step of the GRU rather than only at the last step?
Second, why use 'loss = self.huber(diff)'?
Third, why mask the first half of the losses?
Thanks,
Ni
Why are there 6 actions in the ‘PongNoFrameSkip-v4’ environment?
Hi,
I'll try to address your questions in order:
- I'm not sure I understand this one. Both expected_q_values and current_q_values have shape [batch_size x sequence_length], where expected_q_values[:, -1] and current_q_values[:, -1] correspond to the most recent timestep.
- Good question. I didn't replicate the original paper exactly, which uses mean squared error. Instead, I used the Huber loss function (https://en.wikipedia.org/wiki/Huber_loss).
- Another good question. Once again, I did not replicate the original paper exactly. Masking the first half of the losses is a trick introduced in (https://arxiv.org/abs/1609.05521). In summary, the hidden state used to calculate the first few Q-values of the sequence is unlikely to be correct, since the hidden state is initialized to zero for each sequence. Empirically, masking improved performance.
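
To make the last two points concrete, here is a minimal NumPy sketch of a Huber loss applied to per-timestep TD errors, with the first half of each sequence masked out. It is not the notebook's actual code: the function names `huber` and `masked_sequence_loss`, the `delta` parameter, and the choice to average over only the unmasked timesteps are all assumptions made for illustration.

```python
import numpy as np


def huber(diff, delta=1.0):
    """Element-wise Huber loss: quadratic for |diff| <= delta, linear beyond."""
    abs_diff = np.abs(diff)
    quadratic = 0.5 * diff ** 2
    linear = delta * (abs_diff - 0.5 * delta)
    return np.where(abs_diff <= delta, quadratic, linear)


def masked_sequence_loss(expected_q_values, current_q_values, delta=1.0):
    """Mean Huber loss over the second half of each sequence.

    Both inputs have shape [batch_size, sequence_length]. Averaging over the
    unmasked entries only is an assumption, not necessarily the notebook's choice.
    """
    diff = expected_q_values - current_q_values
    loss = huber(diff, delta)

    # Zero out the first half of every sequence, where the hidden state
    # (initialized to zero) has not yet had time to become meaningful.
    seq_len = loss.shape[1]
    mask = np.zeros_like(loss)
    mask[:, seq_len // 2:] = 1.0

    return (loss * mask).sum() / mask.sum()


# One sequence of four timesteps; only the last two contribute to the loss.
expected = np.array([[1.0, 2.0, 3.0, 5.0]])
current = np.array([[0.0, 0.0, 2.5, 2.0]])
print(masked_sequence_loss(expected, current))
```

The Huber loss grows linearly rather than quadratically for large errors, so an occasional large TD error produces a bounded gradient, which is a common reason to prefer it over mean squared error in Q-learning.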
To address your follow-up question, see https://github.com/openai/gym/wiki/Table-of-environments for more information on OpenAI Gym environments.
Thank you for your help, I understand them now.