simoninithomas/Deep_reinforcement_learning_Course

Are the old and the new model effectively the same?

yuan1202 opened this issue · 0 comments

Hi Simon

I am looking at your implementation of the PPO model.

After going through the code a couple of times, I think that although you create two policy instances, the second one is constructed with the reuse parameter set, so both instances share the same variables. In effect, the model contains two identical policies.
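To make the concern concrete, here is a minimal sketch in plain Python rather than TensorFlow. It assumes a toy variable store that mimics how a reuse flag hands back existing variables for the same scope name. `get_variable`, `ToyPolicy`, and the scope name `"model"` are hypothetical names for illustration and are not taken from the course code.

```python
# Toy stand-in for a variable store with a reuse flag (an assumption for
# illustration; this is not TensorFlow and not the course implementation).
_VARIABLES = {}


def get_variable(scope, name, init, reuse):
    """Return the variable stored under scope/name; create it unless reuse is set."""
    key = f"{scope}/{name}"
    if reuse:
        if key not in _VARIABLES:
            raise ValueError(f"Variable {key} does not exist")
        return _VARIABLES[key]
    if key in _VARIABLES:
        raise ValueError(f"Variable {key} already exists")
    _VARIABLES[key] = list(init)  # mutable storage, handed out by reference
    return _VARIABLES[key]


class ToyPolicy:
    """Hypothetical policy whose only parameters are one weight list."""

    def __init__(self, scope, reuse):
        self.weights = get_variable(scope, "w", [1.0, -0.5], reuse)


new_pi = ToyPolicy("model", reuse=False)  # creates model/w
old_pi = ToyPolicy("model", reuse=True)   # receives the very same model/w

new_pi.weights[0] += 0.25  # simulate one gradient step on the "new" policy

print(old_pi.weights is new_pi.weights)  # True: a single underlying variable
print(old_pi.weights[0])                 # 1.25: the "old" policy moved too
```

Under this assumption the "old" policy can never lag behind the "new" one, so a probability ratio computed between them would always be 1.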

Furthermore, I have not seen any code that transfers the weights between the two policies, unlike OpenAI's implementation, which does this:
```python
assign_old_eq_new = U.function([], [], updates=[
    tf.assign(oldv, newv)
    for (oldv, newv) in zipsame(oldpi.get_variables(), pi.get_variables())])
```
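For contrast, the same idea can be sketched without TensorFlow: keep the two parameter sets in separate storage and copy new into old before each update. This is only an illustration under stated assumptions. `copy_params`, `prob_action_one`, and the one-logit Bernoulli policy are hypothetical and not part of either code base.

```python
import math


def copy_params(dst, src):
    """Overwrite dst in place with the values in src (the role the assign op plays above)."""
    if len(dst) != len(src):
        raise ValueError("parameter lists differ in length")
    for i, value in enumerate(src):
        dst[i] = value


def prob_action_one(params):
    """Hypothetical Bernoulli policy: probability of action 1 from a single logit."""
    return 1.0 / (1.0 + math.exp(-params[0]))


new_params = [1.0]
old_params = [0.0]  # separate storage, not shared with new_params

copy_params(old_params, new_params)  # snapshot taken before the update
new_params[0] += 0.25                # hypothetical gradient step on the new policy

# The snapshot still holds the pre-update value, so the ratio is informative.
ratio = prob_action_one(new_params) / prob_action_one(old_params)
print(old_params[0], new_params[0], ratio)
```

Because the snapshot is taken before the step, the ratio between new and old probabilities moves away from 1 as the new policy is updated, which is what the clipped objective needs in order to have any effect.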

Could you please confirm whether this is indeed the case? Thanks!