Are the old and new models effectively the same?
yuan1202 commented
Hi Simon
I am looking at your implementation of the PPO model.
After going through the code a couple of times, I think that although you create two policy instances, the reuse parameter is passed when building the second one, so both instances share the same variables and your model effectively contains two identical policies.
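To make the concern concrete, here is a toy sketch in plain Python (no TensorFlow). It assumes that the second policy is built in the same variable scope with reuse enabled, and models that with a hypothetical `VariableStore` that hands back the existing variable object instead of creating a new one. Under that assumption the "old" and "new" policies read the same weight, so the PPO probability ratio stays at 1 even after an update.

```python
import math

class VariableStore:
    """Hypothetical stand-in for a TF1 variable scope: maps names to mutable weights."""
    def __init__(self):
        self.vars = {}

    def get_variable(self, name, init, reuse):
        if reuse:
            # reuse=True: hand back the existing variable object, nothing new is created
            return self.vars[name]
        if name in self.vars:
            raise ValueError(f"Variable {name} already exists")
        self.vars[name] = [init]  # a list so it is mutable and shared by reference
        return self.vars[name]

class Policy:
    """Hypothetical one-parameter policy: P(action) = sigmoid(w * x)."""
    def __init__(self, store, scope, reuse=False):
        self.w = store.get_variable(f"{scope}/w", 0.5, reuse)

    def prob(self, x):
        return 1.0 / (1.0 + math.exp(-self.w[0] * x))

store = VariableStore()
pi = Policy(store, "pi")
oldpi = Policy(store, "pi", reuse=True)  # same scope + reuse -> same variable as pi

pi.w[0] += 1.0                           # stand-in for a gradient step on pi
ratio = pi.prob(2.0) / oldpi.prob(2.0)   # oldpi "moved" too, so the ratio is 1.0
print(pi.w is oldpi.w, ratio)
```

With a ratio that is always 1, PPO's clipping term never activates, which is why the sharing matters.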
Furthermore, I have not seen any code that copies the weights from one policy to the other, unlike OpenAI's implementation, which does this:
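```python
import tensorflow as tf
import baselines.common.tf_util as U
from baselines.common import zipsame
# Copy every variable of the current policy `pi` into the old policy `oldpi`
assign_old_eq_new = U.function([], [], updates=[tf.assign(oldv, newv)
    for (oldv, newv) in zipsame(oldpi.get_variables(), pi.get_variables())])
```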
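For comparison, a minimal plain-Python sketch of the pattern the Baselines snippet implements: each policy owns its own parameters, and an explicit copy step (a hypothetical `assign_old_eq_new` function mirroring the `zipsame`/`tf.assign` loop) snapshots `pi` into `oldpi` before the update epochs. The `Policy` class and its `[w, b]` parameters are illustrative assumptions, not code from either repository.

```python
import math

class Policy:
    """Hypothetical policy whose parameters are a list [w, b]; P(action) = sigmoid(w * x + b)."""
    def __init__(self, params):
        self.params = list(params)  # own copy: no sharing with other instances

    def get_variables(self):
        return self.params

    def prob(self, x):
        w, b = self.params
        return 1.0 / (1.0 + math.exp(-(w * x + b)))

def assign_old_eq_new(oldpi, pi):
    """Copy pi's parameters into oldpi, mirroring the zipsame/tf.assign loop."""
    old_vars, new_vars = oldpi.get_variables(), pi.get_variables()
    assert len(old_vars) == len(new_vars)  # zipsame also insists on equal lengths
    for i, newv in enumerate(new_vars):
        old_vars[i] = newv  # mutates oldpi.params in place

pi = Policy([0.5, 0.0])
oldpi = Policy([0.0, 0.0])

assign_old_eq_new(oldpi, pi)                   # snapshot before the update epochs
ratio_before = pi.prob(2.0) / oldpi.prob(2.0)  # exactly 1.0 right after the copy

pi.params[0] += 0.1                            # stand-in for a gradient step on pi only
ratio_after = pi.prob(2.0) / oldpi.prob(2.0)   # above 1.0 because oldpi stayed put
print(ratio_before, ratio_after > 1.0)
```

Keeping separate parameters and copying them explicitly is what lets the ratio drift away from 1 during the optimization epochs, so the clipped objective has something to clip.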
Could you please confirm whether this is indeed the case? Thanks!