pat-coady/trpo

Change of the loss function

bernardocortez opened this issue · 0 comments

Hello! Congratulations on the excellent implementation.
I noticed some differences between your policy neural network's loss function and the one in the original paper. What criteria did you follow in making these changes?