Question about rescale in cpo
YUYING07 opened this issue · 1 comments
YUYING07 commented
b_flat = get_flat_gradients_from(self.ac.pi.net)
ep_costs = self.logger.get_stats('EpCosts')[0]
c = ep_costs - self.cost_limit
c /= (self.logger.get_stats('EpLen')[0] + eps) # rescale
self.logger.log(f'c = {c}')
self.logger.log(f'b^T b = {b_flat.dot(b_flat).item()}')
Why do we need to rescale here?
zmsn-2077 commented
I am very sorry to be so late in replying to you. My understanding is that this is a normalisation that removes the effect of trajectory length on the constraint value; the OpenAI implementation https://github.com/openai/safety-starter-agents/blob/master/safe_rl/pg/agents.py
uses a similar operation.
Again, I apologize for not replying in time. If you have any further questions, I will reply as soon as I can.
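As a rough sketch of the idea (function and variable names here are illustrative, not the repository's actual API; `eps` is a small constant to avoid division by zero), dividing the constraint violation by episode length puts it on a per-step scale, so its magnitude does not grow simply because a trajectory is long:

```python
# Sketch: rescaling the CPO constraint value by episode length.
# Names are hypothetical and only for illustration.

def rescaled_constraint(ep_cost: float, cost_limit: float,
                        ep_len: float, eps: float = 1e-8) -> float:
    """Constraint violation c = ep_cost - cost_limit, divided by the
    episode length so that c is on a per-step scale rather than
    scaling with how long the trajectory happens to be."""
    return (ep_cost - cost_limit) / (ep_len + eps)

# Without rescaling, a 1000-step episode with average per-step cost 0.5
# and cost limit 25 gives c = 475, a very different magnitude from a
# 100-step episode (c = 25) even though the per-step behaviour is similar.
c_raw = 500.0 - 25.0                                    # 475.0
c_scaled = rescaled_constraint(500.0, 25.0, 1000.0)     # ~0.475
```

Keeping `c` on a per-step scale makes it comparable with the gradient terms used in the CPO update, regardless of trajectory length.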