PKU-Alignment/Safe-Policy-Optimization

Question about rescale in CPO

YUYING07 opened this issue · 1 comments

    b_flat = get_flat_gradients_from(self.ac.pi.net)

    ep_costs = self.logger.get_stats('EpCosts')[0]
    c = ep_costs - self.cost_limit
    c /= (self.logger.get_stats('EpLen')[0] + eps)  # rescale
    self.logger.log(f'c = {c}')
    self.logger.log(f'b^T b = {b_flat.dot(b_flat).item()}')

Why do we need to rescale here?

I am very sorry to be so late in replying to you. I understand this as a normalisation that removes the effect of the trajectory length: dividing by the episode length turns the episode-level cost violation into a per-step quantity. The OpenAI implementation https://github.com/openai/safety-starter-agents/blob/master/safe_rl/pg/agents.py uses a similar operation.
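For concreteness, here is a minimal sketch of what the division by `EpLen` does (the numbers and variable names are made up for illustration, not taken from the repository): without the rescale, the violation grows with how long the episode ran; after it, c is expressed per step.

    # Made-up numbers, for illustration only.
    eps = 1e-8
    cost_limit = 25.0

    for ep_len, ep_cost in [(100, 30.0), (1000, 75.0)]:
        raw = ep_cost - cost_limit   # episode-level violation, depends on episode length
        c = raw / (ep_len + eps)     # per-step violation after the rescale
        print(f"ep_len={ep_len:4d}  raw={raw:6.1f}  rescaled c={c:.4f}")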
Again, I apologize for not replying in time. If you have any further questions, I will reply as soon as I can.