TRPO: Is fixed_log_probs the same as log_probs?

Question

yongpan0715 opened this issue 2 years ago · 1 comments

In TRPO, is fixed_log_probs the same as log_probs? When I debug the program, the two print identical values. Does that mean there is no difference between pnew and pold?

    with torch.no_grad():
        fixed_log_probs = policy_net.get_log_prob(states, actions)
    """define the loss function for TRPO"""
    def get_loss(volatile=False):
        with torch.set_grad_enabled(not volatile):
            log_probs = policy_net.get_log_prob(states, actions)
            action_loss = -advantages * torch.exp(log_probs - fixed_log_probs)

Answer 1 · 2022-04-20T16:51:19.000Z

Please refer to #21 and #11
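
For readers without those threads at hand, here is a minimal sketch of the general behaviour of an importance ratio like the one in the snippet. It is not the repository's code and makes no claim about what #21 and #11 say. It assumes a hypothetical one-parameter Gaussian policy, written in plain Python rather than PyTorch so that it runs without dependencies; `log_prob`, `surrogate_loss`, and the numbers are illustrative only. The point: when both log-probabilities are evaluated at the same parameters, the ratio is exactly 1, yet the loss still has a non-zero gradient with respect to the new parameters, and the ratio moves away from 1 as soon as the parameters change.

```python
import math


def log_prob(mean, action, std=1.0):
    """Log-density of a 1-D Gaussian policy N(mean, std^2) at `action` (hypothetical policy)."""
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def surrogate_loss(mean, fixed_lp, action, advantage):
    """-advantage * exp(new_log_prob - old_log_prob), the same form as action_loss in the question."""
    return -advantage * math.exp(log_prob(mean, action) - fixed_lp)


# Illustrative values, not taken from the issue.
action, advantage = 1.5, 2.0
old_mean = 0.0

# Plays the role of fixed_log_probs: computed once and then treated as a constant.
fixed_lp = log_prob(old_mean, action)

# Before any update, the "new" log-prob is evaluated at the same parameters,
# so the two values coincide and the ratio is exactly 1.
ratio_before = math.exp(log_prob(old_mean, action) - fixed_lp)

# The loss still depends on the parameter through the new log-prob only.
# A central finite difference stands in for autograd here.
eps = 1e-6
grad = (
    surrogate_loss(old_mean + eps, fixed_lp, action, advantage)
    - surrogate_loss(old_mean - eps, fixed_lp, action, advantage)
) / (2.0 * eps)

# After a small gradient step (step size chosen arbitrarily), the ratio is no longer 1.
new_mean = old_mean - 0.1 * grad
ratio_after = math.exp(log_prob(new_mean, action) - fixed_lp)

print("ratio before update:", ratio_before)
print("gradient of loss wrt mean:", grad)
print("ratio after update:", ratio_after)
```

In the PyTorch snippet from the question, the analogous distinction would be that `fixed_log_probs` is computed under `torch.no_grad()` and so carries no gradient, while `log_probs` is recomputed from the current network parameters each time `get_loss` is called.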