Question about logger value
lijie9527 opened this issue · 2 comments
lijie9527 commented
```python
if done or time_out:
    rew_deque.append(ep_ret[idx])
    cost_deque.append(ep_cost[idx])
    len_deque.append(ep_len[idx])
    logger.store(
        **{
            "Metrics/EpRet": np.mean(rew_deque),
            "Metrics/EpCost": np.mean(cost_deque),
            "Metrics/EpLen": np.mean(len_deque),
        }
    )
    ep_ret[idx] = 0.0
    ep_cost[idx] = 0.0
    ep_len[idx] = 0.0
```
I'm confused about this `np.mean(cost_deque)`: it makes the logged EpCost correlated across epochs, so `ep_costs = logger.get_stats("Metrics/EpCost")` no longer matches the Jc definition from the safe RL papers. Is the purpose to average the newly added episodes together with many previous ones, so that training is more stable and the plotted training curve looks smoother?
Gaiejj commented
Yes. Lagrangian methods often lack stability and may oscillate drastically as EpCost changes. Updating the Lagrange multipliers with an EpCost averaged across epochs enhances the stability of the algorithm and also makes the training curve smoother.
lijie9527 commented
Thanks for the quick reply, I get it now.