tencent-ailab/TLeague

What does this code in pg_learner.py do?

Passerby opened this issue · 2 comments

https://github.com/tencent-ailab/TLeague/blob/dev-open/tleague/learners/pg_learner.py#L152

Why is another optimizer needed to run gradient descent on the value loss?
What does burn_in mean?

When you burn in, you pre-train the value function while keeping the policy fixed (by stopping the policy gradient if your policy shares parameters with the value function). The term 'burn-in' originally comes from [1]. Burn-in is useful when you want policy-gradient training to start with an accurate critic. For example, when the policy is initialized from a supervised model but the value function starts out blank, the policy may be damaged during the first few updates, before the value estimates become accurate.

[1] Jaderberg M, Czarnecki W M, Dunning I, et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science, 2019, 364(6443): 859-865.
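Here is a minimal sketch of a burn-in schedule. It is not TLeague's actual code. It assumes a toy one-state, two-action bandit with hypothetical rewards `REWARDS`, and a policy that already prefers the better action (standing in for a supervised model). It also assumes a value function initialized to zero and separate policy and value parameters. For the first `burn_in_steps` updates, only a value-only optimizer is stepped, so the policy stays fixed while the critic fits. After that, a second optimizer updates both. Keeping two optimizers with different variable lists is one plausible reason for the extra optimizer asked about in the question, but that is an assumption, not something confirmed for pg_learner.py.

```python
import math
import random

# Hypothetical toy problem (not from TLeague): one state, two actions,
# deterministic reward per action.
REWARDS = [1.0, 0.2]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SGD:
    """Plain SGD that only updates the parameter groups listed in var_names."""

    def __init__(self, params, var_names, lr):
        self.params = params        # shared dict: name -> list of floats
        self.var_names = var_names  # which groups this optimizer may touch
        self.lr = lr

    def apply(self, grads):
        for name in self.var_names:
            self.params[name] = [p - self.lr * g
                                 for p, g in zip(self.params[name], grads[name])]


def train(burn_in_steps, total_steps, lr=0.05, seed=0):
    rng = random.Random(seed)
    # Assumed initialization: a "pre-trained" policy favoring action 0,
    # and a blank critic (value 0).
    params = {"policy": [2.0, 0.0], "value": [0.0]}
    value_opt = SGD(params, ["value"], lr)           # burn-in: critic only
    full_opt = SGD(params, ["policy", "value"], lr)  # afterwards: actor + critic
    for step in range(total_steps):
        probs = softmax(params["policy"])
        a = 0 if rng.random() < probs[0] else 1
        r = REWARDS[a]
        v = params["value"][0]
        adv = r - v  # advantage, using the critic as a baseline
        grads = {
            # gradient of 0.5 * (r - v)^2 with respect to v
            "value": [v - r],
            # gradient of -log pi(a) * adv with respect to the logits
            "policy": [(p - (1.0 if k == a else 0.0)) * adv
                       for k, p in enumerate(probs)],
        }
        if step < burn_in_steps:
            value_opt.apply(grads)  # the policy gradient is never applied
        else:
            full_opt.apply(grads)
    return params
```

For example, `train(burn_in_steps=500, total_steps=500)` returns the policy logits untouched at `[2.0, 0.0]`, with the value near the policy's expected reward (about 0.90). In this toy every reward is positive. While the critic is still 0, each sampled action therefore gets a positive advantage, including the worse one. Burn-in avoids taking policy steps on those uninformative advantages.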

thanks