inoryy/tensorflow2-deep-reinforcement-learning

The members value_c and entropy_c in A2CAgent

Closed this issue · 4 comments

I can see the comment on these two members: they are coefficients used for the loss terms.
I can also see that they are used when calculating the loss values. What is the purpose of the two values, and how are they set? The blog article doesn't seem to mention them.

Hello,
They are scaling coefficients for the loss terms and can be treated as hyperparameters. The value coefficient is often set to 0.5 so that the factor of 2 from differentiating the squared error cancels out. The entropy coefficient should be small enough to only slightly nudge the policy toward a uniform distribution without interfering with it.
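To make the role of the two coefficients concrete, here is a minimal framework-free sketch of how a combined actor-critic loss is commonly assembled. It is not the repository's code: the function names `softmax_log_probs` and `a2c_loss`, the argument layout, and the default values are assumptions chosen for illustration. Note that with value_c = 0.5 the gradient of the value term with respect to the predicted value is simply -(return - value).

```python
import math


def softmax_log_probs(logits):
    """Return log-probabilities for one row of action logits (hypothetical helper)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def a2c_loss(logits_batch, actions, advantages, values, returns,
             value_c=0.5, entropy_c=1e-4):
    """Combine policy, value and entropy terms into one scalar loss.

    value_c and entropy_c are plain scaling factors; the defaults here are
    illustrative assumptions, not recommended settings.
    """
    n = len(actions)
    policy_loss = entropy = value_loss = 0.0
    for logits, a, adv, v, ret in zip(logits_batch, actions, advantages,
                                      values, returns):
        logp = softmax_log_probs(logits)
        # Policy-gradient term: raise log-prob of actions with positive advantage.
        policy_loss += -adv * logp[a]
        # Entropy of the action distribution; largest when the policy is uniform.
        entropy += -sum(math.exp(lp) * lp for lp in logp)
        # Squared error between the critic's estimate and the observed return.
        value_loss += (ret - v) ** 2
    policy_loss /= n
    entropy /= n
    value_loss /= n
    # Entropy is subtracted: minimizing the total rewards higher entropy.
    total = policy_loss - entropy_c * entropy + value_c * value_loss
    return {"total": total, "policy": policy_loss,
            "value": value_loss, "entropy": entropy}
```

Raising entropy_c pushes the policy harder toward uniform action probabilities (more exploration), while value_c sets how strongly the critic's error weighs against the policy term when both share one optimizer step.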

Thanks for the explanation; I have a rough understanding now. Is there any recommended documentation about them? It looks like they are not widely used, and this is the first time I have seen someone mention them.

I don't know of a resource where it's explicitly described. Hyperparameter choice is often more art than science. Usually people take the values others have used in the past as a baseline and iterate on them with a sweep, or even just manual perturbations.
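A sweep over the two coefficients can be sketched roughly as follows. Everything here is hypothetical: `sweep` is an illustrative helper, and `toy_evaluate` is only a stand-in scoring function with an arbitrary peak so the example runs; in practice it would be replaced by something that trains an agent with the given coefficients and returns, say, its mean episode reward.

```python
import itertools
import math


def sweep(evaluate, value_cs, entropy_cs):
    """Score every (value_c, entropy_c) pair and return results best-first."""
    results = [((vc, ec), evaluate(vc, ec))
               for vc, ec in itertools.product(value_cs, entropy_cs)]
    return sorted(results, key=lambda r: r[1], reverse=True)


def toy_evaluate(value_c, entropy_c):
    # Placeholder only: an arbitrary smooth score, NOT a real training result.
    return -((value_c - 0.5) ** 2) - (math.log10(entropy_c) + 4) ** 2


# Start from a baseline and perturb around it; entropy_c is varied by
# orders of magnitude since it is usually much smaller than value_c.
ranked = sweep(toy_evaluate, [0.25, 0.5, 1.0], [1e-2, 1e-3, 1e-4])
best_pair, best_score = ranked[0]
```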

OK, thanks all the same!