Test new DLAC and LSAC architectures
rickstaa opened this issue · 2 comments
In this issue, the results of two new architectures, DLAC and LSAC, are compared with the original LAC algorithm. To do this, I will use the oscillator environment and set both the environment and algorithm seeds to 0.
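To illustrate why fixing both seeds makes the runs comparable, here is a minimal sketch using only the Python standard library. The helper `make_rngs` and the idea of separate generators for the environment and the algorithm are assumptions made for illustration, not the repository's actual seeding code.

```python
import random

SEED = 0  # the seed value used in this issue


def make_rngs(seed):
    """Return separate generators for the environment and the algorithm.

    Hypothetical helper: the real project may seed its libraries differently.
    """
    env_rng = random.Random(seed)
    algo_rng = random.Random(seed)
    return env_rng, algo_rng


# Two "runs" created with the same seed draw identical numbers.
env_a, algo_a = make_rngs(SEED)
env_b, algo_b = make_rngs(SEED)
run_a = [env_a.random() for _ in range(3)]
run_b = [env_b.random() for _ in range(3)]
print(run_a == run_b)  # True
```

With identical seeds, any difference between the LAC, DLAC and LSAC curves should come from the architecture rather than from random initialisation or environment noise.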
LAC results
The original LAC algorithm gives the following results:
DLAC results
In the double-Lyapunov Actor-Critic (DLAC), two Lyapunov critics are used instead of one. The maximum of the two L values is then used for calculating the actor loss. This is similar to the double-Q trick that is used in the original SAC algorithm.
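The combination step can be sketched as follows. This is a minimal illustration in plain Python, assuming both critics score the same batch and return one value per sample; the function name `double_lyapunov_value` is hypothetical and how the result enters the actor loss is not shown.

```python
def double_lyapunov_value(l1_values, l2_values):
    """Elementwise maximum of two Lyapunov critic outputs (hypothetical helper)."""
    if len(l1_values) != len(l2_values):
        raise ValueError("both critics must score the same batch")
    return [max(a, b) for a, b in zip(l1_values, l2_values)]


# Example batch of three samples scored by each critic.
l1 = [0.5, 2.0, 1.2]
l2 = [0.7, 1.5, 1.2]
l_max = double_lyapunov_value(l1, l2)
print(l_max)  # [0.7, 2.0, 1.2]
```

For comparison, SAC takes the minimum of its two Q-values to counter overestimation. Taking the maximum here would be the analogous conservative choice if the Lyapunov value is read as a cost-like quantity, which is an interpretation rather than something stated in this issue.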
In the current form of the double-Lyapunov actor-critic, the critic is not able to train. I think, however, that this is due to an error in the implementation. I will postpone researching this architecture until the PyTorch version is fully ready, since it is easier to debug there.
The Lyapunov Soft Actor-Critic (LSAC; I couldn't think of a better name) contains both a Lyapunov critic and a normal soft critic. The results of both these critics are then combined in the loss function for the policy:
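A minimal sketch of one way such a combination could look, written in plain Python. Everything in it is an assumption made for illustration: the function name `lsac_actor_loss`, the weights `lyap_weight`, `alpha` and `alpha3`, the reading of Q as a return to be maximised, and the specific form of the Lyapunov term (a decrease condition on L plus a cost penalty). It should not be read as the exact loss used in this issue.

```python
def lsac_actor_loss(l_next, l_now, cost, log_pi, q_value,
                    lyap_weight=1.0, alpha=1.0, alpha3=0.1):
    """Batch-mean actor loss mixing a Lyapunov term and a soft-Q term.

    Assumed form (illustrative only):
        lyap_weight * (L(s', a') - L(s, a) + alpha3 * c)
        + alpha * log_pi - Q(s, a)
    """
    n = len(l_now)
    # Lyapunov part: penalise samples where L does not decrease enough.
    lyapunov_term = sum(
        ln - l0 + alpha3 * c for ln, l0, c in zip(l_next, l_now, cost)
    ) / n
    # Soft part: SAC-style entropy-regularised value term.
    soft_term = sum(alpha * lp - q for lp, q in zip(log_pi, q_value)) / n
    return lyap_weight * lyapunov_term + soft_term


loss = lsac_actor_loss(
    l_next=[1.0, 2.0], l_now=[1.5, 2.5], cost=[1.0, 1.0],
    log_pi=[-1.0, -1.0], q_value=[0.5, 0.5],
    lyap_weight=1.0, alpha=1.0, alpha3=0.5,
)
print(loss)  # -1.5
```

In this toy batch the Lyapunov term is exactly zero, so the loss is driven entirely by the soft term.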
LSAC automatic temperature tuning
Now let's add an additional Lagrange multiplier for the contribution of the Q networks.
Sigma direction investigation
When I implemented the temperature variable (sigma) for the value component of the actor loss function, I noticed that this Lagrange multiplier sometimes increases and sometimes decreases.
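One possible reading of this behaviour, assuming sigma is tuned the same way as the SAC temperature (gradient steps on `log_sigma` against a constraint residual): the sign of the residual alone decides the direction of the update. The sketch below uses a hypothetical `residual` value and a hypothetical helper `update_log_sigma`; the actual constraint used in this issue may differ.

```python
import math


def update_log_sigma(log_sigma, residual, lr=0.1):
    """One gradient-descent step on the loss -log_sigma * residual.

    The gradient with respect to log_sigma is -residual, so the step is
    log_sigma + lr * residual. Sign convention is an assumption.
    """
    return log_sigma + lr * residual


log_sigma = 0.0  # sigma = exp(0) = 1
up = update_log_sigma(log_sigma, residual=+0.5)
down = update_log_sigma(log_sigma, residual=-0.5)

# A positive residual raises sigma, a negative residual lowers it.
print(math.exp(up) > 1.0, math.exp(down) < 1.0)  # True True
```

Under this assumption, a multiplier that drifts in both directions is expected whenever the residual changes sign during training, and optimising `log_sigma` keeps sigma positive throughout.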
Next, let's minimize the following equality constraint:
Sigma increases
Sigma decreases
Closed as this is not on the immediate roadmap.