In `SAC.py`, line 120 reads:

```python
_, z, action = self.produce_action_and_action_info(state)
```
However, `produce_action_and_action_info(state)` ends with:

```python
return action, log_prob, torch.tanh(mean)
```
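Because tuple unpacking is positional, `z` is bound to `log_prob` and `action` is bound to `torch.tanh(mean)`, not to the sampled action. Even though the SAC algorithm works in practice, is this a mistake?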
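A minimal sketch of how the unpacking binds names. It uses a hypothetical scalar stand-in for `produce_action_and_action_info` that returns values in the same order as the `return` statement quoted in the question (sampled action, log-probability, `tanh(mean)`). PyTorch is replaced by `math.tanh` and `random.gauss` so it runs without torch, and the log-probability uses the common tanh-squashed Gaussian correction as an assumption, not necessarily the repository's exact code.

```python
import math
import random

def produce_action_and_action_info(mean, std):
    """Hypothetical scalar stand-in: returns (action, log_prob, tanh(mean))
    in the same order as the return statement in the question."""
    x_t = random.gauss(mean, std)      # reparameterised sample before squashing
    action = math.tanh(x_t)            # stochastic, squashed action
    # Gaussian log-density of x_t, then the tanh change-of-variables correction
    log_prob = (-((x_t - mean) ** 2) / (2 * std ** 2)
                - math.log(std) - 0.5 * math.log(2 * math.pi))
    log_prob -= math.log(1 - action ** 2 + 1e-6)
    return action, log_prob, math.tanh(mean)

# Same unpacking pattern as line 120: names are bound by position only.
_, z, action = produce_action_and_action_info(mean=0.3, std=0.5)
# z holds the log-probability; action holds tanh(mean), the deterministic action.
print(z, action)
```

Whether this is a bug depends on intent: if the caller wants a deterministic action (for example during evaluation), `tanh(mean)` is a reasonable choice and only the name `z` is misleading; if it wants a sampled action, the first element should be used instead.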