maybe a small bug in the function `explore_vec_env` of discretePPO and discreteA2C?
DranZohn commented
In the function `explore_vec_env` of `AgentPPO`, the variable `actions` is shaped `[horizon_len, self.num_envs, 1]`, but the following expression `convert(action)` returns a 1-dim tensor of shape `[num_envs]`, when it should actually be `[num_envs, 1]`, as it is in `explore_vec_env` of `AgentD3QN`. It indeed fails the demo `examples/demo_A2C_PPO.py`.
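A minimal sketch of the shape mismatch (standalone, with hypothetical sizes `num_envs=4`, 3 discrete actions, `horizon_len=8` chosen only for illustration):

```python
import torch

num_envs = 4
# Action probabilities for 3 discrete actions in each of 4 parallel envs.
a_prob = torch.softmax(torch.randn(num_envs, 3), dim=1)
a_dist = torch.distributions.Categorical(a_prob)

action = a_dist.sample()
print(action.shape)  # torch.Size([4]) -- 1-dim, not [num_envs, 1]

# The preallocated trajectory buffer expects a [num_envs, 1] slice:
actions = torch.zeros(8, num_envs, 1)   # [horizon_len, num_envs, 1]
# actions[0] = action                   # would fail: [4] cannot broadcast to [4, 1]
actions[0] = action.unsqueeze(1)        # works once the action is unsqueezed
```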
The following change works for me:
# ActorDiscretePPO of net.py
def get_action(self, state: Tensor) -> (Tensor, Tensor):
    state = self.state_norm(state)
    a_prob = self.soft_max(self.net(state))
    a_dist = self.ActionDist(a_prob)
    action = a_dist.sample()
    logprob = a_dist.log_prob(action)
    return action.unsqueeze(1), logprob  # unsqueeze the action
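For reference, here is the patched `get_action` in a runnable, self-contained form. `TinyDiscreteActor` is a hypothetical minimal stand-in for `ActorDiscretePPO` (it drops `state_norm` and uses a single linear layer), just to show the returned shapes:

```python
import torch
import torch.nn as nn


class TinyDiscreteActor(nn.Module):
    # Hypothetical minimal stand-in for ActorDiscretePPO (no state_norm).
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Linear(state_dim, action_dim)
        self.soft_max = nn.Softmax(dim=-1)
        self.ActionDist = torch.distributions.Categorical

    def get_action(self, state: torch.Tensor) -> (torch.Tensor, torch.Tensor):
        a_prob = self.soft_max(self.net(state))
        a_dist = self.ActionDist(a_prob)
        action = a_dist.sample()
        logprob = a_dist.log_prob(action)
        return action.unsqueeze(1), logprob  # unsqueeze the action


actor = TinyDiscreteActor(state_dim=6, action_dim=3)
state = torch.randn(4, 6)                  # num_envs = 4
action, logprob = actor.get_action(state)
print(action.shape, logprob.shape)         # [4, 1] and [4]
```

With the `unsqueeze(1)`, `action` is `[num_envs, 1]` and assigns cleanly into the `[horizon_len, num_envs, 1]` buffer, while `logprob` keeps its original `[num_envs]` shape.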