AI4Finance-Foundation/ElegantRL

maybe a small bug in the function `explore_vec_env` of discretePPO and discreteA2C?


In the function `explore_vec_env` of AgentPPO, the variable `actions` is shaped `[horizon_len, self.num_envs, 1]`, but the expression `convert(action)` returns a 1-dimensional tensor of shape `[num_envs]`, when it should be `[num_envs, 1]` as it is in `explore_vec_env` of AgentD3QN. This indeed fails the demo `examples/demo_A2C_PPO.py`.

The following change works for me:

```python
# ActorDiscretePPO in net.py
def get_action(self, state: Tensor) -> (Tensor, Tensor):
    state = self.state_norm(state)
    a_prob = self.soft_max(self.net(state))
    a_dist = self.ActionDist(a_prob)
    action = a_dist.sample()
    logprob = a_dist.log_prob(action)
    return action.unsqueeze(1), logprob  # unsqueeze so the action is shaped [num_envs, 1]