The discrete-action example in example/demo_A2C_PPO.py raises an exception
Opened this issue · 2 comments
churchillyik commented
Running
python demo_A2C_PPO.py --gpu=0 --drl=0 --env=6
raises the following exception:
File "elegantrl/train/evaluator.py", line 176, in get_cumulative_rewards_and_steps
tensor_action = tensor_action.argmax(dim=1)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
Setting a breakpoint on that line and printing the variables gives:
(Pdb) l
171 returns = 0.0 # sum of rewards in an episode
172 for steps in range(max_step):
173 tensor_state = torch.as_tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
174 tensor_action = actor(tensor_state)
175 if if_discrete:
176 B-> tensor_action = tensor_action.argmax(dim=1)
177 action = tensor_action.detach().cpu().numpy()[0] # not need detach(), because using torch.no_grad() outside
178 state, reward, done, _ = env.step(action)
179 returns += reward
180
181 if if_render:
(Pdb) pp tensor_state
tensor([[ 0.0357, -0.0466, 0.0230, -0.0324]], device='cuda:0')
(Pdb) pp tensor_action
tensor([0], device='cuda:0')
(Pdb) pp actor
ActorDiscretePPO(
(net): Sequential(
(0): Linear(in_features=4, out_features=256, bias=True)
(1): ReLU()
(2): Linear(in_features=256, out_features=128, bias=True)
(3): ReLU()
(4): Linear(in_features=128, out_features=2, bias=True)
)
(soft_max): Softmax(dim=-1)
)
tensor_action has only one dimension, which does not match the dim=1 argument passed to argmax.
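A minimal reproduction of the mismatch, with a possible defensive fix. The tensors here are hypothetical stand-ins for the actor output (2 actions, as in the ActorDiscretePPO printed above); the guard on dim() is only a sketch, not the project's official fix:

```python
import torch

# Case 1: per-state logits with a batch axis, shape (batch, action_dim).
# Here argmax(dim=1) is valid and collapses the action axis.
logits_2d = torch.tensor([[0.2, 0.8]])
assert logits_2d.argmax(dim=1).shape == (1,)

# Case 2: the actor has already returned a sampled action index,
# shape (1,). argmax(dim=1) then raises the reported IndexError.
action_1d = torch.tensor([0])
try:
    action_1d.argmax(dim=1)
except IndexError:
    pass  # reproduces "Dimension out of range (expected to be in range of [-1, 0], but got 1)"

# Defensive variant: only take argmax while a batch axis is still present.
tensor_action = action_1d
if tensor_action.dim() > 1:
    tensor_action = tensor_action.argmax(dim=1)
action = tensor_action.detach().cpu().numpy()[0]
```

Using dim=-1 instead of dim=1 would not help here either, since argmax over the only axis of a 1-D index tensor returns a position, not the action index itself; the actor output shape is the real question.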
churchillyik commented
Also, in elegantrl/train/run.py, inside the Learner process, should this line:
actions = torch.empty((horizon_len, num_seqs, action_dim), dtype=torch.float32, device=agent.device)
be changed to:
actions = torch.empty((horizon_len, num_seqs, 1 if if_discrete else action_dim), dtype=torch.float32, device=agent.device)
with the following added before it:
if_discrete = args.if_discrete
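A quick sketch of the proposed buffer-shape change, on CPU with small hypothetical sizes (the names horizon_len, num_seqs, action_dim, if_discrete follow the snippet above; this only checks the shapes, not the Learner logic):

```python
import torch

horizon_len, num_seqs, action_dim = 8, 4, 2

for if_discrete in (True, False):
    # Discrete: one action index per step; continuous: one value per action dim.
    actions = torch.empty(
        (horizon_len, num_seqs, 1 if if_discrete else action_dim),
        dtype=torch.float32,
    )
    assert actions.shape[-1] == (1 if if_discrete else action_dim)
```

This keeps a single float32 buffer for both cases; the trade-off is that discrete action indices are stored as floats and must be cast back to long before being used as indices.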