The discrete-action example in example/demo_A2C_PPO.py raises an exception
Opened this issue · 2 comments
churchillyik commented
Running
python demo_A2C_PPO.py --gpu=0 --drl=0 --env=6
raises the following exception:
File "elegantrl/train/evaluator.py", line 176, in get_cumulative_rewards_and_steps
tensor_action = tensor_action.argmax(dim=1)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
Setting a breakpoint on that line and printing the variables gives:
(Pdb) l
171 returns = 0.0 # sum of rewards in an episode
172 for steps in range(max_step):
173 tensor_state = torch.as_tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
174 tensor_action = actor(tensor_state)
175 if if_discrete:
176 B-> tensor_action = tensor_action.argmax(dim=1)
177 action = tensor_action.detach().cpu().numpy()[0] # not need detach(), because using torch.no_grad() outside
178 state, reward, done, _ = env.step(action)
179 returns += reward
180
181 if if_render:
(Pdb) pp tensor_state
tensor([[ 0.0357, -0.0466, 0.0230, -0.0324]], device='cuda:0')
(Pdb) pp tensor_action
tensor([0], device='cuda:0')
(Pdb) pp actor
ActorDiscretePPO(
(net): Sequential(
(0): Linear(in_features=4, out_features=256, bias=True)
(1): ReLU()
(2): Linear(in_features=256, out_features=128, bias=True)
(3): ReLU()
(4): Linear(in_features=128, out_features=2, bias=True)
)
(soft_max): Softmax(dim=-1)
)
tensor_action has only one dimension, which does not match the dim=1 argument passed to argmax.
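A minimal reproduction of the mismatch, with a possible defensive fix. The tensors here are hypothetical stand-ins for the actor output (2 actions, as in the ActorDiscretePPO printed above); the guard on dim() is only a sketch, not the project's official fix:

```python
import torch

# Case 1: per-state logits with a batch axis, shape (batch, action_dim).
# Here argmax(dim=1) is valid and collapses the action axis.
logits_2d = torch.tensor([[0.2, 0.8]])
assert logits_2d.argmax(dim=1).shape == (1,)

# Case 2: the actor has already returned a sampled action index,
# shape (1,). argmax(dim=1) then raises the reported IndexError.
action_1d = torch.tensor([0])
try:
    action_1d.argmax(dim=1)
except IndexError:
    pass  # reproduces "Dimension out of range (expected to be in range of [-1, 0], but got 1)"

# Defensive variant: only take argmax while a batch axis is still present.
tensor_action = action_1d
if tensor_action.dim() > 1:
    tensor_action = tensor_action.argmax(dim=1)
action = tensor_action.detach().cpu().numpy()[0]
```

Using dim=-1 instead of dim=1 would not help here either, since argmax over the only axis of a 1-D index tensor returns a position, not the action index itself; the actor output shape is the real question.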
churchillyik commented
Also, in elegantrl/train/run.py, inside the Learner process, should this line:
actions = torch.empty((horizon_len, num_seqs, action_dim), dtype=torch.float32, device=agent.device)
be changed to:
actions = torch.empty((horizon_len, num_seqs, 1 if if_discrete else action_dim), dtype=torch.float32, device=agent.device)
with the following added before it:
if_discrete = args.if_discrete
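A quick sketch of the proposed buffer-shape change, on CPU with small hypothetical sizes (the names horizon_len, num_seqs, action_dim, if_discrete follow the snippet above; this only checks the shapes, not the Learner logic):

```python
import torch

horizon_len, num_seqs, action_dim = 8, 4, 2

for if_discrete in (True, False):
    # Discrete: one action index per step; continuous: one value per action dim.
    actions = torch.empty(
        (horizon_len, num_seqs, 1 if if_discrete else action_dim),
        dtype=torch.float32,
    )
    assert actions.shape[-1] == (1 if if_discrete else action_dim)
```

This keeps a single float32 buffer for both cases; the trade-off is that discrete action indices are stored as floats and must be cast back to long before being used as indices.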