demo_A2C_PPO

Question

Opened this issue 2 months ago · 0 comments

D:\anconda\envs\pytorch\python.exe C:\Users\user\Desktop\ElegantRL-master\examples\demo_A2C_PPO.py
env_args = {'env_name': 'CartPole-v1',
'num_envs': 1,
'max_step': 500,
'state_dim': 4,
'action_dim': 2,
'if_discrete': True}
| Arguments Remove cwd: ./CartPole-v1_DiscreteA2C_0
| Evaluator:
| step: Number of samples, or total training steps, or running times of env.step().
| time: Time spent from the start of training to this moment.
| avgR: Average value of cumulative rewards, which is the sum of rewards in an episode.
| stdR: Standard dev of cumulative rewards, which is the sum of rewards in an episode.
| avgS: Average of steps in an episode.
| objC: Objective of Critic network. Or call it loss function of critic network.
| objA: Objective of Actor network. It is the average Q value of the critic network.
################################################################################
ID Step Time | avgR stdR avgS stdS | expR objC objA etc.

tensor_action = tensor_action.argmax(dim=1)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
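
The message says `argmax(dim=1)` was called on a tensor that has only one dimension. The snippet below is a minimal sketch that reproduces the same error outside ElegantRL. It assumes `tensor_action` is a 1-D tensor of 2 action scores with no batch dimension, which the log does not confirm. The values are hypothetical, and the two workarounds are guesses to try, not the library's confirmed fix.

```python
import torch

# Hypothetical stand-in for the actor output on one CartPole state:
# a 1-D tensor of 2 action scores, with no batch dimension (assumption).
tensor_action = torch.tensor([0.3, 0.7])

try:
    # A 1-D tensor only has dim 0 (or -1), so dim=1 is out of range.
    tensor_action.argmax(dim=1)
    error_message = None
except IndexError as err:
    error_message = str(err)

print(error_message)

# Possible workaround 1 (assumption): take the argmax over the last dimension,
# which works for both 1-D and batched 2-D tensors.
action_last_dim = tensor_action.argmax(dim=-1)

# Possible workaround 2 (assumption): add a batch dimension first,
# then keep the original dim=1 call.
action_batched = tensor_action.unsqueeze(0).argmax(dim=1)
```

If the assumption holds, printing `tensor_action.shape` just before the failing line in the script should show a single dimension.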