AI4Finance-Foundation/ElegantRL

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)


\demo_A2C_PPO.py
env_args = {'env_name': 'CartPole-v1',
            'num_envs': 1,
            'max_step': 500,
            'state_dim': 4,
            'action_dim': 2,
            'if_discrete': True}
| Arguments Remove cwd: ./CartPole-v1_DiscreteA2C_0
| Evaluator:
| step: Number of samples, or total training steps, or running times of env.step().
| time: Time spent from the start of training to this moment.
| avgR: Average value of cumulative rewards, which is the sum of rewards in an episode.
| stdR: Standard dev of cumulative rewards, which is the sum of rewards in an episode.
| avgS: Average of steps in an episode.
| objC: Objective of Critic network. Or call it loss function of critic network.
| objA: Objective of Actor network. It is the average Q value of the critic network.
################################################################################
ID Step Time | avgR stdR avgS stdS | expR objC objA etc.
Traceback (most recent call last):

tensor_action = tensor_action.argmax(dim=1)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
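
The message says the tensor has only one dimension, so `dim=1` does not exist. The traceback is truncated, so the following is only a minimal sketch of what may be happening, assuming `tensor_action` is a 1-D tensor of `action_dim` scores when `num_envs` is 1. The tensor values and variable names below are hypothetical and are not taken from ElegantRL.

```python
import torch

# Hypothetical stand-in for the actor output of a single environment
# (num_envs = 1, action_dim = 2): a 1-D tensor of action scores.
tensor_action = torch.tensor([0.2, 0.8])

# Reproduce the reported error: a 1-D tensor only has dim 0 (also reachable as -1).
try:
    tensor_action.argmax(dim=1)
    error_message = None
except IndexError as exc:
    error_message = str(exc)

# Possible workaround 1: take argmax over the last dimension,
# which exists for both 1-D and 2-D tensors.
action_last_dim = tensor_action.argmax(dim=-1)   # 0-D tensor

# Possible workaround 2: restore the batch dimension first,
# so that dim=1 refers to the action axis again.
batched = tensor_action.unsqueeze(0)             # shape (1, 2)
action_batched = batched.argmax(dim=1)           # shape (1,)

print(error_message)
print(action_last_dim.item(), action_batched.tolist())
```

Which of the two fits better probably depends on whether the surrounding code expects a batch dimension for a single environment; I have not checked that against the library source.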