voidful/TextRL

AssertionError

Closed this issue · 3 comments

Full error message:

Traceback (most recent call last):
  File "/home/ll_coder/workspace/Aigc/RLHF.py", line 36, in <module>
    pfrl.experiments.train_agent_with_evaluation(
  File "/home/ll_coder/anaconda3/envs/py39/lib/python3.9/site-packages/pfrl/experiments/train_agent.py", line 208, in train_agent_with_evaluation
    eval_stats_history = train_agent(
  File "/home/ll_coder/anaconda3/envs/py39/lib/python3.9/site-packages/pfrl/experiments/train_agent.py", line 57, in train_agent
    action = agent.act(obs)
  File "/home/ll_coder/anaconda3/envs/py39/lib/python3.9/site-packages/pfrl/agent.py", line 161, in act
    return self.batch_act([obs])[0]
  File "/home/ll_coder/anaconda3/envs/py39/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/ll_coder/anaconda3/envs/py39/lib/python3.9/site-packages/textrl/actor.py", line 163, in batch_act
    return self._batch_act_train(batch_obs)
  File "/home/ll_coder/anaconda3/envs/py39/lib/python3.9/site-packages/pfrl/agents/ppo.py", line 721, in _batch_act_train
    assert len(self.batch_last_action) == num_envs
AssertionError

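I'm sure the environment was installed according to the README, but I always get this error when running example 1. Has anyone run into a similar problem?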

I have the same problem.

The issue was caused by a mismatch in the distribution; I changed it back to a categorical distribution.
Also, a reward should be returned for every sample in the ranking stage.
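As an illustration of "return a reward for every sample", here is a minimal sketch of a reward hook that always returns a list with one entry per generated sample, both on intermediate steps and on the final (ranking) step. The function name `get_reward`, its parameters and the length-based toy score are hypothetical and for illustration only; this is not TextRL's actual API or the fix that was committed.

```python
from typing import List


def get_reward(predicted_list: List[List[str]], finish: bool) -> List[float]:
    """Hypothetical reward hook: return exactly one score per sample.

    `predicted_list` holds one list of generated tokens per sample.
    """
    if not finish:
        # Intermediate steps: zero reward, but still one entry per sample.
        return [0.0] * len(predicted_list)
    # Final (ranking) step: score every sample individually.
    # Toy score (assumption): the number of generated tokens.
    return [float(len(tokens)) for tokens in predicted_list]


rewards = get_reward([["a", "b"], ["c"]], finish=True)
print(rewards)
```

The key design point is that the length of the returned list always matches the number of samples, so downstream code never has to guess which reward belongs to which sample.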

All of these issues should be fixed now. I will try to add tests to the project.

(It was probably caused by the distribution having the wrong shape. I rewrote this part of the code, and it should work now.)
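To illustrate why a wrongly shaped distribution can trip the `assert len(self.batch_last_action) == num_envs` check in the traceback, here is a minimal pure-Python sketch (no torch, no TextRL) that samples one token per environment from a categorical distribution over each row of logits. `sample_categorical` and the random logits are hypothetical; this only demonstrates the shape invariant and is not the actual TextRL code or fix.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Numerically stable softmax over one row of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_categorical(batch_logits: List[List[float]], rng: random.Random) -> List[int]:
    """Hypothetical helper: sample one token id per row (i.e. per environment)."""
    actions = []
    for row in batch_logits:
        probs = softmax(row)
        actions.append(rng.choices(range(len(row)), weights=probs, k=1)[0])
    return actions


num_envs = 1
vocab_size = 5
rng = random.Random(0)
# Correct shape: (num_envs, vocab_size), one categorical distribution per environment.
batch_logits = [[rng.gauss(0, 1) for _ in range(vocab_size)] for _ in range(num_envs)]

actions = sample_categorical(batch_logits, rng)
# The invariant the assertion expects: one action per environment.
assert len(actions) == num_envs

# Wrong shape: treating the vocabulary axis as the batch axis yields
# vocab_size actions instead of num_envs, which breaks the invariant.
wrong = sample_categorical([[x] for x in batch_logits[0]], rng)
print(len(wrong), "actions for", num_envs, "environment(s)")
```

With the shapes mixed up, the agent ends up holding 5 actions for a single environment, so a check of the form `len(...) == num_envs` fails with a bare `AssertionError`, as in the report above.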