AssertionError
Closed this issue · 3 comments
Ulov888 commented
Detailed error message:
Traceback (most recent call last):
  File "/home/ll_coder/workspace/Aigc/RLHF.py", line 36, in <module>
    pfrl.experiments.train_agent_with_evaluation(
  File "/home/ll_coder/anaconda3/envs/py39/lib/python3.9/site-packages/pfrl/experiments/train_agent.py", line 208, in train_agent_with_evaluation
    eval_stats_history = train_agent(
  File "/home/ll_coder/anaconda3/envs/py39/lib/python3.9/site-packages/pfrl/experiments/train_agent.py", line 57, in train_agent
    action = agent.act(obs)
  File "/home/ll_coder/anaconda3/envs/py39/lib/python3.9/site-packages/pfrl/agent.py", line 161, in act
    return self.batch_act([obs])[0]
  File "/home/ll_coder/anaconda3/envs/py39/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/ll_coder/anaconda3/envs/py39/lib/python3.9/site-packages/textrl/actor.py", line 163, in batch_act
    return self._batch_act_train(batch_obs)
  File "/home/ll_coder/anaconda3/envs/py39/lib/python3.9/site-packages/pfrl/agents/ppo.py", line 721, in _batch_act_train
    assert len(self.batch_last_action) == num_envs
AssertionError
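I am sure the environment was installed according to the README, but running example 1 always raises this error. Has anyone run into a similar problem?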
xzdong-2019 commented
I have the same problem.
voidful commented
This issue was caused by a distribution mismatch; I changed it back to categorical.
Also, we should return a reward for every sample in the ranking stage.
All of these issues should be fixed now. I will try to add tests to the project.
(It was probably caused by the distribution having the wrong shape. I have reworked that part of the code, and it should work now.)
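As a rough illustration of the failure mode described above (not the actual textrl or pfrl code), the sketch below shows how an action batch sampled from a wrongly shaped categorical distribution can fail the length check from the traceback. The helpers `sample_actions` and `lengths_match` and the probability values are hypothetical; the only details taken from the traceback are that `agent.act(obs)` wraps the observation as `[obs]` and that the number of stored actions is compared against the number of environments.

```python
import random


def sample_actions(batch_probs, rng):
    """Draw one action index per probability row (a categorical sample)."""
    return [rng.choices(range(len(p)), weights=p, k=1)[0] for p in batch_probs]


def lengths_match(batch_last_action, batch_obs):
    """Hypothetical stand-in for the asserted condition: one action per env."""
    return len(batch_last_action) == len(batch_obs)


rng = random.Random(0)

# agent.act(obs) calls batch_act([obs]), so there is exactly one environment.
batch_obs = ["observation for the single training env"]

# Expected shape: one probability row per environment.
good_probs = [[0.1, 0.7, 0.2]]
good_actions = sample_actions(good_probs, rng)

# Assumed wrong shape: an extra row, which yields two actions for one env.
bad_probs = [[0.1, 0.7, 0.2], [0.3, 0.3, 0.4]]
bad_actions = sample_actions(bad_probs, rng)

print(lengths_match(good_actions, batch_obs))  # True
print(lengths_match(bad_actions, batch_obs))   # False
```

If the second case reached a check like `assert len(self.batch_last_action) == num_envs`, it would raise a bare `AssertionError` like the one reported.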