StepNeverStop/RLs

Applied to multiple agents?

Closed this issue · 1 comment

Hi, I have been testing your project in a multi-agent environment, but I am not sure whether it supports this setting.

My environment includes 8 agents with discrete actions. However, I get the following error:

```
initialize model SUCCUESS.
save config to /RLData/sac_no_v/test/GridWorldLearning/config
There was a mismatch between the provided action and the environment's expectation: The brain GridWorldLearning expected 8 discrete action(s), but was provided: [-0.050590384751558304, -0.665206789970398, -0.0410725474357605, -0.23551416397094727, 0.010302126407623291, 0.2644920349121094, -1.0, -0.10047897696495056, -1.0, 0.03841760754585266, -1.0, -1.0, -0.33658552169799805, 0.7163478136062622, -0.1180223822593689, 0.31758153438568115, -1.0, -0.18739420175552368, -0.15177105367183685, -0.2588164806365967, 0.11979779601097107, -0.5222678184509277, -0.6121081113815308, -1.0, -0.08478996157646179, -0.6589073538780212, -1.0, 0.32313454151153564, -0.3325958251953125, -0.9373922348022461, 0.4225391149520874, -0.18213623762130737, 0.7108762264251709, 0.1738891303539276, -0.6963950395584106, 0.41238147020339966, -1.0, 0.451471209526062, -0.6678181886672974, -0.8575950860977173]
unsupported operand type(s) for +: 'NoneType' and 'str'
```

Could you please help with it? Thanks!

The framework supports multi-agent, multi-brain, and multi-image-input training, but the SAC algorithm does not support discrete actions. The framework mainly targets continuous control problems; the algorithms in it that do support discrete action spaces are pg, ac, a2c, and dqn. Some of the more recent algorithms do not support discrete control, and others have not been extended to it yet. I suggest testing on continuous control problems, since the DQN in the framework is a simple version with poor convergence.
Your environment was treated by the algorithm as a continuous control problem, so the final output is an action vector whose dimension is the product of the discrete action dimensions. A sketch of this mismatch follows below.