A bug when choosing actions
Closed this issue · 1 comment
Chty-syq commented
In line 63 of "rollout.py", the relevant code is:
if self.args.alg == 'maven':
action = self.agents.choose_action(obs[agent_id], last_action[agent_id], agent_id,
avail_action, epsilon, maven_z, evaluate)
else:
action = self.agents.choose_action(obs[agent_id], last_action[agent_id], agent_id,
avail_action, epsilon, evaluate)
In the else branch, evaluate is passed positionally, so it binds to the maven_z parameter of choose_action instead of to evaluate. The correct code is:
if self.args.alg == 'maven':
action = self.agents.choose_action(obs[agent_id], last_action[agent_id], agent_id,
avail_action, epsilon, maven_z, evaluate)
else:
action = self.agents.choose_action(obs[agent_id], last_action[agent_id], agent_id,
avail_action, epsilon, evaluate=evaluate)
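The pitfall above can be sketched in isolation. This is a hypothetical stand-in with the same parameter order as the choose_action in the issue; the body just reports what each argument binds to:

```python
# Stand-in for choose_action with the parameter order from the issue.
# The parameter names come from the snippet; the body is illustrative only.
def choose_action(obs, last_action, agent_id, avail_action, epsilon,
                  maven_z=None, evaluate=False):
    # Return the bindings so we can inspect where each argument landed.
    return {"maven_z": maven_z, "evaluate": evaluate}

# Buggy call: the sixth positional argument silently binds to maven_z.
buggy = choose_action("obs", "last", 0, [1, 0], 0.05, True)

# Fixed call: passing evaluate by keyword leaves maven_z at its default.
fixed = choose_action("obs", "last", 0, [1, 0], 0.05, evaluate=True)
```

So in the buggy call, evaluate stays False during evaluation and maven_z receives True, which is why the bug goes unnoticed until the maven branch diverges.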
After correcting it, evaluation of the policy-gradient algorithms becomes extremely unstable. In line 109 of "agent.py":
if epsilon == 0 and evaluate:
action = torch.argmax(prob)
else:
action = Categorical(prob).sample().long()
I think taking the argmax of prob during evaluation is a mistake, because policy gradient learns a stochastic policy, i.e. the probability distribution over actions. The action should always be sampled:
action = Categorical(prob).sample().long()
I have tried this and it works!
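The sampling-vs-argmax distinction can be illustrated without torch. This is a minimal sketch, not the repository's code: prob stands in for the policy's action distribution, and the weighted sampler mirrors what Categorical(prob).sample() does in the issue:

```python
import random
from collections import Counter

# Hypothetical action distribution produced by a stochastic policy.
prob = [0.6, 0.3, 0.1]

def greedy_action(prob):
    # argmax collapses the learned distribution to a deterministic policy.
    return max(range(len(prob)), key=lambda a: prob[a])

def sampled_action(prob, rng):
    # Sampling follows the distribution the policy gradient actually learned,
    # analogous to Categorical(prob).sample() in agent.py.
    return rng.choices(range(len(prob)), weights=prob, k=1)[0]

rng = random.Random(0)
counts = Counter(sampled_action(prob, rng) for _ in range(10_000))
# greedy_action(prob) always returns action 0, whereas the sampled actions
# cover 0, 1, 2 roughly in proportion 0.6 / 0.3 / 0.1.
```

Evaluating with argmax measures a different (greedy) policy than the one being optimized, which is one plausible reason the reported evaluation curves looked unstable.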
starry-sky6688 commented
Great advice!
I have deleted the evaluate argument from choose_action(); if evaluate = True in RolloutWorker, it now sets epsilon = 0.