starry-sky6688/MARL-Algorithms

A bug when choosing actions


In line 63 of "rollout.py", the relevant code is

if self.args.alg == 'maven':
    action = self.agents.choose_action(obs[agent_id], last_action[agent_id], agent_id,
                                                       avail_action, epsilon, maven_z, evaluate)
else:
    action = self.agents.choose_action(obs[agent_id], last_action[agent_id], agent_id,
                                                       avail_action, epsilon, evaluate)

In the else branch, it passes evaluate as the maven_z parameter of choose_action. The correct code is

if self.args.alg == 'maven':
    action = self.agents.choose_action(obs[agent_id], last_action[agent_id], agent_id,
                                                       avail_action, epsilon, maven_z, evaluate)
else:
    action = self.agents.choose_action(obs[agent_id], last_action[agent_id], agent_id,
                                                       avail_action, epsilon, evaluate=evaluate)
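
For context, here is a minimal sketch of why the positional call misbinds. The signature below only approximates the one in agent.py (parameter names and defaults are assumptions), but it shows how a trailing positional True fills the maven_z slot while evaluate silently keeps its default:

def choose_action(obs, last_action, agent_num, avail_actions, epsilon,
                  maven_z=None, evaluate=False):
    # Illustrative stub: just report which parameter each argument bound to.
    print(f"maven_z={maven_z}, evaluate={evaluate}")

# Original else branch: the trailing positional argument lands in maven_z.
choose_action("obs", "last_a", 0, [1, 1], 0.05, True)           # maven_z=True, evaluate=False
# Corrected call: the keyword argument binds to the intended parameter.
choose_action("obs", "last_a", 0, [1, 1], 0.05, evaluate=True)  # maven_z=None, evaluate=True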

After correcting this, evaluation of the policy-gradient algorithms becomes extremely unstable. In line 109 of "agent.py",

if epsilon == 0 and evaluate:
    action = torch.argmax(prob)
else:
    action = Categorical(prob).sample().long()

I think taking the argmax of prob during evaluation is a mistake, because policy gradient learns the probability distribution of the policy $\pi$. We should also sample during evaluation; just use the code below

action = Categorical(prob).sample().long()

I have tried it and it really works!
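
As a small self-contained sketch of the difference (the tensor values and availability mask are made up for illustration; only the Categorical(prob).sample() pattern comes from the repository code):

import torch
from torch.distributions import Categorical

prob = torch.tensor([0.05, 0.55, 0.40])        # policy output pi(a|s) for one agent
avail_actions = torch.tensor([1.0, 1.0, 1.0])  # mask of available actions

masked_prob = prob * avail_actions             # zero out unavailable actions
masked_prob = masked_prob / masked_prob.sum()  # renormalize to a valid distribution

greedy_action = torch.argmax(masked_prob)                   # deterministic: always picks action 1
sampled_action = Categorical(masked_prob).sample().long()   # stochastic: follows pi, as policy gradient assumes

print(greedy_action.item(), sampled_action.item())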

Great advice!
I have deleted evaluate from choose_action(); if evaluate = True in RolloutWorker, then it sets epsilon = 0.
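
A rough sketch of how that change could look (the classes and method bodies below are illustrative stand-ins with heavily simplified signatures, not the actual repository code):

import torch
from torch.distributions import Categorical

class Agents:
    def choose_action(self, prob, avail_actions, epsilon):
        # evaluate flag removed: always sample from the (exploration-mixed) policy.
        prob = (1 - epsilon) * prob + epsilon * avail_actions / avail_actions.sum()
        return Categorical(prob).sample().long()

class RolloutWorker:
    def __init__(self, agents, epsilon=0.05):
        self.agents = agents
        self.epsilon = epsilon

    def generate_episode(self, evaluate=False):
        # evaluate=True simply disables exploration by forcing epsilon to 0.
        epsilon = 0 if evaluate else self.epsilon
        prob = torch.tensor([0.1, 0.6, 0.3])           # dummy policy output
        avail_actions = torch.tensor([1.0, 1.0, 1.0])  # dummy availability mask
        return self.agents.choose_action(prob, avail_actions, epsilon)

print(RolloutWorker(Agents()).generate_episode(evaluate=True))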