HumanCompatibleAI/human_aware_rl

How can shared_policy parameter be used to train two agents with different policies?

ganeshkumarashok opened this issue · 6 comments

Is there functionality in the code to train two agents with distinct policies? I see the shared_policy parameter (set to True) in ppo_rllib_client.py, but I am not sure how it can be used for this purpose.


We're trying to have each AI agent learn its own policy, and it would be great to be able to do that with the ppo_rllib_client code.

Thanks!

Firstly, sorry for the late reply!

I am not completely sure, but it seems like shared_policy is not referenced anywhere else in the repo, so it might just be an old line of code that hasn't been cleaned up.

There is no out-of-the-box functionality to train two agents with distinct policies, but I believe this should be a very easy change (a couple of lines of code at most), as rllib makes it easy to do these kinds of things. Maybe @nathan-miller23 or @mesutyang97 will be able to give you an initial direction to follow.

I have been looking into this, thinking about ways to add this feature without breaking backward compatibility. Currently we only use ppo as the tag for PPO agents; if we were to split it into ppo_0 and ppo_1, we would risk losing backward compatibility.

But I think it could be done somehow. Looking into this more this weekend.
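A minimal sketch of how a shared_policy flag could switch between the two behaviours while keeping the existing ppo policy ID in the shared case, which is the part that matters for backward compatibility. It assumes RLlib's legacy "multiagent" config keys (policies, policy_mapping_fn, policies_to_train); the function name build_multiagent_config, the agent IDs agent_0/agent_1 and the placeholder policy spec are hypothetical and not taken from this repo.

```python
from typing import Any, Dict

# Placeholder for whatever policy spec your RLlib version expects, e.g. a
# (policy_cls, obs_space, action_space, config) tuple in older releases.
POLICY_SPEC_PLACEHOLDER: Any = None


def build_multiagent_config(shared_policy: bool = True) -> Dict[str, Any]:
    """Return an RLlib-style "multiagent" config section (hypothetical helper).

    shared_policy=True keeps the single "ppo" policy ID used today, so both
    agents map to, and train, the same network. shared_policy=False maps each
    agent to its own ID ("ppo_0", "ppo_1"), so their weights are independent.
    """
    policy_ids = ["ppo"] if shared_policy else ["ppo_0", "ppo_1"]

    def policy_mapping_fn(agent_id: str, *args: Any, **kwargs: Any) -> str:
        # *args/**kwargs absorb the extra (episode, worker, ...) arguments that
        # newer RLlib versions pass; older versions pass only agent_id.
        if shared_policy:
            return "ppo"
        # Hypothetical agent IDs "agent_0" / "agent_1": reuse the trailing index.
        index = agent_id.rsplit("_", 1)[-1]
        return f"ppo_{index}"

    return {
        "policies": {pid: POLICY_SPEC_PLACEHOLDER for pid in policy_ids},
        "policy_mapping_fn": policy_mapping_fn,
        "policies_to_train": list(policy_ids),
    }


if __name__ == "__main__":
    for flag in (True, False):
        cfg = build_multiagent_config(shared_policy=flag)
        mapping = {a: cfg["policy_mapping_fn"](a) for a in ("agent_0", "agent_1")}
        print(flag, sorted(cfg["policies"]), mapping)
```

With shared_policy=True the only policy ID is still ppo, so setups that expect a single ppo policy should presumably keep working; with shared_policy=False each agent gets its own ID and therefore its own weights.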

(Just to clarify, I think what Mesut is implying is that policies are automatically shared if the policy name used for rllib is the same. Currently, in self-play training, both agents are called ppo, so they automatically share the policy. However, if they were called ppo_0 and ppo_1, the policies would be trained independently. If you are not interested in backward compatibility, I'm sure this would not be a hard change to implement yourself in your own fork. Mesut, correct me if I'm wrong 🙂)
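A toy illustration of the point above, written in plain Python rather than against RLlib itself: samples are grouped by the policy ID each agent maps to, so two agents mapped to the same ID feed one batch (one shared network), while distinct IDs produce separate batches. The names route_by_policy, shared_mapping, independent_mapping and the agent IDs are hypothetical; this only models the routing idea, not RLlib's actual sample collection.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def route_by_policy(
    samples: List[Tuple[str, float]],
    policy_mapping_fn: Callable[[str], str],
) -> Dict[str, List[float]]:
    """Group (agent_id, reward) samples by the policy ID each agent maps to."""
    batches: Dict[str, List[float]] = defaultdict(list)
    for agent_id, reward in samples:
        # The key is the *policy* ID, so agents sharing an ID share a batch.
        batches[policy_mapping_fn(agent_id)].append(reward)
    return dict(batches)


def shared_mapping(agent_id: str) -> str:
    # Self-play as described in the thread: every agent maps to "ppo".
    return "ppo"


def independent_mapping(agent_id: str) -> str:
    # Hypothetical agent IDs "agent_0"/"agent_1" -> policies "ppo_0"/"ppo_1".
    return "ppo_" + agent_id.rsplit("_", 1)[-1]


if __name__ == "__main__":
    samples = [("agent_0", 1.0), ("agent_1", 0.5), ("agent_0", 0.2)]
    print(route_by_policy(samples, shared_mapping))
    print(route_by_policy(samples, independent_mapping))
```

In the shared case every sample lands under ppo, so one set of weights is updated from both agents' experience; in the independent case ppo_0 and ppo_1 each see only their own agent's samples.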

Yep, that is what I was referring to!

Thanks a lot!

Resolved