thu-ml/tianshou

Does Tianshou truly support MARL out of the box?

Opened this issue · 1 comment

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
    • design request (i.e. "X should be changed to Y.")
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, gymnasium as gym, torch, numpy, sys
    print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)

I'm trying to use PettingZoo with Tianshou, but the documentation doesn't explain much about training multiple agents at the same time. My agents have to act cooperatively and perform the same tasks, so they share the same policy. It looks simple: just replace the random policy in the example with the right one. It even seems to work, but I'm not sure it works correctly.

  1. For some reason, only the first agent's rewards reach the reward_metric function; for the second agent the value is always 0. Within the PettingZoo AEC environment, the rewards are accumulated successfully:
def step(self, action):
    ...  # the very end of my step function
    if self._agent_selector.is_last():
        self._accumulate_rewards()  # global rewards are definitely updated
        self._clear_rewards()

    self.agent_selection = self._agent_selector.next()
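The zeroed column may have another cause, but the ordering of the AEC reward bookkeeping in the step function above is worth keeping in mind. Here is a pure-Python sketch (simplified, not the real PettingZoo implementation; the dict names mirror PettingZoo's attributes): per-step rewards are folded into the cumulative totals and then cleared, so anything that reads the per-step dict after the clear sees zeros even though both agents' cumulative returns are intact:

```python
agents = ["agent_0", "agent_1"]
rewards = {"agent_0": 1.0, "agent_1": 1.0}      # set earlier in step()
cumulative_rewards = {a: 0.0 for a in agents}

# what _accumulate_rewards() does, in essence
for a, r in rewards.items():
    cumulative_rewards[a] += r

# what _clear_rewards() does, in essence
rewards = {a: 0.0 for a in agents}

# A wrapper reading `rewards` after the clear sees only zeros, while
# `cumulative_rewards` still holds both agents' returns.
```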

def reward_metric(rews):
    return rews[:, 0]  # shape of rews is correct, but rews[:, 1] is always 0
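To sanity-check the shape, here is a hypothetical illustration (plain NumPy, no Tianshou) of the array reward_metric receives, together with a symmetric aggregate that would not silently hide a zeroed column:

```python
import numpy as np

# Hypothetical rews as handed to reward_metric: one column per agent,
# shape (num_episodes, num_agents).
rews = np.array([
    [1.0, 1.0],   # episode where both cooperative agents scored
    [0.5, 0.5],
])

def reward_metric(rews: np.ndarray) -> np.ndarray:
    # For fully cooperative agents, averaging over the agent axis is a
    # safer scalarization than picking rews[:, 0]: an unexpectedly
    # zeroed column still shows up in the mean.
    return rews.mean(axis=1)
```

With a broken second column, the mean drops to half the expected value instead of looking normal, which makes the bug visible in the logged reward.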

  2. Am I initializing the policy correctly (with a shared network object; otherwise the second agent is not trained, although maybe that follows from the first point)?
def _get_agents(...):
    ...
    if agent1 is None:
        net = Net(
            state_shape=observation_space.shape or observation_space.n,
            action_shape=env.action_space.shape or env.action_space.n,
            hidden_sizes=args.hidden_sizes,
            softmax=True,
            num_atoms=51,
            dueling_param=({
                "linear_layer": noisy_linear
            }, {
                "linear_layer": noisy_linear
            }),
            device=args.device,
        ).to(args.device)
        if optim is None:
            optim = torch.optim.Adam(net.parameters(), lr=args.lr)
        agent1 = RainbowPolicy(
            model=net,
            optim=optim,
            action_space=env.action_space,
            discount_factor=args.gamma,
            estimation_step=args.n_step,
            target_update_freq=args.target_update_freq,
        ).to(args.device)

        if args.watch:
            agent1.load_state_dict(torch.load('./log/ttt/dqn/policy_0.pth'))

    if agent2 is None:
        agent2 = RainbowPolicy(
            model=net,
            optim=optim,
            action_space=env.action_space,
            discount_factor=args.gamma,
            estimation_step=args.n_step,
            target_update_freq=args.target_update_freq,
        ).to(args.device)

        if args.watch:
            agent2.load_state_dict(torch.load('./log/ttt/dqn/policy_1.pth'))
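Independently of the reward question, the weight-sharing intent above can be checked in isolation. A toy sketch (pure Python, no Tianshou; TinyModel is a hypothetical stand-in for the shared Net) of why both RainbowPolicy instances must receive the same net object rather than a copy:

```python
import copy

class TinyModel:
    """Hypothetical stand-in for the shared Net; one mutable weight."""
    def __init__(self):
        self.weight = 0.0

net = TinyModel()
agent1_model = net                 # shared reference, as in the snippet above
agent2_model = net                 # same object, not a copy
independent = copy.deepcopy(net)   # what a per-agent copy would behave like

net.weight += 1.0                  # stands in for one optimizer step

assert agent2_model.weight == 1.0  # the second agent sees the update
assert independent.weight == 0.0   # a copied network would be left behind
```

Note that both agents here also share `optim`, so every gradient step draws on the same optimizer state; whether that is preferable to registering one policy object for both agents is a design choice worth making deliberately.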

Tianshou: 1.0.0
PettingZoo: 1.24.3

Thank you for your time!

Hi @Legendorik

The core team that has been working on tianshou for the last 6 months has deprioritized MARL - there are just too many other things that need fixing first. Also, it's more of a niche topic.

I'm afraid that in the foreseeable future we won't be able to help you with that, but I'm happy to review a PR if you decide to dive deeper into MARL with tianshou.

Maybe @Trinkle23897 or @ChenDRAG can help though