Multitask-MAMujoCo

A multitask version of Multi-Agent MuJoCo (MAMujoco) that integrates all MAMujoco environments and all DMControl task settings.


Cross-domain Multi-task MAMujoco

This repository is an extension of the MARL benchmark Multi-Agent MuJoCo (MAMujoco) that integrates the DMControl task settings into MAMujoco. The rewards of every domain and task are normalized to [0, 1], providing a benchmark for cross-domain multi-task multi-agent reinforcement learning.

Installation & Start

The source code of MAMujoCo is included in this repository, but you still need to install OpenAI Gym and mujoco-py.

conda create -n cdmtmujoco python=3.11
conda activate cdmtmujoco
pip install gym mujoco-py omegaconf
python example.py

The multitask environment is defined in the class multitask.MultiTaskMulti; modify config.json to specify which domains and tasks are included in the environment.
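
A minimal usage sketch is shown below; the constructor signature and the way config.json is loaded are assumptions, so see example.py for the actual interface:

from omegaconf import OmegaConf
from multitask import MultiTaskMulti

# Hypothetical: load the task list (config.json specifies which domains/tasks to build).
cfg = OmegaConf.load("config.json")
env = MultiTaskMulti(cfg)

env.reset_task(0)   # switch to the first configured task
obs = env.reset()   # observations are padded to the largest task's size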

Task Settings

| Suite | Domain | Task | Description |
|-------|--------|------|-------------|
| Ant | 2x4, 2x4d | run | rewarded by running forward as fast as possible |
| | | run-backwards | rewarded by running backward as fast as possible |
| cheetah | 2x3, 6x1 | run | - |
| | | run-backwards | - |
| | | jump | rewarded by jumping high while keeping move speed low |
| | | run-front-foot | rewarded by running speed and the height of the back foot |
| | | run-back-foot | rewarded by running speed and the height of the front foot |
| hopper | 3x1 | hop | - |
| | | hop-backwards | - |
| | | stand | rewarded by keeping move speed low and minimizing control cost |
| | | flip | rewarded by angular momentum while flipping |
| | | flip-backwards | - |
| humanoid | 9\|8 | run | - |
| | | stand | - |
| | | walk | rewarded by keeping move speed within a target range |
| humanoid_standup | 9\|8 | standup | rewarded by standing up from a lying position |
| reacher | 2 | reach | rewarded by minimizing the distance between fingertip and target |
| swimmer | 2x1 | swim | - |
| | | swim-backwards | - |
| walker | 2x3 | run | - |
| | | run-backwards | - |
| | | stand | - |
| | | walk | - |
| | | walk-backwards | - |

Reward Normalization

The reward normalization method from DMControl is applied in this repository: dm_control/utils/rewards.py is integrated into custom_suites/utils.py. The original MAMujoCo tasks keep their reward design unchanged, but their rewards are normalized to [0, 1]. For the reward settings of new tasks, e.g. cheetah-jump, refer to _jump_reward() in custom_suites/cheetah.py.
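
As an illustrative sketch of how a tolerance-style normalized reward is built (the target height, margins, and weighting below are assumptions, not the repo's actual parameters; tolerance() is the dm_control helper integrated into custom_suites/utils.py):

from custom_suites.utils import tolerance

def jump_reward_sketch(torso_height, horizontal_speed):
    # 1.0 once the torso exceeds an (assumed) target height of 1.0,
    # decaying smoothly over a margin of 1.0 below it.
    up = tolerance(torso_height, bounds=(1.0, float('inf')), margin=1.0)
    # 1.0 when horizontal speed stays near zero.
    slow = tolerance(horizontal_speed, bounds=(0.0, 0.0), margin=2.0)
    # Both terms lie in [0, 1], so the combination also stays in [0, 1].
    return up * (1 + slow) / 2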

Cross-Domain Multi-Task Implementation

The class multitask.MultiTaskMulti maintains a list self.envs that stores the environments of all configured domains, and reset_task(task_idx) switches the current task. Because observation and state dimensions differ across domains, _obs_pat(obs) and _state_pat(state) pad them to a common dimension, so the data the environment passes to the algorithm always has consistent shapes. Likewise, the action-space dimension and the number of agents follow the largest task; when interacting with a specific environment, _act_crop(actions) crops the actions as needed to fit the current task.
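
A minimal sketch of the padding/cropping idea (the shapes and helper names here are illustrative; the actual logic lives in _obs_pat, _state_pat, and _act_crop):

import numpy as np

def pad_obs(obs_per_agent, max_obs_dim):
    # Zero-pad each agent's observation to the largest task's observation size.
    padded = np.zeros((len(obs_per_agent), max_obs_dim), dtype=np.float32)
    for i, o in enumerate(obs_per_agent):
        padded[i, :len(o)] = o
    return padded

def crop_actions(actions, n_agents, act_dim):
    # Drop the padded agents and action dimensions the current task does not use.
    return [a[:act_dim] for a in actions[:n_agents]]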