This repository is an extension of the MARL benchmark Multi-Agent MuJoCo and integrates the DMControl task settings into MAMuJoCo. The reward of every domain and task is normalized to [0, 1], providing a benchmark for cross-domain multi-task multi-agent reinforcement learning.
The source code of MAMuJoCo is included in this repository, but you still need to install OpenAI Gym and mujoco-py.
conda create -n cdmtmujoco python=3.11
conda activate cdmtmujoco
pip install gym mujoco_py omegaconf
python example.py
The multi-task environment is defined in the class `multitask.MultiTaskMulti`; you can modify `config.json` to specify which domains and tasks are included in the environment (a hypothetical example is sketched below).
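For illustration only, such a config might look like the sketch below. The field names (`domains`, `name`, `agent_conf`, `tasks`, `episode_limit`) are hypothetical placeholders, not the repository's actual schema; check `config.json` itself for the real keys.

```python
import json

# Hypothetical config.json layout -- the real keys are defined by this
# repository's config.json; this only illustrates selecting domains and tasks.
config = {
    "domains": [
        {"name": "cheetah", "agent_conf": "6x1",
         "tasks": ["run", "run-backwards", "jump"]},
        {"name": "walker", "agent_conf": "2x3",
         "tasks": ["run", "stand", "walk"]},
    ],
    "episode_limit": 1000,  # assumed common episode length
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```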
| Domain | Agent Partition | Task | Description |
|---|---|---|---|
| Ant | 2x4, 2x4d | run | rewarded for running forward as fast as possible |
| | | run-backwards | rewarded for running backward as fast as possible |
| cheetah | 2x3, 6x1 | run | - |
| | | run-backwards | - |
| | | jump | rewarded for jumping high while keeping the moving speed low |
| | | run-front-foot | rewarded by running speed and the height of the back foot |
| | | run-back-foot | rewarded by running speed and the height of the front foot |
| hopper | 3x1 | hop | - |
| | | hop-backwards | - |
| | | stand | rewarded for keeping the moving speed low and minimizing the control cost |
| | | flip | rewarded by the flipping angular momentum |
| | | flip-backwards | - |
| humanoid | 9\|8 | run | - |
| | | stand | - |
| | | walk | rewarded for keeping the moving speed within a target range |
| humanoid_standup | 9\|8 | standup | rewarded for standing up from a lying position |
| reacher | 2 | reach | rewarded for minimizing the distance between the fingertip and the target |
| swimmer | 2x1 | swim | - |
| | | swim-backwards | - |
| walker | 2x3 | run | - |
| | | run-backwards | - |
| | | stand | - |
| | | walk | - |
| | | walk-backwards | - |
The reward normalization method from DMControl is applied in this repository: `dm_control/utils/rewards.py` is integrated into `custom_suites/utils.py`. The original MAMuJoCo tasks keep their reward design, but their rewards are normalized to [0, 1]. For the reward settings of new tasks, e.g. cheetah-jump, refer to `_jump_reward()` in `custom_suites/cheetah.py`.
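As a rough illustration of how such a normalized reward can be composed, the sketch below combines two dm_control-style `tolerance()` terms (assumed here to be importable from `custom_suites/utils.py` after the integration described above). The height and speed thresholds are made up; the actual task definition is `_jump_reward()` in `custom_suites/cheetah.py`.

```python
from custom_suites.utils import tolerance  # dm_control-style helper, assumed location

def jump_reward_sketch(torso_height, forward_speed):
    """Illustrative only -- not the actual _jump_reward() implementation."""
    # Close to 1 when the torso is above a (hypothetical) target height.
    height_reward = tolerance(torso_height,
                              bounds=(1.0, float("inf")),
                              margin=1.0)
    # Close to 1 when the horizontal speed stays near zero.
    slow_reward = tolerance(forward_speed,
                            bounds=(-0.5, 0.5),
                            margin=1.0)
    # Each term lies in [0, 1], so their product is also in [0, 1].
    return height_reward * slow_reward
```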
The class `multitask.MultiTaskMulti` defines a list `self.envs` that stores all environments belonging to different domains; the current task is switched with `reset_task(task_idx)`. Because the observation and state dimensions differ across domains, `_obs_pat(obs)` and `_state_pat(state)` pad them to a common dimension, so that the data the environment passes to the algorithm always has a consistent shape. Likewise, the action-space dimension and the number of agents follow the largest task; when interacting with a specific environment, `_act_crop(actions)` crops the actions as needed to fit the current task. A minimal sketch of this padding/cropping idea follows.
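The sketch below only illustrates the mechanism described above; it is not the actual `multitask.MultiTaskMulti` implementation, and the attributes `obs_size` / `act_size` are assumed names for the per-task dimensions.

```python
import numpy as np

class MultiTaskSketch:
    """Illustrates padding/cropping across heterogeneous envs (not the real class)."""

    def __init__(self, envs):
        self.envs = envs  # one wrapped environment per (domain, task)
        self.cur = 0
        # Pad everything up to the largest dimensions among the tasks.
        # obs_size / act_size are assumed attributes of the wrapped envs.
        self.max_obs = max(e.obs_size for e in envs)
        self.max_act = max(e.act_size for e in envs)

    def reset_task(self, task_idx):
        # Switch the active task, then reset that environment.
        self.cur = task_idx
        return self._obs_pat(self.envs[self.cur].reset())

    def _obs_pat(self, obs):
        # Zero-pad observations of smaller domains to the common size.
        pad = self.max_obs - obs.shape[-1]
        return np.pad(obs, (0, pad)) if pad > 0 else obs

    def _act_crop(self, actions):
        # Drop the padded tail of the action vector before stepping the env.
        return np.asarray(actions)[: self.envs[self.cur].act_size]
```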