This repository is an extension of the MARL benchmark Multi-Agent MuJoCo and integrates the DMControl task settings into MAMuJoCo. The reward of every domain and task is normalized to [0, 1], providing a benchmark for cross-domain multi-task multi-agent reinforcement learning.
The source code of MAMuJoCo is included in this repository, but you still need to install OpenAI Gym and mujoco-py.
conda create -n cdmtmujoco python=3.11
conda activate cdmtmujoco
pip install gym mujoco_py omegaconf
python example.py
The multi-task environment is defined in the class `multitask.MultiTaskMulti`; you can modify `config.json` to specify which domains and tasks are included in the environment (a hypothetical example is sketched below).
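For illustration only, such a config might look like the sketch below. The field names (`domains`, `name`, `agent_conf`, `tasks`, `episode_limit`) are hypothetical placeholders, not the repository's actual schema; check `config.json` itself for the real keys.

```python
import json

# Hypothetical config.json layout -- the real keys are defined by this
# repository's config.json; this only illustrates selecting domains and tasks.
config = {
    "domains": [
        {"name": "cheetah", "agent_conf": "6x1",
         "tasks": ["run", "run-backwards", "jump"]},
        {"name": "walker", "agent_conf": "2x3",
         "tasks": ["run", "stand", "walk"]},
    ],
    "episode_limit": 1000,  # assumed common episode length
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```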
| Domain | Agent Partition | Task | Description |
|---|---|---|---|
| Ant | 2x4, 2x4d | run | rewarded for running forward as fast as possible |
| | | run-backwards | rewarded for running backward as fast as possible |
| cheetah | 2x3, 6x1 | run | - |
| | | run-backwards | - |
| | | jump | rewarded for jumping high while keeping the moving speed low |
| | | run-front-foot | rewarded by running speed and the height of the back foot |
| | | run-back-foot | rewarded by running speed and the height of the front foot |
| hopper | 3x1 | hop | - |
| | | hop-backwards | - |
| | | stand | rewarded for keeping the moving speed low and minimizing the control cost |
| | | flip | rewarded by the flipping angular momentum |
| | | flip-backwards | - |
| humanoid | 9\|8 | run | - |
| | | stand | - |
| | | walk | rewarded for keeping the moving speed within a target range |
| humanoid_standup | 9\|8 | standup | rewarded for standing up from a lying position |
| reacher | 2 | reach | rewarded for minimizing the distance between the fingertip and the target |
| swimmer | 2x1 | swim | - |
| | | swim-backwards | - |
| walker | 2x3 | run | - |
| | | run-backwards | - |
| | | stand | - |
| | | walk | - |
| | | walk-backwards | - |
The reward normalization method from DMControl is applied in this repository: `dm_control/utils/rewards.py` is integrated into `custom_suites/utils.py`. The original MAMuJoCo tasks keep their reward design, but their rewards are normalized to [0, 1]. For the reward settings of new tasks, e.g. cheetah-jump, refer to `_jump_reward()` in `custom_suites/cheetah.py`.
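As a rough illustration of how such a normalized reward can be composed, the sketch below combines two dm_control-style `tolerance()` terms (assumed here to be importable from `custom_suites/utils.py` after the integration described above). The height and speed thresholds are made up; the actual task definition is `_jump_reward()` in `custom_suites/cheetah.py`.

```python
from custom_suites.utils import tolerance  # dm_control-style helper, assumed location

def jump_reward_sketch(torso_height, forward_speed):
    """Illustrative only -- not the actual _jump_reward() implementation."""
    # Close to 1 when the torso is above a (hypothetical) target height.
    height_reward = tolerance(torso_height,
                              bounds=(1.0, float("inf")),
                              margin=1.0)
    # Close to 1 when the horizontal speed stays near zero.
    slow_reward = tolerance(forward_speed,
                            bounds=(-0.5, 0.5),
                            margin=1.0)
    # Each term lies in [0, 1], so their product is also in [0, 1].
    return height_reward * slow_reward
```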
The class `multitask.MultiTaskMulti` defines a list `self.envs` that stores all environments belonging to different domains; the current task is switched with `reset_task(task_idx)`. Because the observation and state dimensions differ across domains, `_obs_pat(obs)` and `_state_pat(state)` pad them to a common dimension, so that the data the environment passes to the algorithm always has a consistent shape. Likewise, the action-space dimension and the number of agents follow the largest task; when interacting with a specific environment, `_act_crop(actions)` crops the actions as needed to fit the current task. A minimal sketch of this padding/cropping idea follows.
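The sketch below only illustrates the mechanism described above; it is not the actual `multitask.MultiTaskMulti` implementation, and the attributes `obs_size` / `act_size` are assumed names for the per-task dimensions.

```python
import numpy as np

class MultiTaskSketch:
    """Illustrates padding/cropping across heterogeneous envs (not the real class)."""

    def __init__(self, envs):
        self.envs = envs  # one wrapped environment per (domain, task)
        self.cur = 0
        # Pad everything up to the largest dimensions among the tasks.
        # obs_size / act_size are assumed attributes of the wrapped envs.
        self.max_obs = max(e.obs_size for e in envs)
        self.max_act = max(e.act_size for e in envs)

    def reset_task(self, task_idx):
        # Switch the active task, then reset that environment.
        self.cur = task_idx
        return self._obs_pat(self.envs[self.cur].reset())

    def _obs_pat(self, obs):
        # Zero-pad observations of smaller domains to the common size.
        pad = self.max_obs - obs.shape[-1]
        return np.pad(obs, (0, pad)) if pad > 0 else obs

    def _act_crop(self, actions):
        # Drop the padded tail of the action vector before stepping the env.
        return np.asarray(actions)[: self.envs[self.cur].act_size]
```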