Code for Adapt to Environment Sudden Changes by Learning a Context Sensitive Policy.
Install the required python packages in requirement.txt
by
pip install -r ./requirement.txt
Note: You can follow the instructions at here to properly install mujoco-py
.
we have built a docker image, with which we ran all the experiments in the paper. The docker image can be pulled from DockerHub.
docker pull sanluosizhou/selfdl:ml
You can conduct the experiment in HalfCheetah-v2
with the following command.
python main.py --env_name HalfCheetah-v2 --rnn_fix_length 16 --seed 5 --task_num 40 --max_iter_num 2000 --varying_params dof_damping_1_dim --test_task_num 40 --ep_dim 2 --name_suffix RMDM --rbf_radius 3000 --use_rmdm --stop_pg_for_ep --bottle_neck
We also provide the command for running in the docker
docker run --rm -it --shm-size 50gb --gpus all -v $PWD:/root/policy_adaptation sanluosizhou/selfdl:ml -c "cd /root/policy_adaptation && python main.py --env_name HalfCheetah-v2 --rnn_fix_length 16 --seed 5 --task_num 40 --max_iter_num 2000 --varying_params dof_damping_1_dim --test_task_num 40 --ep_dim 2 --name_suffix RMDM --rbf_radius 3000 --use_rmdm --stop_pg_for_ep --bottle_neck"
There are several key parameters:
--env_name
: configures the environment you are going to conduct experiment on. The possible environments:GridWorldPlat-v2,Hopper-v2,HalfCheetah-v2,Walker2d-v,Ant-v2,Humanoid-v2
.--rnn_fix_length
: configures the memory length (H in the paper).--seed
: configures the random seeds.--task_num
: configures how many environments are used for policy training (it should be set to 12 inGridWorldPlat-v2
).--test_task_num
: configures how many environments are used for policy testing (it should be set to12
inGridWorldPlat-v2
).--varying_params
: configures what kinds of environment changes are used, refer to code for all kinds of supported environment changes.
You can conduct the experiment in HalfCheetah-v2
with both gravity
and dof_damping
changed.
python main.py --env_name HalfCheetah-v2 --rnn_fix_length 16 --seed 5 --task_num 40 --max_iter_num 2000 --varying_params dof_damping_1_dim gravity --test_task_num 40 --ep_dim 2 --name_suffix RMDM_more_change --kernel_type rbf --rbf_radius 80 --diversity_loss_weight 1.0 --use_rmdm --stop_pg_for_ep --bottle_neck