Meta Learning

Initial one-time setup

  1. Load the new software stack: env2lmod
  2. Load the software modules
    module load gcc/6.3.0 python_gpu/3.7.4 tmux/2.6 eth_proxy
    # needed for mujoco
    module load mesa-glu/9.0.0
    module load glfw/3.3.4
    # needed for dm-tree if python 3.7 (garage)
    module load bazel
    
    All in one line:
    module load gcc/6.3.0 python_gpu/3.7.4 mesa-glu/9.0.0 glfw/3.3.4 bazel/3.7.1 tmux/2.6 eth_proxy
    
  3. Install the mujoco_py dependencies
    sh mujoco.sh
    source ~/.bashrc
    
  4. Install the Python environment
    python -m venv rl
    source ./rl/bin/activate
    pip install -r ./requirements.txt
    
  5. Add a .env file in the root directory with the following content (see the one-liner sketched below):
    OUT_DIR=/cluster/scratch/<username>
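
    A one-line way to create the file, assuming the standard personal scratch path /cluster/scratch/$USER on the cluster:
    echo "OUT_DIR=/cluster/scratch/$USER" > .env
    cat .env                   # verify the content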
    

Every time setup

env2lmod
module load gcc/6.3.0 python_gpu/3.7.4 mesa-glu/9.0.0 glfw/3.3.4 bazel/3.7.1 tmux/2.6 eth_proxy
cd metalearning
source ./rl/bin/activate
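
To avoid retyping this every session, the commands can be collected in a small helper script (a sketch; the name setup_env.sh and the checkout path are assumptions):

# setup_env.sh -- hypothetical helper; run it with `source setup_env.sh`
# so that the modules and the virtualenv affect the current shell
env2lmod
module load gcc/6.3.0 python_gpu/3.7.4 mesa-glu/9.0.0 glfw/3.3.4 bazel/3.7.1 tmux/2.6 eth_proxy
cd "$HOME/metalearning"        # assumed checkout location
source ./rl/bin/activate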

Running a job

List of specific commands for the experiments. The commands request roughly the right amount of resources for each job (check actual usage with bbjobs), which improves scheduling priority, and pin a specific GPU model for reproducibility.

experiment (time per epoch, where known) and command:

maml_trpo_metaworld_ml1_basketball
  bsub -n 4 -J "maml-tpro" -W 300:00 -R "rusage[mem=4096]" 'python src/maml_trpo_metaworld_ml1_basketball.py'

maml_trpo_metaworld_ml10 (~35 min/epoch)
  bsub -n 4 -J "maml-tpro" -W 300:00 -R "rusage[mem=4096]" 'python src/maml_trpo_metaworld_ml10.py'

maml_trpo_metaworld_ml45 (~50 min/epoch)
  bsub -n 15 -J "maml-tpro" -W 24:00 -R "rusage[mem=4096]" 'python src/maml_trpo_metaworld_ml45.py'

pearl_metaworld_ml1_basketball
  bsub -n 4 -J "pearl" -W 300:00 -R "rusage[mem=4096]" 'python src/pearl_metaworld_ml1_basketball.py'

pearl_metaworld_ml10
  bsub -n 4 -J "pearl" -W 24:00 -R "rusage[mem=4096]" 'python src/pearl_metaworld_ml10.py'

pearl_metaworld_ml10 (GPU)
  bsub -n 10 -J "pearl" -W 24:00 -R "rusage[mem=2048, ngpus_excl_p=1]" -R "select[gpu_model0==GeForceRTX1080Ti]" 'python src/pearl_metaworld_ml10.py --use_gpu True'
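
The same submissions can also be written as LSF batch scripts with #BSUB directives and submitted via bsub < script. A sketch for the ML10 MAML run (the file name job_ml10.lsf is made up; as with the inline commands above, the job typically inherits the modules and virtualenv of the submitting shell):

#!/bin/bash
# job_ml10.lsf -- submit with:  bsub < job_ml10.lsf
#BSUB -n 4                     # number of cores
#BSUB -J maml-tpro             # job name
#BSUB -W 300:00                # wall-clock limit hh:mm
#BSUB -R "rusage[mem=4096]"    # memory per core in MB
python src/maml_trpo_metaworld_ml10.py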

cpu smaller job (10 x 3 GB)

bsub -n 10 -J "maml-tpro" -W 4:00 -R "rusage[mem=3072]" 'python src/maml_trpo_metaworld_ml10.py'

cpu larger job (20 x 4 GB)

bsub -n 20 -J "maml-tpro" -W 24:00 -R "rusage[mem=4096]" 'python src/maml_trpo_metaworld_ml10.py'

gpu smaller job (10 x 3 GB & any GPU)

bsub -n 10 -J "maml-tpro" -W 4:00 -R "rusage[mem=3072, ngpus_excl_p=1]" 'python src/maml_trpo_metaworld_ml10.py'

gpu larger job (20 x 4 GB & RTX 2080 Ti)

bsub -n 20 -J "maml-tpro" -W 24:00 -R "rusage[mem=4096, ngpus_excl_p=1]" -R "select[gpu_model0==GeForceRTX2080Ti]" 'python src/maml_trpo_metaworld_ml10.py'
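
To sanity-check which GPU model a job actually receives (useful when pinning a card for reproducibility), one option is a short interactive job; a sketch using the same resource syntax as above:

# request an interactive shell with one GPU for an hour (-Is opens a pseudo-terminal)
bsub -n 1 -W 1:00 -R "rusage[mem=4096, ngpus_excl_p=1]" -Is bash
# inside the job:
nvidia-smi                     # shows the assigned GPU model and driver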

Some useful cluster commands

jobs

bbjobs                         # overview of your jobs incl. requested vs. used resources
bjobs -w                       # list your jobs, wide output
bjobs -l                       # detailed information about your jobs
bpeek -f                       # follow the output of a running job

modules

module list                    # list loaded modules
module spider python           # search for modules with name python

Troubleshooting

In case of the error

...
File "mujoco_py/cymj.pyx", line 1, in init mujoco_py.cymj
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

reinstall mujoco-py against the numpy version of your liking. For instance, numpy 1.19.5 is compatible with TensorFlow (see openai/mujoco-py#607).

pip cache remove mujoco_py
pip uninstall mujoco_py
# install the numpy version you want to use before installing mujoco-py
pip install numpy==1.19.5 six~=1.15.0
pip install mujoco-py --no-cache-dir --no-binary :all: --no-build-isolation
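
Afterwards, a quick way to confirm that the rebuilt mujoco_py imports cleanly against the new numpy (assumes MuJoCo itself and its license key were already set up in step 3):

python -c "import numpy, mujoco_py; print('numpy', numpy.__version__, '- mujoco_py imports fine')"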