- Install MuJoCo
- Install mujoco-py
- Clone rstrudel/bcmuj and run `pip install -e .` inside the clone
- Clone rstrudel/bc and run `pip install -e .` inside the clone
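After the installation steps above, a quick check that the Python bindings can be found may save a failed training run later. The following is a minimal sketch, not part of the repository: `check_modules` is a hypothetical helper, and only the `mujoco_py` import name (the module installed by mujoco-py) is assumed. The import names of the two cloned packages are not stated here, so pass them in yourself once you know them.

```python
import importlib.util


def check_modules(names=("mujoco_py",)):
    """Return {module_name: True/False} depending on whether each module can be located.

    Hypothetical helper: it only checks that Python can find the module,
    not that the MuJoCo binaries themselves load correctly.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Report the status of each module on one line.
for name, found in check_modules().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

A module reported as found can still fail at import time if MuJoCo itself is not set up correctly, so treat this as a first sanity check only.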
python online_train.py [--method METHOD] [--resume RESUME]
- `METHOD` should be `bc`, `dagger` or `dart`
- `RESUME` can specify an epoch identifier to resume training from (saved models are stored in `./storage/models/[METHOD]/`)
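To make the training options concrete, here is a sketch of an `argparse` parser that accepts the same interface as the usage line above. It is an illustration only and not the actual code of `online_train.py`: the function name `build_train_parser` is hypothetical, and the defaults are left as `None` because the real defaults are not documented here.

```python
import argparse

# Methods accepted for training, as listed above.
TRAIN_METHODS = ("bc", "dagger", "dart")


def build_train_parser():
    """Hypothetical parser mirroring: online_train.py [--method METHOD] [--resume RESUME]."""
    parser = argparse.ArgumentParser(prog="online_train.py")
    # Assumption: the real default method is unknown, so None is used here.
    parser.add_argument("--method", choices=TRAIN_METHODS, default=None,
                        help="training method: bc, dagger or dart")
    # The epoch identifier is kept as a string; its exact format is an assumption.
    parser.add_argument("--resume", default=None,
                        help="epoch identifier to resume training from")
    return parser


# Example: resume DAgger training from epoch identifier 128.
args = build_train_parser().parse_args(["--method", "dagger", "--resume", "128"])
print(args.method, args.resume)
```

Restricting `--method` with `choices` means a misspelled method is rejected before any training starts.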
python eval.py [METHOD] [EPOCH] [--render] [--eps EPS] [--all ALL]
- `METHOD` should be `%expert`, `bc`, `dagger` or `dart`
- `EPOCH` should be the identifier of the epoch to evaluate (irrelevant for `%expert`)
- `--render` can be specified to render the environment
- `EPS` can specify the number of episodes to run (default is `1000`)
- `ALL` can specify an interval at which to evaluate all epochs up to `EPOCH` (not compatible with `--render`)
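The evaluation options above interact in one way that is easy to get wrong: `--all` cannot be combined with `--render`. The sketch below builds an `eval.py` command line and rejects that combination up front. `build_eval_command` is a hypothetical wrapper, not something the repository provides, and it assumes `python` on your `PATH` is the interpreter you want to use.

```python
# Methods accepted for evaluation, as listed above.
EVAL_METHODS = ("%expert", "bc", "dagger", "dart")


def build_eval_command(method, epoch, render=False, eps=None, all_interval=None):
    """Return the argument list for: eval.py METHOD EPOCH [--render] [--eps EPS] [--all ALL].

    Hypothetical helper. EPOCH is always passed positionally; for %expert
    its value is documented as irrelevant.
    """
    if method not in EVAL_METHODS:
        raise ValueError(f"unknown method: {method!r}")
    if render and all_interval is not None:
        raise ValueError("--all is not compatible with --render")

    cmd = ["python", "eval.py", method, str(epoch)]
    if render:
        cmd.append("--render")
    if eps is not None:  # when omitted, eval.py uses its default of 1000 episodes
        cmd += ["--eps", str(eps)]
    if all_interval is not None:
        cmd += ["--all", str(all_interval)]
    return cmd


# Example: evaluate DART every 128 epochs up to 6144, 500 episodes each.
print(" ".join(build_eval_command("dart", 6144, eps=500, all_interval=128)))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues with the `%expert` method name.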
Our results can be found in this notebook.
To reproduce, run:
python online_train.py --method bc ;
python online_train.py --method dagger ;
python online_train.py --method dart ;
python eval.py bc 6144 --eps 500 --all 128 ;
python eval.py dagger 6144 --eps 500 --all 128 ;
python eval.py dart 6144 --eps 500 --all 128
Then just execute the notebook.
A GPU with a lot of memory is required to run this. It should take about 48 hours to train and evaluate.
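Because the full reproduction is long, it can help to drive the six commands from a small script that stops at the first failure (commands chained with `;` in a shell keep going even if an earlier one fails). This is a sketch under that motivation only: `build_commands` and `run_all` are hypothetical names, while the methods, epoch `6144`, `500` episodes and interval `128` are taken from the commands listed above.

```python
import subprocess

METHODS = ("bc", "dagger", "dart")
FINAL_EPOCH = 6144      # last epoch to evaluate
EVAL_EPISODES = 500     # value passed to --eps
EVAL_INTERVAL = 128     # value passed to --all


def build_commands():
    """Return the training commands followed by the evaluation commands."""
    train = [["python", "online_train.py", "--method", m] for m in METHODS]
    evaluate = [
        ["python", "eval.py", m, str(FINAL_EPOCH),
         "--eps", str(EVAL_EPISODES), "--all", str(EVAL_INTERVAL)]
        for m in METHODS
    ]
    return train + evaluate


def run_all(commands, runner=subprocess.run):
    """Run each command in order; check=True raises on the first non-zero exit code."""
    for cmd in commands:
        print("running:", " ".join(cmd))
        runner(cmd, check=True)


# To launch the full reproduction from the repository root:
# run_all(build_commands())
```

The `runner` parameter exists so the sequence can be inspected or dry-run (for example with a function that only logs) before committing to the full run.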