```
python train.py <environment name> -e <episodes per candidate; rewards are normalized across this batch> -n <number of processes> -t <workers per process> -o {cma/openes}
```

The total population size is therefore n*t. Currently, `-e 4 -n 4 -t 4 -o cma` works well, mostly with the gt_rr envs.
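As a rough illustration of the "normalized across this batch" behavior (a hedged sketch; the function name and the example values are made up, not train.py's internals):

```python
import numpy as np

# Hedged sketch of the batch normalization implied by the -e flag; this
# is illustrative, not the repo's actual implementation.
def normalize_fitness(batch_returns):
    """Normalize raw returns across one population batch so the optimizer
    (CMA-ES / OpenES) sees zero-mean, roughly unit-variance fitness."""
    batch_returns = np.asarray(batch_returns, dtype=np.float64)
    return (batch_returns - batch_returns.mean()) / (batch_returns.std() + 1e-8)

# With -n 4 -t 4 the population is 16 candidates; with -e 4 each raw
# fitness would be the mean return over 4 episodes (values made up here).
mean_returns = np.random.randn(16) * 100 + 500
fitness = normalize_fitness(mean_returns)
print(fitness.mean(), fitness.std())  # ~0 and ~1
```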
Possible env names (the relative-reward variants are sketched below):
- {hopper, ant, walker2d, swimmer, cheetah, humanoid, reacher, striker, pusher, thrower}
- <env>_gt_rr: ground-truth relative reward.
- <env>_gt_binary_rr: binarized ground-truth relative reward.
- <env>_siamese_rr: reward is the probability, predicted by a Siamese network, that the current trajectory is better than the previous one.
- <env>_siamese_binary_rr: binarized version of siamese_rr.
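A minimal sketch of how the four `_rr` variants differ, assuming the previous trajectory's return (or the Siamese network's prediction) is tracked by the caller; function and argument names are hypothetical, not the repo's actual interfaces:

```python
# Hedged sketch of the four _rr reward variants described above.
def gt_relative_reward(current_return, previous_return, binary=False):
    """gt_rr / gt_binary_rr: ground-truth relative reward, i.e. how much
    better the current trajectory is than the previous one."""
    diff = current_return - previous_return
    return float(diff > 0.0) if binary else diff

def siamese_relative_reward(prob_current_better, binary=False):
    """siamese_rr / siamese_binary_rr: the reward is a Siamese network's
    predicted probability that the current trajectory beats the previous
    one, optionally binarized at 0.5."""
    return float(prob_current_better > 0.5) if binary else prob_current_better
```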
```
python model.py <environment name> <model file name>
```

To record a video, set the record_video flag in model.py.
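For reference, recording a rollout can be done with gym's Monitor wrapper (a hedged sketch assuming the env is built with gym.make; model.py's record_video flag may work differently):

```python
import gym

# Hedged sketch: wrap the env before the rollout so episodes are saved
# as video files; this is not model.py's actual implementation.
env = gym.make("Hopper-v2")
env = gym.wrappers.Monitor(env, "./videos", force=True)  # old-gym API

obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # stand-in for the trained policy
    obs, reward, done, info = env.step(action)
env.close()
```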
- Hopper-v2: the hopper_gt_rr env was trainable with the basic parameters after about 2500 generations (took more than 200 minutes): `python train.py hopper_gt_rr -e 4 -n 4 -t 4`
- Walker2d-v2: training in progress.