Controller training stops randomly
wensdong opened this issue · 4 comments
Please find the attached error logs:
errorController.txt
errorController1.txt
The training result also looks bad:
Loading VAE at epoch 33 with test loss 34.16893509338379
Loading MDRNN at epoch 29 with test loss 1.053613607351445
Loading Controller with reward 316.7092720531355
The VAE and MDRNN achieve decent results. The two errors you get are CMA-ES related; I suspect n-samples is set too low. In our replication we used n-samples=16 and pop-size=4. Could you try these values and keep me updated? I will investigate too.
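For context, here is a minimal sketch of how these two knobs enter a CMA-ES search loop, using the `cma` package: pop-size is the number of candidate parameter vectors per generation, while n-samples is the number of rollouts averaged to estimate each candidate's fitness, so too few samples gives a noisy estimate. The `rollout` function and `N_PARAMS` below are hypothetical stand-ins, not the repository's actual code.

```python
import numpy as np
import cma

POP_SIZE = 4      # candidate controllers per CMA-ES generation (--pop-size)
N_SAMPLES = 16    # rollouts averaged per candidate (--n-samples)
N_PARAMS = 288    # hypothetical flat parameter count of the controller

def rollout(params):
    """Hypothetical stand-in: run one episode with these controller
    parameters and return the cumulative reward."""
    return float(np.random.randn())  # placeholder for a real episode

es = cma.CMAEvolutionStrategy(N_PARAMS * [0.0], 0.1, {'popsize': POP_SIZE})
for generation in range(200):
    solutions = es.ask()
    # CMA-ES minimizes, so hand it the negative mean return. Averaging
    # over N_SAMPLES rollouts smooths out the noise of single episodes.
    fitnesses = [-np.mean([rollout(x) for _ in range(N_SAMPLES)])
                 for x in solutions]
    es.tell(solutions, fitnesses)
```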
Thanks so much! This fix worked! However, training is going nowhere after 200 generations:
"python traincontroller.py --logdir exp_dir --n-samples 16
--pop-size 4 --target-return 950 --display --max-workers 12"
Loading VAE at epoch 33 with test loss 34.16893509338379
Loading MDRNN at epoch 29 with test loss 1.053613607351445
Loading Controller with reward 352.99061245415896
trainControllerGoingNowhereAfter200gen2.txt
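For reference, a minimal sketch of the command-line interface those flags imply (an assumption reconstructed from the command above, not copied from traincontroller.py):

```python
import argparse

parser = argparse.ArgumentParser(description="Train the controller with CMA-ES.")
parser.add_argument('--logdir', type=str,
                    help='directory holding the trained VAE/MDRNN checkpoints')
parser.add_argument('--n-samples', type=int,
                    help='rollouts averaged per CMA-ES candidate')
parser.add_argument('--pop-size', type=int,
                    help='CMA-ES population size')
parser.add_argument('--target-return', type=float,
                    help='stop once the best return reaches this value')
parser.add_argument('--display', action='store_true',
                    help='display progress')
parser.add_argument('--max-workers', type=int,
                    help='number of parallel rollout workers')
args = parser.parse_args()
```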
I changed pop-size to 16, and after 100 generations:
Current evaluation: -619.9075467084033
It seems to be working.
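A note on reading that number: the trainer appears to report the CMA-ES objective, i.e. the negative of the average episode return, so a more negative evaluation is better. A tiny sketch of that reading (an assumption about the logging convention, not confirmed from the code):

```python
# Assumption: "Current evaluation" is the CMA-ES objective, i.e. the
# negative of the average episode return (CMA-ES minimizes).
evaluation = -619.9075467084033               # printed after 100 generations
average_return = -evaluation                  # map back to an episode return
print(f"average return ~ {average_return:.1f}")  # ~619.9, up from ~353
```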