Controller training stops randomly
wensdong opened this issue · 4 comments
Please find the attached error logs:
errorController.txt
errorController1.txt
The training result also looks bad:
Loading VAE at epoch 33 with test loss 34.16893509338379
Loading MDRNN at epoch 29 with test loss 1.053613607351445
Loading Controller with reward 316.7092720531355
The VAE and MDRNN achieve decent results. The two errors you get are CMA-ES related; I suspect n-samples is set too low. In our replication we used n-samples=16 and pop-size=4. Could you try these values and keep me updated? I will investigate too.
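For context, here is a minimal sketch of how these two knobs enter a CMA-ES search loop, using the `cma` package: pop-size is the number of candidate parameter vectors per generation, while n-samples is the number of rollouts averaged to estimate each candidate's fitness, so too few samples gives a noisy estimate. The `rollout` function and `N_PARAMS` below are hypothetical stand-ins, not the repository's actual code.

```python
import numpy as np
import cma

POP_SIZE = 4      # candidate controllers per CMA-ES generation (--pop-size)
N_SAMPLES = 16    # rollouts averaged per candidate (--n-samples)
N_PARAMS = 288    # hypothetical flat parameter count of the controller

def rollout(params):
    """Hypothetical stand-in: run one episode with these controller
    parameters and return the cumulative reward."""
    return float(np.random.randn())  # placeholder for a real episode

es = cma.CMAEvolutionStrategy(N_PARAMS * [0.0], 0.1, {'popsize': POP_SIZE})
for generation in range(200):
    solutions = es.ask()
    # CMA-ES minimizes, so hand it the negative mean return. Averaging
    # over N_SAMPLES rollouts smooths out the noise of single episodes.
    fitnesses = [-np.mean([rollout(x) for _ in range(N_SAMPLES)])
                 for x in solutions]
    es.tell(solutions, fitnesses)
```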
Thanks so much! This fix worked! However, training is going nowhere after 200 generations:
"python traincontroller.py --logdir exp_dir --n-samples 16
--pop-size 4 --target-return 950 --display --max-workers 12"
Loading VAE at epoch 33 with test loss 34.16893509338379
Loading MDRNN at epoch 29 with test loss 1.053613607351445
Loading Controller with reward 352.99061245415896
trainControllerGoingNowhereAfter200gen2.txt
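For reference, a minimal sketch of the command-line interface those flags imply (an assumption reconstructed from the command above, not copied from traincontroller.py):

```python
import argparse

parser = argparse.ArgumentParser(description="Train the controller with CMA-ES.")
parser.add_argument('--logdir', type=str,
                    help='directory holding the trained VAE/MDRNN checkpoints')
parser.add_argument('--n-samples', type=int,
                    help='rollouts averaged per CMA-ES candidate')
parser.add_argument('--pop-size', type=int,
                    help='CMA-ES population size')
parser.add_argument('--target-return', type=float,
                    help='stop once the best return reaches this value')
parser.add_argument('--display', action='store_true',
                    help='display progress')
parser.add_argument('--max-workers', type=int,
                    help='number of parallel rollout workers')
args = parser.parse_args()
```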
I changed pop-size to 16, and after 100 generations:
Current evaluation: -619.9075467084033
It seems to be working.
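A note on reading that number: the trainer appears to report the CMA-ES objective, i.e. the negative of the average episode return, so a more negative evaluation is better. A tiny sketch of that reading (an assumption about the logging convention, not confirmed from the code):

```python
# Assumption: "Current evaluation" is the CMA-ES objective, i.e. the
# negative of the average episode return (CMA-ES minimizes).
evaluation = -619.9075467084033               # printed after 100 generations
average_return = -evaluation                  # map back to an episode return
print(f"average return ~ {average_return:.1f}")  # ~619.9, up from ~353
```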