ctallec/world-models

[Question] Some controller training questions


Hi,

I have some questions about the controller training:

  • What do the screen outputs mean?

[Screenshot, 2019-02-07 21:03:10]

  • How long does training usually take (e.g. with the README parameters, one worker and a GTX 1060 Ti)? Does it converge to a specific error value?

  • I am not really sure how much of a difference increasing the population size and n-samples makes in training.

Thanks!

  • The screen output reports various information about the CMA-ES optimization procedure. You can find more detailed information in the documentation of the CMA package: https://github.com/CMA-ES/pycma.

  • With a single worker and a single GPU, it's going to take a really long time to train. For reference, our own experiments took about a day, with about 30 workers and 4 P100 GPUs.

  • CMA-ES is a black-box optimization algorithm that is normally best suited to optimizing deterministic functions. In our case, the return obtained by the agent is stochastic (notably because of the randomized road tracks used during training). To reduce the stochasticity of the optimized function, the return is averaged over n-samples rollouts. Besides, CMA-ES is a population-based method: at each optimization step, it maintains a population of candidate solutions to the optimization problem, and these candidates evolve during optimization. The population size is thus the number of candidates that CMA-ES maintains during optimization.
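
To make the roles of n-samples and the population size concrete, here is a minimal sketch. It is not the repository's code and it is not CMA-ES: it uses a plain Gaussian evolution strategy with a fixed step size and no covariance adaptation, and a toy quadratic objective with added noise stands in for a real rollout. All names (`noisy_return`, `averaged_return`, `estimate_spread`, `evolve`) and all default values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def noisy_return(params, rng, noise_std=1.0):
    """Hypothetical stand-in for one rollout: higher is better, optimum at 0."""
    true_value = -sum(p * p for p in params)
    return true_value + rng.gauss(0.0, noise_std)


def averaged_return(params, n_samples, rng):
    """Average the noisy return over n_samples rollouts."""
    returns = [noisy_return(params, rng) for _ in range(n_samples)]
    return sum(returns) / len(returns)


def estimate_spread(n_samples, trials=200, seed=0):
    """Standard deviation of the averaged return at a fixed parameter vector."""
    rng = random.Random(seed)
    params = [0.0, 0.0, 0.0]
    estimates = [averaged_return(params, n_samples, rng) for _ in range(trials)]
    return statistics.pstdev(estimates)


def evolve(pop_size, n_samples, steps=30, sigma=0.3, seed=0):
    """Simplified evolution strategy (fixed sigma, no covariance adaptation)."""
    rng = random.Random(seed)
    mean = [1.0, -1.0, 0.5]  # arbitrary starting point (assumption)
    for _ in range(steps):
        # Sample pop_size candidate solutions around the current mean.
        candidates = [
            [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
            for _ in range(pop_size)
        ]
        # Score each candidate once, averaging over n_samples noisy rollouts.
        scores = [averaged_return(c, n_samples, rng) for c in candidates]
        ranked = sorted(zip(scores, candidates), key=lambda sc: sc[0], reverse=True)
        # Keep the better half and move the mean to their average.
        elite = [c for _, c in ranked[: max(1, pop_size // 2)]]
        mean = [sum(c[i] for c in elite) / len(elite) for i in range(len(mean))]
    return mean


if __name__ == "__main__":
    print("spread with n_samples=1: ", estimate_spread(1))
    print("spread with n_samples=16:", estimate_spread(16))
    print("final mean:", evolve(pop_size=8, n_samples=4))
```

In this sketch, each optimization step costs pop_size × n_samples rollouts, so increasing either value makes a step proportionally more expensive. A larger n-samples gives a less noisy score per candidate, and a larger population gives more candidates to select from at each step.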

Alright, now it is much clearer!