Implementation of the World Models paper by David Ha and Jürgen Schmidhuber.
Applied to the gym CarRacing-v0 environment. Model-based reinforcement learning.
1 - Collect data: let the agent act in the environment with a random policy, without any optimization. I added a random track-tile start position to enrich the data (rollout sketch after this list).
2 - Train the VAE and save the encodings (encoder sketch after this list).
3 - Train the Mixture-Density-Network-RNN (MDN-RNN) on the VAE encodings. Occasionally skip updates ("kill" the gradients) because the mixture loss can become infinite (loss/guard sketch after this list).
4 - Train the Controller network, whose input is the MDN-RNN hidden state concatenated with the VAE encoding. Use an evolutionary algorithm (CMA-ES) for optimization (sketch after this list).
5 - (Try a) dream environment: the MDN-RNN generates the next VAE encodings and the VAE decoder renders them to frames, with no inputs from the real environment (dream-loop sketch after this list).
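A minimal sketch of step 1, assuming the classic gym API (4-tuple `step`) and a hypothetical output file name. The random track-tile start used in this repo requires patching the environment internals and is omitted here:

```python
import gym
import numpy as np

def collect_rollout(env, max_steps=1000):
    """Roll out a random policy and record frames and actions."""
    frames, actions = [], []
    obs = env.reset()
    for _ in range(max_steps):
        action = env.action_space.sample()  # random policy, no optimization
        frames.append(obs)
        actions.append(action)
        obs, reward, done, _ = env.step(action)
        if done:
            break
    return np.array(frames, dtype=np.uint8), np.array(actions, dtype=np.float32)

env = gym.make("CarRacing-v0")
frames, actions = collect_rollout(env)
np.savez_compressed("rollout_000.npz", frames=frames, actions=actions)
```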
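For step 2, a sketch of the encoder half of a ConvVAE (latent size 32 and the 64x64 input resolution follow the paper; the class name, file names, and PyTorch usage are assumptions, not this repo's exact code):

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvVAE(nn.Module):
    """Encoder half of a minimal ConvVAE (latent size 32, as in the paper)."""
    def __init__(self, z_dim=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2), nn.ReLU(),
            nn.Conv2d(64, 128, 4, 2), nn.ReLU(),
            nn.Conv2d(128, 256, 4, 2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(1024, z_dim)      # 2x2x256 feature map
        self.fc_logvar = nn.Linear(1024, z_dim)

    def encode(self, x):                         # x: (N, 3, 64, 64) in [0, 1]
        h = self.enc(x)
        return self.fc_mu(h), self.fc_logvar(h)

vae = ConvVAE()  # in practice, load trained weights here

data = np.load("rollout_000.npz")
x = torch.tensor(data["frames"], dtype=torch.float32).permute(0, 3, 1, 2) / 255.0
x = F.interpolate(x, size=(64, 64))              # CarRacing frames are 96x96
with torch.no_grad():
    mu, logvar = vae.encode(x)
np.savez_compressed("encodings_000.npz", z=mu.numpy(), actions=data["actions"])
```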
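For step 3, a sketch of the MDN-RNN loss and the "kill the gradients on inf loss" guard. The class layout and hyperparameters are assumptions (the dummy batch stands in for the saved encodings); only the guard itself reflects the note above:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MDNRNN(nn.Module):
    """LSTM that outputs a Gaussian mixture over the next latent z."""
    def __init__(self, z_dim=32, a_dim=3, hidden=256, n_mix=5):
        super().__init__()
        self.rnn = nn.LSTM(z_dim + a_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 3 * n_mix * z_dim)  # logit, mu, log_sigma
        self.n_mix, self.z_dim = n_mix, z_dim

    def forward(self, z, a, state=None):
        out, state = self.rnn(torch.cat([z, a], dim=-1), state)
        logits, mu, log_sigma = self.head(out).chunk(3, dim=-1)
        shape = out.shape[:2] + (self.n_mix, self.z_dim)
        return logits.view(shape), mu.view(shape), log_sigma.view(shape), state

def mdn_nll(logits, mu, log_sigma, z_next):
    """Negative log-likelihood of z_next under the mixture, per z dimension."""
    z = z_next.unsqueeze(-2)                      # broadcast over mixtures
    log_gauss = (-0.5 * ((z - mu) / log_sigma.exp()) ** 2
                 - log_sigma - 0.5 * math.log(2 * math.pi))
    log_mix = F.log_softmax(logits, dim=-2) + log_gauss
    return -torch.logsumexp(log_mix, dim=-2).mean()

model = MDNRNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# dummy (batch, time, dim) tensors standing in for the saved encodings/actions
z_in, a_in, z_target = torch.randn(8, 32, 32), torch.randn(8, 32, 3), torch.randn(8, 32, 32)

logits, mu, log_sigma, _ = model(z_in, a_in)
loss = mdn_nll(logits, mu, log_sigma, z_target)
if torch.isfinite(loss):  # the guard: skip the update entirely when loss blows up
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
```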
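For step 4, a sketch of a linear controller on the concatenated [z, h] input, optimized with the `cma` package. Population size and rollout count follow the values listed below (16 and 18); `evaluate_rollout` is a hypothetical helper, not part of this repo's API:

```python
import cma
import numpy as np

Z_DIM, H_DIM, A_DIM = 32, 256, 3
N_PARAMS = (Z_DIM + H_DIM) * A_DIM + A_DIM  # weight matrix + bias

def act(params, z, h):
    """Linear controller: action from VAE encoding z and RNN hidden state h."""
    W = params[:-A_DIM].reshape(A_DIM, Z_DIM + H_DIM)
    b = params[-A_DIM:]
    return np.tanh(W @ np.concatenate([z, h]) + b)

def evaluate_rollout(policy, params):
    """Hypothetical helper: run one CarRacing episode with policy(params, z, h)
    (VAE + MDN-RNN in the loop) and return the cumulative reward."""
    raise NotImplementedError

def fitness(params, n_rollouts=18):
    returns = [evaluate_rollout(act, params) for _ in range(n_rollouts)]
    return -np.mean(returns)  # CMA-ES minimizes, so negate the mean return

es = cma.CMAEvolutionStrategy(np.zeros(N_PARAMS), 0.5, {"popsize": 16})
while not es.stop():
    solutions = es.ask()
    es.tell(solutions, [fitness(s) for s in solutions])
```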
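For step 5, a sketch of the dream loop: sample the next latent from the MDN-RNN's mixture, feed it back in, and (optionally) decode it to a frame. It reuses the `MDNRNN` class from the step 3 sketch; the seed latent, the fixed action, and the decoder call are assumptions:

```python
import torch

def sample_z(logits, mu, log_sigma, tau=1.0):
    """Sample the next latent: pick a mixture component per z dimension."""
    probs = torch.softmax(logits[0, 0] / tau, dim=0)   # (n_mix, z_dim)
    comp = torch.multinomial(probs.T, 1).squeeze(-1)   # one index per z dim
    dims = torch.arange(mu.shape[-1])
    m, s = mu[0, 0, comp, dims], log_sigma[0, 0, comp, dims].exp()
    return m + s * torch.randn_like(m)

model = MDNRNN()        # trained MDN-RNN (class from the step 3 sketch)
z = torch.randn(32)     # seed latent; in practice, encode one real frame
a = torch.zeros(3)      # action; in practice, from the Controller
state = None
with torch.no_grad():
    for _ in range(200):
        logits, mu, log_sigma, state = model(z.view(1, 1, -1), a.view(1, 1, -1), state)
        z = sample_z(logits, mu, log_sigma)
        # frame = vae_decoder(z)  # hypothetical decoder renders the dream frame
```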
The major improvement over model-free RL methods (watch the CarRacing environment on Youtube)
is that the Controller network receives both a compact observation encoding from the VAE
and a time-dependent representation from the RNN. Because of that, the controller stays
small enough that the task can be solved by just applying an evolutionary algorithm on top.
- CMA-ES settings: (small) population size: 16, rollouts: 18
- The dream environment works poorly: the MDN-RNN only gives a good prediction when fed a sequence of 3+ real environment steps. One-step prediction doesn't seem to work properly ... Maybe overfitting, I don't know (warm-up sketch below).
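A sketch of the warm-up workaround implied above: run a few real steps through the MDN-RNN to build up its hidden state before trusting one-step predictions. Names are reused from the sketches above; `real_z`/`real_a` are assumed tensors of encoded real frames and the actions taken:

```python
state = None
for t in range(3):  # the 3+ real env steps mentioned above
    _, _, _, state = model(real_z[t].view(1, 1, -1), real_a[t].view(1, 1, -1), state)
# from here on, one-step (dream) predictions use the warmed-up `state`
```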