x35f/model_based_rl

Retraining the model

BartvLaatum opened this issue · 2 comments

Hi,

I have a question regarding retraining the model after collecting new experiences in the real environment.

  • Do you reinitialize the model parameters, i.e., weights, biases etc., before retraining the model?

When I don't reinitialize and simply continue training the model with the additional data, the model becomes heavily biased toward the first batches of collected experience, since it has been trained far more often on those earlier transitions than on the more recent ones.
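
To make the two options concrete, here is a minimal PyTorch-style sketch of what I mean; the model, buffer, and training loop are hypothetical stand-ins, not code from this repo:

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 4, 1  # hypothetical dimensions

def make_model() -> nn.Module:
    return nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 64), nn.ReLU(),
                         nn.Linear(64, OBS_DIM))

def reset_parameters(model: nn.Module) -> None:
    # Reinitialize every layer that defines reset_parameters().
    for layer in model.modules():
        if hasattr(layer, "reset_parameters"):
            layer.reset_parameters()

def train_model(model, inputs, targets, epochs=10, lr=1e-3):
    # Plain supervised fit of (s, a) -> s' on everything collected so far.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        loss = nn.functional.mse_loss(model(inputs), targets)
        opt.zero_grad()
        loss.backward()
        opt.step()

REINIT_BEFORE_RETRAIN = False  # the design choice in question

model = make_model()
buffer_x = torch.empty(0, OBS_DIM + ACT_DIM)
buffer_y = torch.empty(0, OBS_DIM)

for iteration in range(5):
    # Stand-in for newly collected real-environment transitions.
    new_x = torch.randn(256, OBS_DIM + ACT_DIM)
    new_y = torch.randn(256, OBS_DIM)
    buffer_x = torch.cat([buffer_x, new_x])
    buffer_y = torch.cat([buffer_y, new_y])
    if REINIT_BEFORE_RETRAIN:
        reset_parameters(model)  # start from scratch each round
    train_model(model, buffer_x, buffer_y)  # else: warm-start from the last fit
```

With `REINIT_BEFORE_RETRAIN = False`, the transitions from the first iterations are revisited in every subsequent fit, which is where I suspect the bias comes from.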

Best, Bart

x35f commented

Hi Bart,

In the USB MBPO re-implementation, the model parameters aren't reinitialized, and I can't find any code doing so in the original implementation either. Reinitializing the model for every model-training step would take much longer to converge. Are there any existing experimental results indicating a disadvantage of the current training design?

This is indeed an interesting question, especially when the model architecture has insufficient capacity to capture the underlying transition dynamics. From the model's perspective, though, the earlier batches and the more recent ones should follow the same underlying dynamics, so the previously trained model provides a good initialization for further training (any better idea here?). Since the supervised training step tolerates up to five epochs without an improvement on all the data before stopping, I think an initial bias towards the earlier batches does not necessarily mean the model will perform worse on the recent ones.
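
Concretely, the stopping rule I have in mind looks roughly like the sketch below; this is schematic, not the exact code from USB or the original MBPO implementation:

```python
def train_until_no_improvement(model, train_fn, eval_fn, patience=5,
                               max_epochs=200):
    """Keep training while the evaluation loss improves within `patience` epochs.

    Schematic of a patience-based stopping rule: the loop tolerates up to
    `patience` epochs without improvement before terminating, so a model
    that is initially biased toward older batches still gets several
    epochs to fit the newer data before training stops.
    """
    best_loss, epochs_since_improvement = float("inf"), 0
    for epoch in range(max_epochs):
        train_fn(model)        # one epoch over all collected data
        loss = eval_fn(model)  # evaluation loss on held-out transitions
        if loss < best_loss:
            best_loss, epochs_since_improvement = loss, 0
        else:
            epochs_since_improvement += 1
        if epochs_since_improvement >= patience:
            break
    return best_loss
```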

Anyway, I have started several test runs and will update the results ASAP. Thank you for raising this interesting question.

Best, Feng

BartvLaatum commented

Hi Feng,

Thank you for your detailed answer! Yes, when the earlier batches and the more recent ones follow the same underlying dynamics, the previously trained model provides a good initialization for training. However, in more complex environments I think this bias can become a more severe problem.

A model-based RL implementation either trains the model solely on recently collected data or combines the previously and newly collected data (the two options are sketched below). In my experience, when retraining NN models in more complex environments with nonlinear dynamics, I saw a decrease in performance compared to training the model once on all of the data.
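
To spell out the distinction, the two dataset choices could look like this; the `buffer` and the window size are hypothetical:

```python
RECENT_WINDOW = 10_000  # hypothetical size of the "recent data" window

def training_data(buffer, recent_only: bool):
    # Option A: fit only the newest experience; tracks the current dynamics
    # but forgets regions of the state space visited earlier.
    if recent_only:
        return buffer[-RECENT_WINDOW:]
    # Option B: fit the union of old and new experience; a warm-started
    # model has then seen the early batches many more times than the new ones.
    return buffer
```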

Curious about your test results!

Best, Bart