Start where we left: save and load experiments

Question


jgrizou opened this issue 9 years ago · 5 comments

Hi guys,

If I want to run an experiment over a few days, is there a way to stop, save, reload, and restart an experiment?

I found fast_forward(self, log) in experiment.py; it resets an experiment's state given a log. Could you confirm this is all we need, and thus the way to go?

If so, I will implement save and load functions in the log class. What would be the best way, in your opinion (pickle...)?

Answer 1 · 2016-02-18T17:41:06.000Z

Hi John,

Yes, fast_forward(self, log) should do the job. However, note that it simply re-updates the SM and IM models in a loop. So:

If there is a random process in one of the update(.) functions, the replay can result in a slightly different model. However, I don't think this is the case for any of the available models.
The Experiment instance has to be the same as before saving, so be careful if you manually modify any of the involved objects (e.g. the agent).
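To make the replay idea concrete, here is a minimal sketch of what "fast-forwarding" amounts to, assuming a log that stores (motor, sensory) pairs in order. ToyModel, fast_forward, and the "motor"/"sensori" log keys are hypothetical stand-ins, not explauto's actual classes. It also illustrates the first caveat above: if update() draws random numbers, the replay reproduces the same model only when the random generator is seeded identically.

```python
import random


class ToyModel:
    """Hypothetical model whose update() draws a random number (not explauto's API)."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.data = []

    def update(self, m, s):
        # A random perturbation like this is what can make a replayed model diverge.
        jitter = self.rng.gauss(0, 0.01)
        self.data.append((m, s + jitter))


def fast_forward(model, log):
    """Hypothetical replay: re-apply every logged (m, s) pair to the model, in order."""
    for m, s in zip(log["motor"], log["sensori"]):
        model.update(m, s)
    return model


log = {"motor": [0.1, 0.2, 0.3], "sensori": [1.0, 2.0, 3.0]}

same_a = fast_forward(ToyModel(seed=0), log)
same_b = fast_forward(ToyModel(seed=0), log)
other = fast_forward(ToyModel(seed=1), log)

print(same_a.data == same_b.data)  # identical seeds give identical models
print(same_a.data == other.data)   # a different seed gives a slightly different model
```

With deterministic update() functions, as the answer expects for the available models, the seed question disappears and the replay is exact.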

For saving and loading: ExperimentLog mainly contains standard Python dictionaries, so JSON is perhaps easier and more standard than pickle.
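A minimal sketch of the JSON route, assuming the log boils down to a dict of lists of plain numbers. save_log_json and load_log_json are hypothetical helper names, not functions from the library.

```python
import json
import os
import tempfile


def save_log_json(logs, path):
    """Write a log made of plain dicts, lists and numbers to a JSON file."""
    with open(path, "w") as f:
        json.dump(logs, f)


def load_log_json(path):
    """Read the log back: JSON objects come back as dicts, arrays as lists."""
    with open(path) as f:
        return json.load(f)


logs = {"motor": [[0.1, 0.2], [0.3, 0.4]], "sensori": [[1.0], [2.0]]}
path = os.path.join(tempfile.mkdtemp(), "xp_log.json")
save_log_json(logs, path)
restored = load_log_json(path)
print(restored == logs)  # True
```

One caveat of this choice: JSON turns tuples into lists and non-string dict keys into strings, so the reloaded log is only equal to the original when it already uses lists and string keys.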

There is a notebook about it (and I see that it actually uses pickle, so do it as you prefer ;))

Ask again if you run into problems with pickling. I remember we had an issue with it that we solved, but I don't remember what it was (maybe it was due to parallel processing in ExperimentPool).

Thanks a lot for the contributions!

Answer 2 · 2016-02-19T18:01:25.000Z

Thanks for the quick reply! I missed that notebook, my bad.

I implemented the simple pickle approach and updated the notebook. It has been added to the pending PR.
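For comparison, a minimal sketch of a pickle-based save/load pair on a log class. ToyLog and its add/save/load methods are hypothetical and only mimic the idea of a log holding a _logs dict; they are not the code merged in the PR.

```python
import os
import pickle
import tempfile


class ToyLog:
    """Hypothetical stand-in for an experiment log holding a dict of lists."""

    def __init__(self):
        self._logs = {}

    def add(self, topic, data):
        self._logs.setdefault(topic, []).append(data)

    def save(self, path):
        # Pickle the whole object, including types JSON cannot represent (tuples, arrays...).
        with open(path, "wb") as f:
            pickle.dump(self, f, protocol=pickle.HIGHEST_PROTOCOL)

    @staticmethod
    def load(path):
        # Only unpickle files you trust: loading a pickle can execute arbitrary code.
        with open(path, "rb") as f:
            return pickle.load(f)


log = ToyLog()
log.add("motor", (0.1, 0.2))
log.add("motor", (0.3, 0.4))

path = os.path.join(tempfile.mkdtemp(), "xp_log.pickle")
log.save(path)
restored = ToyLog.load(path)
print(restored._logs == log._logs)  # True, and the tuples stay tuples
```

The trade-off is that a pickle is tied to the class definitions: if the log class changes between saving and loading, the old file may no longer load cleanly.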

It will be more difficult to handle in JSON format, because _logs contains some numpy arrays.
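If JSON were still preferred, one common workaround is a custom encoder that converts NumPy values to plain Python values. This is a sketch under that assumption; NumpyEncoder is a hypothetical name, and the log layout is illustrative.

```python
import json

import numpy as np


class NumpyEncoder(json.JSONEncoder):
    """JSON encoder that turns NumPy arrays and scalars into plain Python values."""

    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if isinstance(obj, np.generic):
            return obj.item()
        return super().default(obj)


logs = {"motor": [np.array([0.1, 0.2]), np.array([0.3, 0.4])], "count": np.int64(2)}
text = json.dumps(logs, cls=NumpyEncoder)
restored = json.loads(text)

# Arrays come back as plain lists; rebuild them explicitly where arrays are expected.
motor = [np.asarray(m) for m in restored["motor"]]
print(restored["count"], np.allclose(motor[0], [0.1, 0.2]))
```

The asymmetry is the extra work: saving is one encoder, but loading has to know which entries should become arrays again (and with which dtype), whereas pickle restores them as they were.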

fast_forward() seems to work fine for reloading an xp, and xp.run() even works afterwards without crashing!
However, I am not sure what happens to the log in that case, especially for the evaluation.

I have not yet mastered the whole logging system you are using, but do you foresee any problems?

The main one I see is that the run function will start again from the first iteration, which might mess things up. evaluate_at could also cause trouble.
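One way to avoid both problems is for the reload step to restore the iteration counter from the log, so that later iterations and evaluations keep their absolute indices. The sketch below assumes a hypothetical ToyExperiment with its own fast_forward, run and evaluate_at; it is not explauto's Experiment class and does not describe how that class actually counts iterations.

```python
class ToyExperiment:
    """Hypothetical experiment loop; not explauto's Experiment class."""

    def __init__(self, evaluate_at):
        self.evaluate_at = sorted(evaluate_at)
        self.log = {"motor": [], "eval_at": []}
        self.iteration = 0

    def fast_forward(self, log):
        # Copy the saved log and resume counting from where the previous run stopped.
        self.log = {k: list(v) for k, v in log.items()}
        self.iteration = len(self.log["motor"])

    def run(self, n_iter):
        for _ in range(n_iter):
            self.log["motor"].append(self.iteration)  # stand-in for a real motor command
            self.iteration += 1
            if self.iteration in self.evaluate_at:
                self.log["eval_at"].append(self.iteration)


first = ToyExperiment(evaluate_at=[5, 10, 15])
first.run(7)  # evaluates at iteration 5, then "stops"

resumed = ToyExperiment(evaluate_at=[5, 10, 15])
resumed.fast_forward(first.log)
resumed.run(8)
print(resumed.iteration, resumed.log["eval_at"])  # 15 [5, 10, 15]
```

Had the counter restarted at 0 instead, the resumed run would evaluate at iteration 5 a second time and its evaluation indices would no longer match the number of samples in the log.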

Anyway I will have a deeper look when time permits.

Cheers!

Answer 3 · 2016-04-25T15:06:31.000Z

Hi @jgrizou ,

I came across this issue while going through some old emails. Was the PR eventually merged? I don't see it in the list.

Thanks!
Clément

Answer 4 · 2016-04-25T15:25:12.000Z

Hi @clement-moulin-frier! Yes, I merged it myself in the end; see 9317966.

It seems to have caused no practical problem :)

Answer 5 · 2016-04-25T16:53:37.000Z

Cool, well done :)