Related to training



Thank you for the neuro-vectorizer code!

Could you tell me how long training takes to complete? The README has only one comment on this:

"Training might take a long time to finish.".

I understand that training time depends on a number of factors, but if you could document how long training takes on a particular configuration, it would be very helpful.

EDIT: Training terminated after 200 epochs and took a little less than 2 hours. I had enabled the GPU ("num_gpus": 1) and set "num_workers": 15.

How often is the model saved, and to which path? I see the following setting in autovec.py:

checkpoint_freq = 1,

Does this mean the model is saved every iteration? Each iteration takes about 30 seconds on my system, and training has completed 150+ iterations. Does that mean I have 150 models saved somewhere? Please let me know.
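As a rough back-of-the-envelope check, the figures reported in this thread (about 30 seconds per iteration, 200 iterations, checkpoint_freq = 1) can be combined as below. This is a minimal sketch: the helper name training_estimate is hypothetical, and it assumes Ray Tune's reading of checkpoint_freq as the number of training iterations between checkpoints, with 0 disabling periodic checkpointing.

```python
from __future__ import annotations


def training_estimate(
    iterations: int, seconds_per_iteration: float, checkpoint_freq: int = 1
) -> tuple[int, float]:
    """Return (number of checkpoints saved, estimated wall-clock hours).

    Hypothetical helper. Assumes one checkpoint every `checkpoint_freq`
    training iterations, as Ray Tune's checkpoint_freq argument does;
    a value of 0 means no periodic checkpoints.
    """
    if checkpoint_freq < 0:
        raise ValueError("checkpoint_freq must be >= 0")
    checkpoints = 0 if checkpoint_freq == 0 else iterations // checkpoint_freq
    hours = iterations * seconds_per_iteration / 3600
    return checkpoints, hours


checkpoints, hours = training_estimate(
    iterations=200, seconds_per_iteration=30, checkpoint_freq=1
)
print(f"{checkpoints} checkpoints, about {hours:.2f} hours")
```

For these inputs it prints "200 checkpoints, about 1.67 hours", consistent with the "a little less than 2 hours" and the 200 checkpoints reported in this issue. Raising checkpoint_freq (e.g. to 10) would cut the number of saved checkpoints to 20 without changing the training time estimate.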

EDIT: I found all 200 checkpoints under ~/ray_results/neurovectorizer_train/PPO_autovec_*
Maybe this could be documented as well for ease of use, if you think it is worth it.
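To locate the saved checkpoints programmatically, a small stdlib-only sketch like the following can scan the results directory mentioned above. The function name list_checkpoints is hypothetical, and the checkpoint_<N> directory pattern is an assumption about Ray's naming, which has varied between Ray versions (some use zero-padded numbers such as checkpoint_000010; the numeric sort below handles both).

```python
from __future__ import annotations

import re
from pathlib import Path


def list_checkpoints(results_dir: Path) -> list[Path]:
    """Return checkpoint directories under every PPO_autovec_* trial.

    Hypothetical helper. Results are sorted by trial directory name, then
    by the iteration number at the end of each checkpoint directory name.
    Assumes Ray's "checkpoint_<N>" naming; adjust the pattern if needed.
    """
    if not results_dir.is_dir():
        return []

    def iteration(p: Path) -> int:
        # Trailing digits of e.g. "checkpoint_12" or "checkpoint_000012".
        match = re.search(r"(\d+)$", p.name)
        return int(match.group(1)) if match else -1

    found = [
        p
        for trial in results_dir.glob("PPO_autovec_*")
        for p in trial.glob("checkpoint_*")
        if p.is_dir()
    ]
    return sorted(found, key=lambda p: (p.parent.name, iteration(p)))


if __name__ == "__main__":
    results = Path.home() / "ray_results" / "neurovectorizer_train"
    ckpts = list_checkpoints(results)
    print(f"{len(ckpts)} checkpoints found")
    if ckpts:
        print("last:", ckpts[-1])
```

Sorting numerically rather than by string keeps checkpoint_10 after checkpoint_2, so the last entry of a single trial is its most recent checkpoint.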

Answer 1 · 2020-06-09T07:50:09.000Z

Hi @Kadakol, thanks for the great feedback!
Indeed, the parameters in autovec.py control things such as the checkpointing frequency. We refer readers to https://ray.readthedocs.io to become more familiar with the Ray/RLlib API.
It should not take too long. We don't give a time estimate because it depends on the user's resources. If your processor is not the same as ours, you should remove the O3_runtimes.pkl file so that these runtimes are regenerated.
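Deleting the O3_runtimes.pkl cache before a new training run is all that is needed to force it to be regenerated. A minimal sketch, assuming the file sits at the top level of the neuro-vectorizer checkout (both that location and the helper name reset_o3_runtimes are assumptions; check where your copy of the code reads the file from):

```python
from pathlib import Path


def reset_o3_runtimes(repo_dir: Path) -> bool:
    """Delete the cached runtimes file so the code regenerates it.

    Hypothetical helper. Assumes O3_runtimes.pkl lives directly inside
    repo_dir. Returns True if a file was removed, False if none existed.
    """
    cache = repo_dir / "O3_runtimes.pkl"
    if cache.is_file():
        cache.unlink()
        return True
    return False
```

For example, calling reset_o3_runtimes(Path(".")) from the repository root before starting training on a different processor removes the stale file if it is there and does nothing otherwise.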
Let me know if you have any other questions!