Specify max_num_trials for PBT or Successive Halving algorithm
martsalz opened this issue · 5 comments
Why is it not possible to specify max_num_trials for the PBT algorithm or the Successive Halving algorithm? When will these algorithms be completed for an experiment?
https://parameter-sherpa.readthedocs.io/en/latest/algorithms/algorithms.html
Thanks.
PopulationBasedTraining has the population_size argument. Since PBT only trains one population, the notion of max_num_trials doesn't really exist there. One could call population_size max_num_trials instead, but I think that could be confusing.
The asynchronous successive halving algorithm also doesn't really have a notion of maximum number of trials. It does, however, have a max_finished_configs argument. This corresponds to putting a limit on how many trials to finish. This could be renamed max_num_trials. I am not sure, though, if that would make it clearer or less clear, since this would only refer to the finished trials and not to the many unfinished ones that the algorithm explores along the way.
Both algorithms are ready to use. I just haven't run and reproduced those plots in the documentation yet.
PopulationBasedTraining has the population_size argument. Since PBT only trains one population the notion of max_num_trials doesn't really exist there. One could call population_size max_num_trials instead but I think that could be confusing.
What do you mean by "Since PBT only trains one population"?
If I use the PBT as shown below, I can see from the table which trials were performed in which generation and which trial a given trial X is based on. How many generations are carried out in total, or how often is this process repeated?
In my case I have specified population_size=10, yet more than 10 trials are performed in the experiment.
Thanks.
Hey Martin,
PBT initializes a whole population of e.g. size 20 and trains each population member for say 1 epoch. Let's call this the first generation. The top 80% of this first generation simply move on to the second generation. The bottom 20% are discarded and replaced by sampling members from the top 20% and perturbing their parameters. Then this second generation is trained for one epoch. Then the process repeats onto the third generation and so forth.
So the population itself doesn't actually grow. But it does evolve through the resampling. Furthermore, each population member is trained further and further (in terms of epochs).
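The generation step described above can be sketched in plain Python. This is only an illustration of the idea, not Sherpa's implementation: the function name next_generation, the dictionary keys "lr" and "loss", and the perturbation factors 0.8 and 1.2 are assumptions made for the example.

```python
import random


def next_generation(population, rng, keep_frac=0.8, parent_frac=0.2,
                    factors=(0.8, 1.2)):
    """Build the next generation from a list of members.

    Each member is a dict with a hyperparameter "lr" and a "loss"
    (lower is better). All names and factors here are hypothetical.
    """
    ranked = sorted(population, key=lambda m: m["loss"])
    n = len(ranked)
    n_keep = int(round(keep_frac * n))
    n_parents = max(1, int(round(parent_frac * n)))

    # The best members simply move on unchanged.
    survivors = [dict(m) for m in ranked[:n_keep]]

    # The rest are replaced by perturbed copies of members
    # sampled from the top of the ranking.
    parents = ranked[:n_parents]
    replacements = []
    for _ in range(n - n_keep):
        parent = rng.choice(parents)
        child = dict(parent)
        child["lr"] = parent["lr"] * rng.choice(factors)
        replacements.append(child)

    return survivors + replacements


rng = random.Random(0)
population = [{"lr": 0.01 * (i + 1), "loss": float(i)} for i in range(20)]
new_population = next_generation(population, rng)
print(len(population), len(new_population))
```

The population size is the same before and after the step; only its composition changes.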
Now for Sherpa there may be a little bit of confusion in terms of the naming. For the sake of being able to parallelize, Sherpa here considers one trial as one "job". So Sherpa-PBT initializes the population as 20 trials with randomly sampled hyperparameters and leaves it to the user to decide in their script for how long to train each (say one epoch). After those 20 one-epoch trials have finished, it will schedule the top 80% of them as new trials, indicating via the load_from field that the weights should be loaded from a previously finished trial. This corresponds to "continuing" the best 80%. Each of those new trials will have a new trial ID because those have to be unique. You can, however, identify them by the fact that their generation field will be 2 and their load_from field will indicate what the "parent" of this trial is.
For the bottom 20%, their load_from fields will correspond to trials from the top 20% of the previous generation and their trial.parameters will have those parameters perturbed. So the user has to incorporate those perturbed parameters. For some Keras parameters this can be a bit tricky and I think you had actually found a bug for that in another issue.
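To make the bookkeeping concrete, here is a small self-contained sketch of how such a trial table grows. It does not call Sherpa; the function simulate_trial_table, the row layout, the use of None for first-generation trials, and the random ranking standing in for real results are all assumptions for illustration.

```python
import random


def simulate_trial_table(population_size, num_generations, seed=0):
    """Return rows with trial_id, generation and load_from (hypothetical layout)."""
    rng = random.Random(seed)
    next_id = 1
    table = []

    # Generation 1: fresh trials with no parent.
    current = []
    for _ in range(population_size):
        current.append({"trial_id": next_id, "generation": 1, "load_from": None})
        next_id += 1
    table.extend(current)

    n_keep = round(0.8 * population_size)
    n_top = max(1, round(0.2 * population_size))

    for generation in range(2, num_generations + 1):
        # Stand-in for real results: a random ranking, best first.
        ranked = rng.sample(current, k=len(current))

        # Top 80% continue; each gets a new trial that loads from itself.
        parent_ids = [row["trial_id"] for row in ranked[:n_keep]]
        # Bottom 20% are replaced by trials that load from a top-20% member.
        for _ in range(population_size - n_keep):
            parent_ids.append(rng.choice(ranked[:n_top])["trial_id"])

        current = []
        for parent_id in parent_ids:
            current.append({"trial_id": next_id, "generation": generation,
                            "load_from": parent_id})
            next_id += 1
        table.extend(current)

    return table


table = simulate_trial_table(population_size=10, num_generations=3)
print(len(table))
for row in table[8:13]:
    print(row)
```

Under these assumptions, a population of 10 over three generations produces 30 trial IDs, which is consistent with seeing more than 10 trials for population_size=10. How many generations Sherpa actually runs is not stated in this thread.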
Let me know if this clarifies things at all. Will review the other issues now.
Best,
Lars