sherpa-ai/sherpa

Specify max_num_trials for PBT or Successive Halving algorithm

martsalz opened this issue · 5 comments

Why is it not possible to specify max_num_trials for the PBT algorithm or the Successive Halving algorithm? When will these algorithms be completed for an experiment?

https://parameter-sherpa.readthedocs.io/en/latest/algorithms/algorithms.html

Thanks.

PopulationBasedTraining has the population_size argument. Since PBT only trains one population, the notion of max_num_trials doesn't really exist there. One could call population_size max_num_trials instead, but I think that could be confusing.

The asynchronous successive halving algorithm also doesn't really have a notion of a maximum number of trials. It does, however, have a max_finished_configs argument, which corresponds to putting a limit on how many trials are allowed to finish. This could be renamed max_num_trials, but I am not sure whether that would make it clearer or less clear, since it would only refer to the finished trials and not to the many unfinished ones that the algorithm explores along the way.
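To make that distinction concrete, here is a toy, library-independent sketch of an asynchronous-successive-halving loop in which a cap on *finished* configs is the stopping criterion, while many more configs are started along the way. All names (asha, run_trial, sample_config) are made up for illustration; this is not Sherpa's implementation.

```python
import random

def asha(run_trial, sample_config, max_finished_configs=10,
         min_resource=1, reduction_factor=4, max_rung=3):
    """Toy asynchronous successive halving (illustration only).

    A config "finishes" once it reaches the top rung. The loop keeps
    starting fresh random configs whenever no promotion is possible,
    so far more trials are started than ever finish.
    """
    rungs = {r: [] for r in range(max_rung + 1)}  # rung -> [(score, config)]
    started = finished = 0
    while finished < max_finished_configs:
        promotion = None
        for r in range(max_rung):
            ranked = sorted(rungs[r], reverse=True)
            top_k = len(ranked) // reduction_factor
            for score, cfg in ranked[:top_k]:
                if cfg not in [c for _, c in rungs[r + 1]]:
                    promotion = (r + 1, cfg)  # promote a top config upward
                    break
            if promotion:
                break
        if promotion:
            rung, cfg = promotion
        else:
            rung, cfg = 0, sample_config()    # start a brand-new trial
            started += 1
        resource = min_resource * reduction_factor ** rung
        rungs[rung].append((run_trial(cfg, resource), cfg))
        if rung == max_rung:
            finished += 1                     # this is what the cap limits
    return started, finished
```

Renaming max_finished_configs to max_num_trials would suggest it caps the started trials rather than the finished ones, which is exactly the ambiguity described above.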

Both algorithms are ready to use. I just haven't run and reproduced those plots in the documentation yet.

> PopulationBasedTraining has the population_size argument. Since PBT only trains one population the notion of max_num_trials doesn't really exist there. One could call population_size max_num_trials instead but I think that could be confusing.

What do you mean by "Since PBT only trains one population"?

If I use PBT as shown below, I can see from the table which trials were performed in which generation and which trial a given trial is based on. How many generations are carried out in total, i.e. how often is this process repeated?

In my case I specified population_size=10, yet the experiment performs more than 10 trials.

[screenshot: Sherpa results table of trials]

Thanks.

Hey Martin,
PBT initializes a whole population of, say, size 20 and trains each population member for e.g. one epoch. Let's call this the first generation. The top 80% of this first generation simply move on to the second generation. The bottom 20% are discarded and replaced by sampling members from the top 20% and perturbing their parameters. The second generation is then trained for one epoch, and the process repeats for the third generation and so forth.

So the population itself doesn't actually grow, but it does evolve through the resampling. Furthermore, each surviving member is trained further and further (in terms of accumulated epochs).
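The generation step described above can be sketched in a few lines of plain Python. This is a toy illustration of the scheme, not Sherpa's code; the function name pbt_step and the 80/20 split are taken from the description above.

```python
import random

def pbt_step(population, top_frac=0.2, perturb=(0.8, 1.2)):
    """One PBT generation step (toy sketch).

    population: list of (score, params) after training each member
    for one epoch. The top 80% survive unchanged; the bottom 20% are
    replaced by perturbed copies of members sampled from the top 20%.
    """
    ranked = sorted(population, key=lambda m: m[0], reverse=True)
    cut = max(1, int(len(ranked) * top_frac))
    survivors = [params for _, params in ranked[:-cut]]   # top 80% move on
    elites = [params for _, params in ranked[:cut]]       # top 20%
    replacements = []
    for _ in range(cut):
        parent = random.choice(elites)
        child = {k: v * random.uniform(*perturb) for k, v in parent.items()}
        replacements.append(child)                        # bottom 20% replaced
    return survivors + replacements
```

Note that the returned population has the same size as the input one: the population evolves but never grows.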

Now for Sherpa there may be a little bit of confusion in terms of naming. For the sake of being able to parallelize, Sherpa considers one trial to be one "job". So Sherpa-PBT initializes the population as 20 trials with randomly sampled hyperparameters and leaves it to the user to decide in their script how long to train each one (say, one epoch). After those 20 one-epoch trials have finished, it schedules the top 80% of them as new trials, indicating via the load_from field that the weights from a previously finished trial should be loaded. This corresponds to "continuing" the best 80%. Each of those new trials gets a new trial ID, because IDs have to be unique. You can, however, identify them by the fact that their generation field will be 2 and their load_from field will indicate the "parent" of the trial.
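In the user's training script this typically means branching on load_from when deciding whether to resume from a parent's checkpoint. A minimal, framework-free sketch of that branching (the keys 'load_from' and 'save_to' and the weights_&lt;id&gt;.h5 naming are assumptions for illustration; check the fields Sherpa actually puts in trial.parameters in your version):

```python
def resolve_checkpoint(trial_parameters, trial_id):
    """Decide which weights file to resume from and which to save to.

    Returns (resume_path, save_path); resume_path is None for a
    first-generation trial that starts from scratch.
    """
    load_from = trial_parameters.get('load_from', '')
    save_to = trial_parameters.get('save_to', str(trial_id))
    resume = 'weights_{}.h5'.format(load_from) if load_from else None
    return resume, 'weights_{}.h5'.format(save_to)
```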

For the bottom 20%, the load_from fields will point to trials from the top 20% of the previous generation, and their trial.parameters will contain perturbed versions of the parent's parameters. So the user has to incorporate those perturbed parameters into the model. For some Keras parameters this can be a bit tricky, and I think you had actually found a bug related to that in another issue.
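For example, a perturbed learning rate cannot simply be passed to the optimizer constructor when the model's weights are being resumed; it has to be pushed into the already-compiled model. A sketch of one way to do this (the 'lr' parameter key is an assumption for illustration):

```python
def apply_perturbed_lr(trial_parameters, model=None):
    """Read a (possibly perturbed) learning rate and apply it to a model.

    When a model is given, the rate is pushed into the compiled
    optimizer via keras.backend.set_value, since the optimizer was
    constructed before the perturbed value was known.
    """
    lr = float(trial_parameters['lr'])  # 'lr' key assumed for illustration
    if model is not None:
        import keras.backend as K  # imported lazily; only needed with Keras
        K.set_value(model.optimizer.lr, lr)
    return lr
```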

Let me know if this clarifies things at all. Will review the other issues now.

Best,
Lars