How NSGAIISampler behavior depends on its population size
willytell opened this issue · 2 comments
I'm trying to gain some insight into the NSGA-II sampler, but I still can't understand the impact of the population size or where the number of generations is specified. Following Optuna's documentation, I've read the paper on which this multi-objective algorithm is based.
In particular:
- By default `population_size=50`, so if I set `n_trials=10`, does the algorithm behave like a random search, since the objective function must be evaluated at least 50 times to complete one generation?
- NSGA-II is supposed to run for a number of generations, but where is this number specified?
Any help is appreciated! Thanks in advance.
Thank you for your question.
> By default the population_size=50, but if then I define n_trials=10, then this algorithm is like a random search
You're right. If all trials successfully finish, the relationship between the trials and generations is as follows:
| Trial numbers | Generation | Sampler |
|---|---|---|
| [0, 49] | 0 | NSGAIISampler calls RandomSampler internally. |
| [50, 99] | 1 | NSGAIISampler |
| [100, 149] | 2 | NSGAIISampler |
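As a rough illustration of the table, you can sketch the mapping from trial number to generation as integer division. This is a simplified model. It assumes every trial finishes successfully and that trials run one after another. The helper names `generation_of` and `sampler_for` are hypothetical and are not part of Optuna, which tracks generations internally.

```python
# Hypothetical helpers (not Optuna APIs) that model the table above,
# assuming every trial completes and trials run sequentially.

def generation_of(trial_number: int, population_size: int = 50) -> int:
    """Return the generation a trial falls into under the simplified model."""
    if trial_number < 0 or population_size <= 0:
        raise ValueError("trial_number must be >= 0 and population_size > 0")
    return trial_number // population_size


def sampler_for(trial_number: int, population_size: int = 50) -> str:
    """Generation 0 is sampled at random; later generations use NSGA-II."""
    if generation_of(trial_number, population_size) == 0:
        return "RandomSampler (called internally)"
    return "NSGAIISampler"


if __name__ == "__main__":
    # With n_trials=10 and the default population_size=50,
    # every trial lands in generation 0.
    print({generation_of(n) for n in range(10)})
    for n in (0, 49, 50, 99, 100, 149):
        print(n, generation_of(n), sampler_for(n))
```

With `n_trials=10` and the default `population_size=50`, every trial lands in generation 0. That is why such a run is effectively a random search.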
> where is this number specified?
Please use the `n_trials` argument of `Study.optimize`. If you want `G` generations with `P` individuals in each population, set `n_trials = P * G`. Optuna adopts this design so that users can parallelize the optimization: when users launch two optimization processes, each process should set `n_trials = P * G / 2`.
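The arithmetic above can be sketched as a small helper. The function name `n_trials_per_process` is hypothetical and is not an Optuna API. The sketch assumes that all processes share one study and that each process runs the same number of trials.

```python
# Hypothetical helper (not an Optuna API): how many trials each of
# n_processes workers sharing one study should pass as n_trials so that,
# together, they run population_size * n_generations trials (n_trials = P * G).

def n_trials_per_process(population_size: int, n_generations: int,
                         n_processes: int = 1) -> int:
    if min(population_size, n_generations, n_processes) <= 0:
        raise ValueError("all arguments must be positive")
    total = population_size * n_generations  # P * G
    # Integer ceiling division: round up so the workers together run
    # at least P * G trials when the total does not divide evenly.
    return (total + n_processes - 1) // n_processes


if __name__ == "__main__":
    P, G = 50, 2
    print(n_trials_per_process(P, G))     # 100 (single process)
    print(n_trials_per_process(P, G, 2))  # 50 (two processes)
```

Because the helper rounds up, the workers may run slightly more than `P * G` trials when the total does not divide evenly. To share one study, each process would typically call `optuna.create_study` with the same `study_name`, a shared `storage`, and `load_if_exists=True`. With parallel workers, the exact trial-to-generation mapping in the table above may not hold, since trials can finish out of order.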
Excellent explanation! Thank you!
Just for the record, for anyone who has the same or a similar question, I ran the following simple code:
```python
import math

import optuna


def objective(trial):
    x = trial.suggest_int("x", 2, 150)
    y = trial.suggest_int("y", 2, 100)
    score1 = x**2 - y**2
    score2 = math.sin(y)
    return score1, score2  # two objectives, both minimized


# population_size=50 with n_trials=100 gives two generations (0 and 1).
sampler = optuna.samplers.NSGAIISampler(population_size=50)
study = optuna.create_study(
    study_name="studyTest",
    directions=["minimize", "minimize"],
    sampler=sampler,
)
study.optimize(objective, n_trials=100)

# trials_dataframe() requires pandas to be installed.
df = study.trials_dataframe()
df.to_csv("test.csv", sep=";", index=False)
```
The file "test.csv" contains the information that @toshihikoyanase put in the table above, so you can see the relationship between the number of trials and the number of generations that he explained. Thanks also for the extra bonus: the note about running two optimization processes.