optuna/optuna-examples

NSGAIISampler behavior from its population size

willytell opened this issue · 2 comments

I'm trying to gain some insight into the NSGA-II sampler, but I still can't understand the impact of the population size or where the number of generations is specified. Following Optuna's documentation, I've read the paper on which this multi-objective algorithm is based.
In particular:

  1. By default population_size=50, but if I then set n_trials=10, does the algorithm behave like a random search, because the objective function must be evaluated at least 50 times to complete one generation?
  2. NSGA-II is supposed to run for a number of generations, but where is this number specified?

Any help is appreciated! Thanks in advance.

Thank you for your question.

> By default the population_size=50, but if then I define n_trials=10, then this algorithm is like a random search

You're right. If all trials successfully finish, the relationship between the trials and generations is as follows:

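| Trials     | Generation | Sampler                                       |
|------------|------------|-----------------------------------------------|
| [0, 49]    | 0          | NSGAIISampler calls RandomSampler internally. |
| [50, 99]   | 1          | NSGAIISampler                                 |
| [100, 149] | 2          | NSGAIISampler                                 |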
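The mapping in the table can be sketched as a small helper. This is only an illustration of the arithmetic, not Optuna's internal code: `generation_of` is a hypothetical name, and it assumes every earlier trial finished successfully and the trials ran sequentially.

```python
def generation_of(trial_number: int, population_size: int = 50) -> int:
    """Return the generation a trial falls into (hypothetical helper).

    Assumes all earlier trials finished successfully, so each block of
    `population_size` consecutive trials forms one generation.
    """
    if trial_number < 0:
        raise ValueError("trial_number must be non-negative")
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    return trial_number // population_size


# Boundaries of the three ranges listed in the table.
for number in (0, 49, 50, 99, 100, 149):
    print(number, generation_of(number))
```

With n_trials=10 and population_size=50, every trial maps to generation 0, which is the randomly sampled generation.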

> where is this number specified?

Please use the n_trials argument of Study.optimize. If you want G generations with P individuals in the population, set n_trials = P * G. Optuna chose this design so that users can parallelize the optimization. For example, when users launch two optimization processes, each process should set n_trials = P * G / 2.
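As a rough sketch of that budget arithmetic, the helper below splits a total of P * G trials across several processes. `split_trials` is a hypothetical name, not part of Optuna, and handing out the remainder one trial at a time is an assumption for the case where P * G does not divide evenly.

```python
from typing import List


def split_trials(population_size: int, generations: int, n_processes: int) -> List[int]:
    """Split a budget of population_size * generations trials across processes.

    Hypothetical helper: any remainder is given one trial at a time to the
    first processes so that the shares add up to the full budget.
    """
    if population_size <= 0 or generations <= 0 or n_processes <= 0:
        raise ValueError("all arguments must be positive")
    total = population_size * generations
    base, extra = divmod(total, n_processes)
    return [base + (1 if i < extra else 0) for i in range(n_processes)]


# P = 50 individuals, G = 3 generations, two processes: 150 / 2 each.
print(split_trials(50, 3, 2))  # [75, 75]
# Four processes: 150 does not divide evenly, so two processes get one extra trial.
print(split_trials(50, 3, 4))  # [38, 38, 37, 37]
```

Each process would then pass its own share as n_trials to Study.optimize, with all processes attached to the same study through a shared storage.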

Excellent explanation!! thank you!!

Just for the record, for anyone who might have the same or a similar question, I ran the following simple code:

import math

import optuna


def objective(trial):
    # Two integer search dimensions.
    x = trial.suggest_int("x", 2, 150)
    y = trial.suggest_int("y", 2, 100)

    # Two objectives, both minimized (see `directions` below).
    score1 = x**2 - y**2
    score2 = math.sin(y)
    return score1, score2


# 100 trials with a population of 50 gives two generations (0 and 1).
sampler = optuna.samplers.NSGAIISampler(population_size=50)
study = optuna.create_study(
    study_name="studyTest",
    directions=["minimize", "minimize"],
    sampler=sampler,
)
study.optimize(objective, n_trials=100)

# trials_dataframe() requires pandas to be installed.
df = study.trials_dataframe()
df.to_csv("test.csv", sep=";", index=False)

The information that @toshihikoyanase put in the table above appears in "test.csv", so the relationship he explained between the number of generations and the number of trials can be checked there. Thank you also for the bonus note about "two optimization processes".