[GENERAL SUPPORT]: Advice on using botorch models with fixed points and outcomes as initial points

Question

I have implemented Bayesian optimization with a BoTorch model and a Sobol initialization, and it runs perfectly. Afterwards, I thought the 40 points found by that process could be improved further, so I took my metric and parameter logs and tried to set up a new generation strategy that uses those 40 points (obtained earlier by the Sobol and Bayesian models) as the initial sample, and then runs the same generation strategy again but with only the BoTorch step. After some research I figured out how to give the process the initial data, but now I get this error:
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/src/optimization/bo_fine_tuning_given_initial_points.py", line 204, in <module>
    main()
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/src/optimization/bo_fine_tuning_given_initial_points.py", line 190, in main
    parameters, trial_index = ax_client.get_next_trial()
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/utils/common/executils.py", line 167, in actual_wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/service/ax_client.py", line 543, in get_next_trial
    generator_run=self._gen_new_generator_run(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/service/ax_client.py", line 1781, in _gen_new_generator_run
    return not_none(self.generation_strategy).gen(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/modelbridge/generation_strategy.py", line 372, in gen
    return self._gen_multiple(
           ^^^^^^^^^^^^^^^^^^^
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/modelbridge/generation_strategy.py", line 773, in _gen_multiple
    self._fit_current_model(data=data)
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/modelbridge/generation_strategy.py", line 852, in _fit_current_model
    self._curr.fit(experiment=self.experiment, data=data)
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/modelbridge/generation_node.py", line 237, in fit
    model_spec.fit( # Stores the fitted model as model_spec._fitted_model
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/modelbridge/model_spec.py", line 147, in fit
    self._fitted_model = self.model_enum(
                         ^^^^^^^^^^^^^^^^
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/modelbridge/registry.py", line 345, in __call__
    model_bridge = bridge_class(
                   ^^^^^^^^^^^^^
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/modelbridge/torch.py", line 132, in __init__
    super().__init__(
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/modelbridge/base.py", line 213, in __init__
    self._fit_if_implemented(
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/modelbridge/base.py", line 236, in _fit_if_implemented
    self._fit(
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/modelbridge/torch.py", line 653, in _fit
    self.model.fit(
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/models/torch/botorch_modular/model.py", line 280, in fit
    surrogate.fit(
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/models/torch/botorch_modular/surrogate.py", line 567, in fit
    model = self._construct_model(
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/models/torch/botorch_modular/surrogate.py", line 471, in _construct_model
    fit_botorch_model(
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/models/torch/botorch_modular/utils.py", line 301, in fit_botorch_model
    fit_gpytorch_mll(mll)
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/botorch/fit.py", line 105, in fit_gpytorch_mll
    return FitGPyTorchMLL(
           ^^^^^^^^^^^^^^^
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/botorch/utils/dispatcher.py", line 93, in __call__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/botorch/fit.py", line 259, in _fit_fallback
    raise ModelFittingError(msg)
botorch.exceptions.errors.ModelFittingError: All attempts to fit the model have failed. For more information, try enabling botorch.settings.debug mode.

Since it worked previously with Sobol initialization, I suppose I'm supplying the initial data in an unexpected way, but I'm lost, and I suspect it is a small thing that I'm missing. I would greatly appreciate some suggestions on how to make it work. My script can be found below. Some objects are custom ones, but I'm sure the problem is in the way I manage the initial samples.

n_max_bo_samples = 40
metric_name = 'mAP_eval'


def main():
    timestamp = datetime.datetime.now().strftime('%d%m%Y_%H%M%S')
    dir_ax_generated_models = f'./data/ax/{timestamp}/'
    os.mkdir(dir_ax_generated_models)
    
    # Register coco instances to create datasets to draw images from coco format
    coco_dir = '/eos/project/c/cern-robotic-framework/Dataset/Objects/BLM/blm_coco/'
    register_coco_instances('blm_train', {},
                            coco_dir + 'annotations/train_augmented.json',
                            coco_dir + 'images')
    register_coco_instances('blm_val', {},
                            coco_dir + 'annotations/val.json',
                            coco_dir + 'images')
    register_coco_instances('blm_test', {},
                            coco_dir + 'annotations/test.json',
                            coco_dir + 'images')

    # Define the search space
    search_space = SearchSpace(
        parameters=[
            ChoiceParameter(
                name='roi_batch_size', parameter_type=ParameterType.INT, values=[256, 512, 1024]
            ),
            ChoiceParameter(
                name='backbone', parameter_type=ParameterType.STRING, values=['resnet50', 'resnet101']
            ),
            RangeParameter(
                name='lr', parameter_type=ParameterType.FLOAT, lower=0.0001, upper=0.01
            ),
            RangeParameter(
                name='momentum', parameter_type=ParameterType.FLOAT, lower=0.0, upper=0.9
            ),
            ChoiceParameter(
                name='nesterov', parameter_type=ParameterType.BOOL, values=[True, False]
            ),
            ChoiceParameter(
                name='use_on_runtime_augmentation', parameter_type=ParameterType.BOOL, values=[True, False]
            ),
            RangeParameter(
                name='lr_warmup_factor', parameter_type=ParameterType.FLOAT, lower=0.01, upper=0.1
            ),
            RangeParameter(
                name='lr_warmup_iters', parameter_type=ParameterType.INT, lower=0, upper=400
            ),
            RangeParameter(
                name='weight_decay', parameter_type=ParameterType.FLOAT, lower=0.0001, upper=0.025
            ),
            RangeParameter(
                name='batch_size', parameter_type=ParameterType.INT, lower=1, upper=5
            ),
            RangeParameter(
                name='freeze_at', parameter_type=ParameterType.INT, lower=2, upper=4
            ),
            FixedParameter(
                name='eval_patience', parameter_type=ParameterType.INT, value=2
            ),
            FixedParameter(
                name='epochs', parameter_type=ParameterType.INT, value=6000
            ),
            FixedParameter(
                name='eval_min_improvement', parameter_type=ParameterType.INT, value=1
            ),
            FixedParameter(
                name='eval_period', parameter_type=ParameterType.INT, value=100
            ),
            FixedParameter(
                name='output_dir', parameter_type=ParameterType.STRING, value=dir_ax_generated_models
            ),
        ]
    )

    
    # Initialize experiment
    experiment = Experiment(
        name='detectron2_optimization',
        search_space=search_space,
        optimization_config=OptimizationConfig(
            objective=Objective(
                metric=MAPMetric(name=metric_name),
                minimize=False,
            )
        ),
        runner=SyntheticRunner()
    )
    
    # Supplying initial fixed samples
    f = './data/ax/15112024_185726/experiment_logs.json'
    initial_fixed_trials = output_bo_to_completed_multitrials(f, metric_names=[metric_name])
    n_initial_samples = len(initial_fixed_trials)
    
    for i, initial_fixed_trial in enumerate(initial_fixed_trials):
        arm_name = f'{i}_0'
        inputs = initial_fixed_trial['input']
        inputs['output_dir'] = dir_ax_generated_models
        trial = experiment.new_trial()
        trial.add_arm( Arm(parameters=inputs, name=arm_name) )
        data = Data(df=pd.DataFrame.from_records([
            {
                'arm_name': arm_name,
                'metric_name': metric,
                'mean': output['mean'],
                'sem': 0, # output['sem'],
                'trial_index': i,
            }
            for metric, output in initial_fixed_trial['output'].items()])
        )
        experiment.attach_data(data)
        trial.run().complete()
    
    
    # Gaussian process generation strategy
    generation_strategy = GenerationStrategy(
        steps=[
            GenerationStep(  # BayesOpt step
                model=Models.BOTORCH_MODULAR,
                num_trials=n_max_bo_samples,
                model_kwargs={  # Kwargs to pass to BoTorchModel.__init__
                    'surrogate': Surrogate(CustomGP),
                    # I use qLogExpectedImprovement due to numerical problems in 
                    # the library implementation of qExpectedImprovement
                    'botorch_acqf_class': qLogExpectedImprovement, 
                    'fit_out_of_design': True
                },
            ),
        ]
    )
    
    # Create AxClient
    ax_client = AxClient(generation_strategy=generation_strategy)
    ax_client._experiment = experiment  # Attach the predefined experiment
    
    bo_options = {
        'n_initial_samples': n_initial_samples, 
        'initial_samples_strategy': 'fixed',
        'n_max_bo_samples': n_max_bo_samples,
        'kernel': 'rbf',
        'kernel_parametrs': {'lengthscale': None},
        # 'best parameters': best_parameters,
        # 'best values': best_values
        }
    
    outcomes_storager = SaveAxOutcomes(ax_client, bo_options, folder=dir_ax_generated_models)
    
    # Optimization loop
    for _ in range(n_max_bo_samples):
        parameters, trial_index = ax_client.get_next_trial()
        suffix_name = f"BO_trial_{trial_index + 1}"
        
        # Evaluate the model
        results = train_model(parameters, suffix_name=suffix_name)
        trial_mAP_eval = results[metric_name]
        
        # Complete the trial
        ax_client.complete_trial(trial_index=trial_index, raw_data={metric_name: trial_mAP_eval})
        outcomes_storager.save_last_ax_values(trial_mAP_eval)

Thank you.

Please provide any relevant code snippet if applicable.

# black box function (detectron2 model)
def train_model(parametrization, suffix_name=None):
    cfg = get_actual_cfg(**parametrization)
    
    trainer = CustomTrainer(cfg, suffix_name=suffix_name, mapper_on_runtime=custom_mapper)
    trainer.resume_or_load(resume=False)
    trainer.train()
    
    eval_metric = trainer.get_best_map_val()
    return {'mAP_eval': eval_metric}

class MAPMetric(Metric):
    def fetch_trial_data(self, trial: BaseTrial):
        # Extract parameters and trial number for suffix_name
        parametrization = trial.arm.parameters
        suffix_name = f'trial_{trial.index}'  # Use trial index for suffix name
        
        # Call your training logic
        result = train_model(parametrization, suffix_name=suffix_name)
        
        # Ax requires returning a dictionary with the metric's name
        return {'mAP_eval': result['mAP_eval']}

Answer 1 · 2024-11-19T21:43:35.000Z

Hello there! Could you also provide your implementation of output_bo_to_completed_multitrials? A minimal repro of the issue would also be very useful if you can get one.

Answer 2 · 2024-11-20T12:22:55.000Z

Of course:

import json
from typing import Dict, List, Optional


def output_bo_to_completed_multitrials(
        bo_logs_f: str, metric_names: Optional[List[str]] = None) -> List[Dict]:
    """Takes a file with the output logs from a bayesian optimization from the 
    main() of bo_fine_tuning.py and turns it into initial points that can be
    used for another bayesian optimization problem with botorch and ax. This 
    is useful when you think that you can keep searching for better parameters
    and want to reuse the feature points you have already obtained from other
    optimization processes.
    More info: https://github.com/facebook/Ax/issues/768

    Args:
        bo_logs_f (str): file with the outputs from a bayesian optimization 
        generated from bo_fine_tuning.py main()
            
    Returns:
        List[Dict]: List of dicts that represent completed trials. The template
            of a trial is as follows 
                {
                    'input': {'parameter1': val1, ...},
                    'output': {'metric name1': {
                        'mean': mean_val1, 'sem': sem_val1}, ..
                        }
                }
    """
    with open(bo_logs_f, 'r') as f:
        json_points = json.load(f)
    
    trials = []
    for json_point in json_points['outcomes']:
        parameters = list( json_point.values() )[1]
        
        if metric_names:
            keys = list(json_point.keys())[2:]
            if len(metric_names) != len(keys): 
                raise ValueError(
                    'Arg metric_names must have the same number of values as the'
                    ' number of output metrics')
            metrics = {
                new_k: {'mean': json_point[k], 'sem': 0} 
                for new_k, k in zip(metric_names, keys) }
        else:
            metrics = {
                k: {'mean': json_point[k], 'sem': 0} 
                for k in list(json_point.keys())[2:] }
            
        trial = {
            'input': parameters,
            'output': metrics
        }
        trials.append(trial)
    
    return trials

Note that this function preprocesses the data points I obtained from other executions.
The processed file is attached to this comment.
experiment_logs.json

Hope this will be enough to understand the script and thank you so much!
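
Before attaching logged trials, it may be worth checking that they are usable as GP training data. The following is a minimal sketch, assuming the trial layout returned by output_bo_to_completed_multitrials above ('input' and 'output' keys, with a 'mean' per metric). The helper name check_initial_trials is hypothetical and not part of Ax or BoTorch. It flags three generic conditions that can make exact GP fitting harder: non-finite outcomes, a constant outcome, and repeated inputs recorded with zero noise. Nothing in this thread confirms that any of them is the cause of the error reported here.

```python
import math
from typing import Any, Dict, List


def check_initial_trials(trials: List[Dict[str, Any]], metric_name: str) -> List[str]:
    """Return human-readable problems found in the logged trials (hypothetical helper)."""
    problems: List[str] = []
    means: List[float] = []
    seen: Dict[str, int] = {}

    for i, trial in enumerate(trials):
        # Repeated inputs combined with sem=0 can make the kernel matrix ill-conditioned.
        key = repr(sorted(trial['input'].items()))
        if key in seen:
            problems.append(f'trial {i}: duplicate of trial {seen[key]}')
        else:
            seen[key] = i

        entry = trial['output'].get(metric_name)
        mean = None if entry is None else entry.get('mean')
        # NaN, infinite, or missing outcomes cannot serve as training targets.
        if not isinstance(mean, (int, float)) or not math.isfinite(mean):
            problems.append(f'trial {i}: missing or non-finite mean {mean!r}')
            continue
        means.append(float(mean))

    # A constant outcome carries no signal for fitting the GP hyperparameters.
    if len(means) > 1 and min(means) == max(means):
        problems.append('all finite means are identical')

    return problems
```

A call such as check_initial_trials(initial_fixed_trials, 'mAP_eval') returning an empty list would rule these particular conditions out, without saying anything about other possible causes.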

Answer 3 · 2024-11-21T09:00:45.000Z

And where should I introduce the result (the mAP_eval metric value) that those parameters yielded? My goal is to start the Bayesian optimization from the samples of previous experiments, so I want to incorporate their outcomes and then run the Gaussian process.

Answer 4 · 2024-11-22T14:37:04.000Z

Maybe I should simplify my problem so that a schematic solution can be provided. Let's assume I have a variable p that stores a list of dictionaries, where the keys are the parameters that my black-box function accepts and the values are the values they took during that specific trial. I also have another list of the same length, stored in the variable s, whose dictionaries hold the target metric I want to optimize.
1st step: incorporate (p, s) as the initial prior knowledge of the surrogate Gaussian process.
2nd step: perform the iterations to optimize my metric through a Gaussian process.

My problem is that I do not know how to incorporate the 1st step into a GenerationStrategy, or how to give the points to the "memory" of my experiment, so that the BoTorch model can optimize sequentially and without errors.

I hope this clarification helps, because I'm running out of ideas to try in the script.
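
The first step described above can be sketched with the AxClient methods attach_trial and complete_trial. This is a minimal sketch under stated assumptions: seed_client is a hypothetical helper name, ax_client is assumed to be an AxClient whose experiment has already been created, and p and s are the two equal-length lists from the description. Passing sem=None tells Ax the noise level is unknown so that it is inferred, while sem=0.0 declares the observations noiseless.

```python
from typing import Any, Dict, List, Optional


def seed_client(
    ax_client: Any,
    p: List[Dict[str, Any]],
    s: List[Dict[str, float]],
    sem: Optional[float] = None,
) -> List[int]:
    """Attach previously evaluated points to an AxClient (hypothetical helper).

    p[i] is a parameter dict and s[i] maps metric names to observed values.
    """
    if len(p) != len(s):
        raise ValueError('p and s must have the same length')

    trial_indices: List[int] = []
    for parameters, outcomes in zip(p, s):
        # attach_trial registers a manually chosen parameterization and
        # returns the parameters together with the new trial index.
        _, trial_index = ax_client.attach_trial(parameters=parameters)
        # raw_data maps each metric name to a (mean, sem) tuple.
        raw_data = {name: (value, sem) for name, value in outcomes.items()}
        ax_client.complete_trial(trial_index=trial_index, raw_data=raw_data)
        trial_indices.append(trial_index)
    return trial_indices
```

After seed_client(ax_client, p, s), the second step is the usual loop of ax_client.get_next_trial() followed by ax_client.complete_trial(...), with the attached points available to the surrogate as training data.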

Answer 5 · 2024-11-23T01:00:40.000Z

Hi @mariogmgi2g! Sorry for the delay. Based on the stack trace, I suspect the issue is in your implementation of CustomGP. If you can provide that I might be able to help you debug it.

I ran your original solution, excluding this line 'surrogate': Surrogate(CustomGP), and got it to successfully iterate.

Answer 6 · 2024-11-25T14:17:27.000Z

Hello @Cesar-Cardoso, thank you for your answer. Here's the implementation of CustomGP:

from typing import Literal

from botorch.models.gpytorch import GPyTorchModel
from gpytorch.distributions import MultivariateNormal
from gpytorch.kernels import RBFKernel, ScaleKernel, MaternKernel
from gpytorch.likelihoods import GaussianLikelihood
from gpytorch.means import ConstantMean
from gpytorch.models import ExactGP
from torch import Tensor


class CustomGP(ExactGP, GPyTorchModel):
    
    _num_outputs = 1
    
    def __init__(self, train_X, train_Y, kernel:Literal['rbf', 'matern']='rbf', **kernel_kwargs):
        # Squeeze output dim before passing train_Y to ExactGP
        super().__init__(train_X, train_Y.squeeze(-1), GaussianLikelihood())
        self.mean_modure = ConstantMean()
        if kernel == 'rbf':
            self.covar_module = ScaleKernel(
                base_kernel=RBFKernel(ard_num_dims=train_X.shape[-1], **kernel_kwargs),
            )
        elif kernel == 'matern':
            self.covar_module = ScaleKernel(
                base_kernel=MaternKernel(ard_num_dims=train_X.shape[-1], **kernel_kwargs),
            )
        else:
            raise ValueError(f'{kernel} kernel not implemented')
        
        self.to(train_X)
        
        
    def forward(self, x):
        mean_x = self.mean_modure(x)
        covar_x = self.covar_module(x)
        return MultivariateNormal(mean_x, covar_x)

It is a small modification of the BoTorch tutorial on custom Gaussian processes. So if that is the issue, how is it possible that using a Sobol step to generate the initial data works fine, but passing the initial points explicitly fails?

Answer 7 · 2024-11-29T10:37:35.000Z

Is there any more information I can provide to help with an answer? Thank you very much.

Answer 8 · 2024-12-14T16:36:44.000Z

Here is an example of how I have settled on doing this. Hope it helps. In my case the data is stored in a MongoDB instance, and we can then use the attach_trial method. I think this is cleaner than the example given here and may help disentangle the issues you are seeing.

ax_client = AxClient(generation_strategy=gs)

...

ax_client.create_experiment(
    name=experiment_name,
    parameters=experiment_parameters,
    parameter_constraints=parameter_constraints,
    objectives=objective_metrics,
)

for doc in viscosity_collection.find():
    parameters = doc["parameters"]
    trial_parameters, trial_idx = ax_client.attach_trial(parameters=parameters)
    res = {"viscosity": doc["viscosity"]}
    ax_client.complete_trial(trial_index=trial_idx, raw_data=res)