[GENERAL SUPPORT]: Advice on using botorch models with fixed points and outcomes as initial points
Closed this issue · 8 comments
Question
I have implemented Bayesian optimization through a botorch model with a Sobol initialization, and it runs perfectly. Afterwards, I thought the 40 points found by the process could be improved even further, so I took my metric and parameter logs and tried to set up a new generation strategy that takes those 40 points, previously obtained by the Sobol and Bayesian models, as my initial sample and then runs the same generation strategy again with only the botorch step. After some research, I figured out how to give the process the initial data, but now I get this error:
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/src/optimization/bo_fine_tuning_given_initial_points.py", line 204, in <module>
    main()
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/src/optimization/bo_fine_tuning_given_initial_points.py", line 190, in main
    parameters, trial_index = ax_client.get_next_trial()
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/utils/common/executils.py", line 167, in actual_wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/service/ax_client.py", line 543, in get_next_trial
    generator_run=self._gen_new_generator_run(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/service/ax_client.py", line 1781, in _gen_new_generator_run
    return not_none(self.generation_strategy).gen(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/modelbridge/generation_strategy.py", line 372, in gen
    return self._gen_multiple(
           ^^^^^^^^^^^^^^^^^^^
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/modelbridge/generation_strategy.py", line 773, in _gen_multiple
    self._fit_current_model(data=data)
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/modelbridge/generation_strategy.py", line 852, in _fit_current_model
    self._curr.fit(experiment=self.experiment, data=data)
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/modelbridge/generation_node.py", line 237, in fit
    model_spec.fit(  # Stores the fitted model as model_spec._fitted_model
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/modelbridge/model_spec.py", line 147, in fit
    self._fitted_model = self.model_enum(
                         ^^^^^^^^^^^^^^^^
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/modelbridge/registry.py", line 345, in __call__
    model_bridge = bridge_class(
                   ^^^^^^^^^^^^^
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/modelbridge/torch.py", line 132, in __init__
    super().__init__(
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/modelbridge/base.py", line 213, in __init__
    self._fit_if_implemented(
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/modelbridge/base.py", line 236, in _fit_if_implemented
    self._fit(
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/modelbridge/torch.py", line 653, in _fit
    self.model.fit(
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/models/torch/botorch_modular/model.py", line 280, in fit
    surrogate.fit(
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/models/torch/botorch_modular/surrogate.py", line 567, in fit
    model = self._construct_model(
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/models/torch/botorch_modular/surrogate.py", line 471, in _construct_model
    fit_botorch_model(
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/ax/models/torch/botorch_modular/utils.py", line 301, in fit_botorch_model
    fit_gpytorch_mll(mll)
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/botorch/fit.py", line 105, in fit_gpytorch_mll
    return FitGPyTorchMLL(
           ^^^^^^^^^^^^^^^
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/botorch/utils/dispatcher.py", line 93, in __call__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/eos/home-m/mariogon/workspace/blm-detection-detectron2/.py11envA100/lib64/python3.11/site-packages/botorch/fit.py", line 259, in _fit_fallback
    raise ModelFittingError(msg)
botorch.exceptions.errors.ModelFittingError: All attempts to fit the model have failed. For more information, try enabling botorch.settings.debug mode.
As it worked previously with Sobol initialization, I suppose I'm supplying the initial data in an unexpected way, but I'm lost, and I suspect it is a small thing that I'm missing. I would greatly appreciate suggestions on how to make it work. My script can be found below. Some objects are custom ones, but I'm sure the problem is in the way I handle the initial samples.
n_max_bo_samples = 40
metric_name = 'mAP_eval'


def main():
    timestamp = datetime.datetime.now().strftime('%d%m%Y_%H%M%S')
    dir_ax_generated_models = f'./data/ax/{timestamp}/'
    os.mkdir(dir_ax_generated_models)

    # Register coco instances to create datasets to draw images from coco format
    coco_dir = '/eos/project/c/cern-robotic-framework/Dataset/Objects/BLM/blm_coco/'
    register_coco_instances('blm_train', {},
                            coco_dir + 'annotations/train_augmented.json',
                            coco_dir + 'images')
    register_coco_instances('blm_val', {},
                            coco_dir + 'annotations/val.json',
                            coco_dir + 'images')
    register_coco_instances('blm_test', {},
                            coco_dir + 'annotations/test.json',
                            coco_dir + 'images')

    # Define the search space
    search_space = SearchSpace(
        parameters=[
            ChoiceParameter(
                name='roi_batch_size', parameter_type=ParameterType.INT, values=[256, 512, 1024]
            ),
            ChoiceParameter(
                name='backbone', parameter_type=ParameterType.STRING, values=['resnet50', 'resnet101']
            ),
            RangeParameter(
                name='lr', parameter_type=ParameterType.FLOAT, lower=0.0001, upper=0.01
            ),
            RangeParameter(
                name='momentum', parameter_type=ParameterType.FLOAT, lower=0.0, upper=0.9
            ),
            ChoiceParameter(
                name='nesterov', parameter_type=ParameterType.BOOL, values=[True, False]
            ),
            ChoiceParameter(
                name='use_on_runtime_augmentation', parameter_type=ParameterType.BOOL, values=[True, False]
            ),
            RangeParameter(
                name='lr_warmup_factor', parameter_type=ParameterType.FLOAT, lower=0.01, upper=0.1
            ),
            RangeParameter(
                name='lr_warmup_iters', parameter_type=ParameterType.INT, lower=0, upper=400
            ),
            RangeParameter(
                name='weight_decay', parameter_type=ParameterType.FLOAT, lower=0.0001, upper=0.025
            ),
            RangeParameter(
                name='batch_size', parameter_type=ParameterType.INT, lower=1, upper=5
            ),
            RangeParameter(
                name='freeze_at', parameter_type=ParameterType.INT, lower=2, upper=4
            ),
            FixedParameter(
                name='eval_patience', parameter_type=ParameterType.INT, value=2
            ),
            FixedParameter(
                name='epochs', parameter_type=ParameterType.INT, value=6000
            ),
            FixedParameter(
                name='eval_min_improvement', parameter_type=ParameterType.INT, value=1
            ),
            FixedParameter(
                name='eval_period', parameter_type=ParameterType.INT, value=100
            ),
            FixedParameter(
                name='output_dir', parameter_type=ParameterType.STRING, value=dir_ax_generated_models
            ),
        ]
    )

    # Initialize experiment
    experiment = Experiment(
        name='detectron2_optimization',
        search_space=search_space,
        optimization_config=OptimizationConfig(
            objective=Objective(
                metric=MAPMetric(name=metric_name),
                minimize=False,
            )
        ),
        runner=SyntheticRunner()
    )

    # Supplying initial fixed samples
    f = './data/ax/15112024_185726/experiment_logs.json'
    initial_fixed_trials = output_bo_to_completed_multitrials(f, metric_names=[metric_name])
    n_initial_samples = len(initial_fixed_trials)
    for i, initial_fixed_trial in enumerate(initial_fixed_trials):
        arm_name = f'{i}_0'
        inputs = initial_fixed_trial['input']
        inputs['output_dir'] = dir_ax_generated_models
        trial = experiment.new_trial()
        trial.add_arm(Arm(parameters=inputs, name=arm_name))
        data = Data(df=pd.DataFrame.from_records([
            {
                'arm_name': arm_name,
                'metric_name': metric,
                'mean': output['mean'],
                'sem': 0,  # output['sem'],
                'trial_index': i,
            }
            for metric, output in initial_fixed_trial['output'].items()])
        )
        experiment.attach_data(data)
        trial.run().complete()

    # Gaussian process generation strategy
    generation_strategy = GenerationStrategy(
        steps=[
            GenerationStep(  # BayesOpt step
                model=Models.BOTORCH_MODULAR,
                num_trials=n_max_bo_samples,
                model_kwargs={  # Kwargs to pass to BoTorchModel.__init__
                    'surrogate': Surrogate(CustomGP),
                    # I use qLogExpectedImprovement due to numerical problems in
                    # the library implementation of qExpectedImprovement
                    'botorch_acqf_class': qLogExpectedImprovement,
                    'fit_out_of_design': True
                },
            ),
        ]
    )

    # Create AxClient
    ax_client = AxClient(generation_strategy=generation_strategy)
    ax_client._experiment = experiment  # Attach the predefined experiment
    bo_options = {
        'n_initial_samples': n_initial_samples,
        'initial_samples_strategy': 'fixed',
        'n_max_bo_samples': n_max_bo_samples,
        'kernel': 'rbf',
        'kernel_parametrs': {'lengthscale': None},
        # 'best parameters': best_parameters,
        # 'best values': best_values
    }
    outcomes_storager = SaveAxOutcomes(ax_client, bo_options, folder=dir_ax_generated_models)

    # Optimization loop
    for _ in range(n_max_bo_samples):
        parameters, trial_index = ax_client.get_next_trial()
        suffix_name = f"BO_trial_{trial_index + 1}"
        # Evaluate the model
        results = train_model(parameters, suffix_name=suffix_name)
        trial_mAP_eval = results[metric_name]
        # Complete the trial
        ax_client.complete_trial(trial_index=trial_index, raw_data={metric_name: trial_mAP_eval})
        outcomes_storager.save_last_ax_values(trial_mAP_eval)
Thank you.
Please provide any relevant code snippet if applicable.
# black box function (detectron2 model)
def train_model(parametrization, suffix_name=None):
    cfg = get_actual_cfg(**parametrization)
    trainer = CustomTrainer(cfg, suffix_name=suffix_name, mapper_on_runtime=custom_mapper)
    trainer.resume_or_load(resume=False)
    trainer.train()
    eval_metric = trainer.get_best_map_val()
    return {'mAP_eval': eval_metric}


class MAPMetric(Metric):
    def fetch_trial_data(self, trial: BaseTrial):
        # Extract parameters and trial number for suffix_name
        parametrization = trial.arm.parameters
        suffix_name = f'trial_{trial.index}'  # Use trial index for suffix name
        # Call your training logic
        result = train_model(parametrization, suffix_name=suffix_name)
        # Ax requires returning a dictionary with the metric's name
        return {'mAP_eval': result['mAP_eval']}
Hello there! Could you also provide your implementation of output_bo_to_completed_multitrials? A minimal repro of the issue would also be very useful if you can get one.
Of course:
import json
from typing import Dict, List, Optional


def output_bo_to_completed_multitrials(
        bo_logs_f: str, metric_names: Optional[List[str]] = None) -> List[Dict]:
    """Takes a file with the output logs from a Bayesian optimization from the
    main() of bo_fine_tuning.py and turns it into initial points that can be
    used for another Bayesian optimization problem with botorch and ax. This
    is useful when you think that you can keep searching for better parameters
    and want to reuse the feature points you have already obtained from other
    optimization processes.
    More info: https://github.com/facebook/Ax/issues/768

    Args:
        bo_logs_f (str): file with the outputs from a Bayesian optimization
            generated from bo_fine_tuning.py main()
        metric_names (Optional[List[str]]): new names for the logged metrics,
            in the order in which they appear in each logged point

    Returns:
        List[Dict]: List of Dicts that represent completed trials. The template
        of a trial is as follows
        {
            'input': {'parameter1': val1, ...},
            'output': {'metric name1': {
                'mean': mean_val1, 'sem': sem_val1}, ..
            }
        }
    """
    with open(bo_logs_f, 'r') as f:
        json_points = json.load(f)
    trials = []
    for json_point in json_points['outcomes']:
        # The second value of each logged point holds the parameters; every
        # key from the third one onwards is treated as a metric
        parameters = list(json_point.values())[1]
        if metric_names:
            keys = list(json_point.keys())[2:]
            if len(metric_names) != len(keys):
                raise ValueError(
                    'Arg metric_names must have the same number of values as the'
                    ' number of output metrics')
            metrics = {
                new_k: {'mean': json_point[k], 'sem': 0}
                for new_k, k in zip(metric_names, keys)}
        else:
            metrics = {
                k: {'mean': json_point[k], 'sem': 0}
                for k in list(json_point.keys())[2:]}
        trial = {
            'input': parameters,
            'output': metrics
        }
        trials.append(trial)
    return trials
Note that this function preprocesses the data points I obtained from other executions.
The processed file is attached to this comment.
experiment_logs.json
Hope this will be enough to understand the script and thank you so much!
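The positional lookups in the function above (the second value of each record, and every key from the third onwards) depend on the key order of each logged record. A minimal sketch of a key-based alternative follows. The record layout and the key names "trial" and "parameters" are assumptions made for illustration, since the actual layout of experiment_logs.json is not shown in this thread; records_to_trials is a hypothetical helper, not part of Ax.

```python
from typing import Dict, List, Optional, Tuple


def records_to_trials(
    records: List[Dict],
    params_key: str = "parameters",                      # hypothetical key name
    non_metric_keys: Tuple[str, ...] = ("trial", "parameters"),  # hypothetical
    metric_names: Optional[List[str]] = None,
) -> List[Dict]:
    """Convert logged records into {'input': ..., 'output': ...} dicts by key."""
    trials = []
    for record in records:
        # Every key that is not explicitly excluded is treated as a metric.
        logged = [k for k in record if k not in non_metric_keys]
        names = metric_names or logged
        if len(names) != len(logged):
            raise ValueError(
                f"expected {len(logged)} metric names, got {len(names)}")
        trials.append({
            "input": dict(record[params_key]),
            # 'sem': 0.0 mirrors the original function (noiseless observations).
            "output": {
                new: {"mean": float(record[old]), "sem": 0.0}
                for new, old in zip(names, logged)
            },
        })
    return trials
```

Looking values up by name means a reordered or extended log record raises a clear error instead of silently feeding the wrong field to the model.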
And where should I introduce the result (mAP_eval metric value) that yielded those parameters? My goal is to start the bayesian optimization taking the samples from previous experiments, so I want to incorporate their outcomes and then run the gaussian process.
Maybe I should simplify my problem so that a schematic solution can be provided. Let's assume I have a variable p that stores a list of dictionaries, where the keys are the parameters that my black-box function accepts and the values are the values they took during that specific trial, and another list of the same length, stored in the variable s, whose dictionaries hold the target metric I want to optimize.
1st Step: incorporate (p, s) as the initial prior knowledge of the surrogate Gaussian process.
2nd Step: perform the iterations to optimize my metric through a Gaussian process.
My problem is that I do not know how to incorporate the 1st step, whether in a GenerationStrategy class or by adding the points to the "memory" of my experiment, so that the botorch model can then optimize sequentially and flawlessly.
Hope this clarification helps, because I'm running out of ideas to incorporate to the script.
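Before handing (p, s) to a surrogate model, it can help to rule out problems in the data itself. The sketch below is only a sanity check in plain Python, under the assumption that p and s have the shape described above. The function name check_prior_data is hypothetical, and nothing in this thread confirms that any of these conditions is what caused the ModelFittingError.

```python
import math
from typing import Dict, List


def check_prior_data(p: List[Dict], s: List[Dict], metric_name: str) -> List[str]:
    """Return a list of human-readable problems found in the prior data."""
    problems = []
    if len(p) != len(s):
        problems.append(
            f"length mismatch: {len(p)} parameter sets vs {len(s)} outcomes")
    values = [float(out[metric_name]) for out in s if metric_name in out]
    if len(values) != len(s):
        problems.append(f"{len(s) - len(values)} outcomes lack '{metric_name}'")
    if any(not math.isfinite(v) for v in values):
        problems.append("non-finite outcome values (NaN or inf)")
    if values and len(set(values)) == 1:
        problems.append("all outcomes are identical")
    # Count exact repeats of a parameter set (values are assumed hashable).
    seen = set()
    duplicates = 0
    for params in p:
        key = tuple(sorted(params.items()))
        duplicates += key in seen
        seen.add(key)
    if duplicates:
        problems.append(f"{duplicates} duplicated parameter sets")
    return problems
```

An empty list means none of these checks fired; it does not guarantee that the model will fit.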
Hi @mariogmgi2g! Sorry for the delay. Based on the stack trace, I suspect the issue is in your implementation of CustomGP. If you can provide that I might be able to help you debug it.
I ran your original solution, excluding the line 'surrogate': Surrogate(CustomGP), and got it to iterate successfully.
Hello @Cesar-Cardoso, thank you for your answer. Here's the implementation of CustomGP:
from typing import Literal

from botorch.models.gpytorch import GPyTorchModel
from gpytorch.distributions import MultivariateNormal
from gpytorch.kernels import RBFKernel, ScaleKernel, MaternKernel
from gpytorch.likelihoods import GaussianLikelihood
from gpytorch.means import ConstantMean
from gpytorch.models import ExactGP
from torch import Tensor


class CustomGP(ExactGP, GPyTorchModel):
    _num_outputs = 1

    def __init__(self, train_X, train_Y, kernel: Literal['rbf', 'matern'] = 'rbf', **kernel_kwargs):
        # Squeeze output dim before passing train_Y to ExactGP
        super().__init__(train_X, train_Y.squeeze(-1), GaussianLikelihood())
        self.mean_module = ConstantMean()
        if kernel == 'rbf':
            self.covar_module = ScaleKernel(
                base_kernel=RBFKernel(ard_num_dims=train_X.shape[-1], **kernel_kwargs),
            )
        elif kernel == 'matern':
            self.covar_module = ScaleKernel(
                base_kernel=MaternKernel(ard_num_dims=train_X.shape[-1], **kernel_kwargs),
            )
        else:
            raise ValueError(f'{kernel} kernel not implemented')
        # Move the model to the same device and dtype as the training inputs
        self.to(train_X)

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return MultivariateNormal(mean_x, covar_x)
It is a slight modification of the botorch tutorial on custom Gaussian processes, so if that is the issue, how is it possible that using a Sobol step to generate the initial data works fine, but passing the initial points explicitly fails?
Can I provide any more information to make this easier to answer? Thank you very much.
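One difference between the two setups is where the training points come from: Sobol points are generated inside the search space by construction, while attached points are whatever the log file contains (the script sets 'fit_out_of_design': True, which allows such points into the fit). As a way to narrow things down, the sketch below checks logged parameters against the ranges and choices declared in the search space above. The helper name out_of_design is hypothetical, and this is a diagnostic idea, not a confirmed cause of the failure.

```python
from typing import Any, Dict, List

# Bounds and choices copied from the search space in the original script.
RANGES = {
    "lr": (0.0001, 0.01),
    "momentum": (0.0, 0.9),
    "lr_warmup_factor": (0.01, 0.1),
    "lr_warmup_iters": (0, 400),
    "weight_decay": (0.0001, 0.025),
    "batch_size": (1, 5),
    "freeze_at": (2, 4),
}
CHOICES = {
    "roi_batch_size": {256, 512, 1024},
    "backbone": {"resnet50", "resnet101"},
    "nesterov": {True, False},
    "use_on_runtime_augmentation": {True, False},
}


def out_of_design(params: Dict[str, Any]) -> List[str]:
    """Return the names of parameters whose values fall outside the search space."""
    bad = []
    for name, (lower, upper) in RANGES.items():
        if name in params and not lower <= params[name] <= upper:
            bad.append(name)
    for name, allowed in CHOICES.items():
        if name in params and params[name] not in allowed:
            bad.append(name)
    return bad
```

Running this over every 'input' dict returned by output_bo_to_completed_multitrials would show whether any prior point lies outside the declared bounds.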
Here is an example of how I have settled on doing this. Hope it helps. In my case the data is stored in a mongoDB instance and then we can use the attach_trial method. I think this is cleaner than the example given here and may then help disentangle the issues you are seeing.
ax_client = AxClient(generation_strategy=gs)
...
ax_client.create_experiment(
    name=experiment_name,
    parameters=experiment_parameters,
    parameter_constraints=parameter_constraints,
    objectives=objective_metrics,
)
for doc in viscosity_collection.find():
    parameters = doc["parameters"]
    trial_parameters, trial_idx = ax_client.attach_trial(parameters=parameters)
    res = {"viscosity": doc["viscosity"]}
    ax_client.complete_trial(trial_index=trial_idx, raw_data=res)
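The same attach_trial / complete_trial pattern can be adapted to the (p, s) framing used earlier in the thread. The following is a minimal sketch, assuming ax_client is an AxClient whose experiment has already been created with a matching search space; warm_start is a hypothetical helper name. The client is passed in as an argument, so the function itself needs no Ax import.

```python
from typing import Any, Dict, List


def warm_start(ax_client: Any, p: List[Dict], s: List[Dict],
               metric_name: str) -> List[int]:
    """Attach each prior point as a completed trial and return the trial indices."""
    if len(p) != len(s):
        raise ValueError("p and s must have the same length")
    indices = []
    for params, outcome in zip(p, s):
        # attach_trial registers a user-specified arm and returns
        # (parameters, trial_index).
        _, trial_index = ax_client.attach_trial(parameters=params)
        # A bare float is passed as the mean. Assumption to verify for your Ax
        # version: the SEM is then treated as unknown rather than fixed at 0.
        ax_client.complete_trial(
            trial_index=trial_index,
            raw_data={metric_name: float(outcome[metric_name])},
        )
        indices.append(trial_index)
    return indices
```

After this loop, calls to ax_client.get_next_trial() would see the attached trials as existing data, so a generation strategy with only a BoTorch step has observations to fit from the first iteration.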