bayesian-optimization/BayesianOptimization

`suggest` does not return the maximum of acquisition function


I'm new to this package, and I'm confused about how the suggest method generates the next best guess. It seems like suggest should find the argmax of the acquisition function and return it as the next guess. However, I found that sometimes, even when a clear maximum exists, suggest returns values that deviate from the argmax.

One example can be found in the tutorial. In input [7], the bottom figure shows that the next best guess is around 3. However, in input [8], it is clear that the third evaluated point is 9.455.

Another example is the following:

import numpy as np
import matplotlib.pyplot as plt
from bayes_opt import BayesianOptimization
from bayes_opt import UtilityFunction

def black_box_function(x):
    return np.exp(-(x - 2) ** 2) + np.exp(-(x - 6) ** 2 / 10) + 1 / (x ** 2 + 1)

optimizer = BayesianOptimization(black_box_function, {'x': (-2, 10)}, random_state=2, verbose=0)
acq_function = UtilityFunction(kind="ucb", kappa=1, xi=1e-1)
optimizer.maximize(init_points=2, n_iter=2, acquisition_function=acq_function)

# Evaluate the acquisition function on a grid and compare its argmax
# with what suggest() returns.
x = np.linspace(-2, 10, 200).reshape(-1, 1)
utility = acq_function.utility(x, optimizer._gp, 0)
plt.plot(x, utility, label='Acquisition Function', color='purple')
print(optimizer.suggest(acq_function))
print(x[np.argmax(utility)])

Here, the suggest method gives around 1.75, but the argmax is 2.28.

Does this only happen when we have too few observations? Is there any document discussing how such a value is chosen?

Thanks a lot for the help!

Hi @yichenfu,

The GP is fitted during the suggest step, which is why you see differing results -- I adapted your example to showcase this:

import numpy as np
import matplotlib.pyplot as plt
from bayes_opt import BayesianOptimization
from bayes_opt import UtilityFunction

def black_box_function(x):
    return np.exp(-(x - 2) ** 2) + np.exp(-(x - 6) ** 2 / 10) + 1 / (x ** 2 + 1)

optimizer = BayesianOptimization(black_box_function, {'x': (-2, 10)}, random_state=2, verbose=0)
acq_function = UtilityFunction(kind="ucb", kappa=1, xi=1e-1)
optimizer.maximize(init_points=2, n_iter=2, acquisition_function=acq_function)

x = np.linspace(-2, 10, 200).reshape(-1, 1)

# Acquisition argmax on the GP as it stands right after maximize()
utility = acq_function.utility(x, optimizer._gp, 0)
print(x[np.argmax(utility)])  # 2.28140704
plt.plot(x, utility, label='Acquisition Function', color='purple')

# suggest() refits the GP, this time including the last registered point ...
print(optimizer.suggest(acq_function))  # 1.7505961918573731

# ... so the acquisition argmax on the refitted GP matches the suggestion
utility = acq_function.utility(x, optimizer._gp, 0)
print(x[np.argmax(utility)])  # 1.73869347

Hope this helps!

Regarding the example you also linked -- good point. I will have to look into that, but I currently don't have the bandwidth.

Re the documentation problem: after thinking about it for a bit, the GP fitting that sklearn does is not deterministic, meaning that if you call .fit twice on the same data you don't necessarily obtain the same GP. In my experience this happens especially when you're not fitting on a lot of data -- presumably because the log-marginal likelihood then has many similar local maxima, and the random restarts of the hyperparameter optimizer can land on different ones. I would guess that this is the most likely cause of the difference between the plot's argmax and the eventually evaluated point.
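If you want to see this effect in isolation from the package, here is a minimal sketch using plain scikit-learn -- the Matern kernel and the number of restarts are assumptions meant to resemble what the package configures internally, not taken from its source:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# A handful of observations -- deliberately few, so the log-marginal
# likelihood can have several similar local maxima.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 10, size=(4, 1))
y = np.exp(-(X.ravel() - 2) ** 2) + np.exp(-(X.ravel() - 6) ** 2 / 10) + 1 / (X.ravel() ** 2 + 1)

# Fit the same data twice with different random states: the restarts of the
# hyperparameter optimizer start from different points, so the fitted
# length scale (and hence the posterior) can differ between the two fits.
for seed in (1, 2):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                  n_restarts_optimizer=25,
                                  normalize_y=True,
                                  random_state=seed)
    gp.fit(X, y)
    print(seed, gp.kernel_.length_scale)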

Hi @till-m,

Thank you so much! That explains a lot. Is it possible to fix the GP so that suggest shows me the same point as the plotted argmax?

Hi @yichenfu,

I guess you could technically re-set the random_state of the GP before every fit, though I am not sure if this will work (if you do try it, could you report back and let me know?)
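Untested, but roughly what I have in mind, continuing your snippet from above (using set_gp_params to pass the random_state through to the underlying sklearn GP):

# Sketch: re-seed the underlying GP so that the hyperparameter fit starts
# from the same random state every time it is refitted.
optimizer.set_gp_params(random_state=42)

# suggest() fits the GP on all registered data and maximizes the acquisition;
# evaluating the acquisition afterwards should then line up with the suggestion.
next_point = optimizer.suggest(acq_function)
utility = acq_function.utility(x, optimizer._gp, 0)
print(next_point, x[np.argmax(utility)])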

But maybe it would be better to ask why you want to do this -- what are you trying to achieve? Is there any reason not to plot the argmax after .suggest has handled the fitting?

Hi @till-m,

Thanks for your quick response! The reason I want to do this is just to better understand how this function works, and out of curiosity lol.

I tried to fix the random state using optimizer._gp.random_state = ensure_rng(), but it doesn't seem to work. Anyway, I found a better way (at least for me) to understand the difference:

import numpy as np
from bayes_opt import BayesianOptimization
from bayes_opt import UtilityFunction

def black_box_function(x):
    return np.exp(-(x - 2) ** 2) + np.exp(-(x - 6) ** 2 / 10) + 1 / (x ** 2 + 1)

optimizer = BayesianOptimization(black_box_function, {'x': (-2, 10)}, random_state=2, verbose=0)
acq_function = UtilityFunction(kind="ucb", kappa=1, xi=1e-1)

optimizer.maximize(init_points=2, n_iter=2, acquisition_function=acq_function)
print(len(optimizer.res), len(optimizer._gp.X_train_))  # It returns "4 3"

next_point = optimizer.suggest(acq_function)
print(len(optimizer.res), len(optimizer._gp.X_train_))  # It returns "4 4"

It shows that after calling maximize, the GP is only fitted on all but the last data point. The suggest method, however, fits the GP on all the data and returns the maximum of the resulting acquisition function. Since the number of points used differs, the two GPs give different maxima.

However, I still have some trouble understanding the case in the documentation. It is hard for me to believe that the difference between the plot's argmax and the eventually evaluated point is due to the randomness in GP.fit. For example, in inputs [7] and [9], the plotted GP has zero variance at the observed points. However, in input [8], the GP has uniform variance regardless of the observed points. Do you know why the GP in input [8] is so special?

Thanks again for your help and explanation!

Hi @yichenfu,

The reason I want to do this is just trying to better understand how this function works, and out of curiousity lol.

Perfectly valid, thanks!

The number of data used are different, so the GP gives different maximum.

Just to be explicit, this is because the GP is fit during the .suggest step, meaning the GP is not fitted to the last point -- the optimization loop simply terminates. This is intentional: fitting the GP is computationally expensive, and if the point is not going to be used, we'd rather save the resources.
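In heavily simplified pseudocode (not the actual implementation), the loop looks roughly like this, which is why the last registered point never makes it into a fit:

# Simplified sketch of what maximize() does -- not the real code.
def maximize_sketch(optimizer, acq_function, black_box_function, n_iter):
    for _ in range(n_iter):
        # suggest() fits the GP on everything registered so far,
        # then maximizes the acquisition function on that GP.
        next_point = optimizer.suggest(acq_function)
        target = black_box_function(**next_point)
        optimizer.register(params=next_point, target=target)
    # The loop ends here: the last registered point is never used in a fit,
    # because no further suggest() call follows.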

But to be clear, there are two things going on here: in your case -- as you have correctly observed and demonstrated -- the fitting happens after the plot, due to what I just mentioned. In the documentation, however, the difference does indeed come from a failure in fitting; this can happen especially when the number of points (compared to the dimensionality of the problem) is rather low.

I've written some code to demonstrate this -- feel free to run it a few times after cell 7 or 8.

# Toggle this to compare a fixed random state against a fresh one on each run
fix_fit_state = False
if fix_fit_state:
    optimizer.set_gp_params(random_state=42)
else:
    optimizer.set_gp_params(random_state=np.random.randint(0, 1000))

# Refit the GP on the currently registered data and inspect the result
optimizer._gp.fit(optimizer._space.params, optimizer._space.target)

plot_gp(optimizer, x, y)
print(f"Kernel length scale: {optimizer._gp.kernel_.length_scale}")

You should see (a) varying length scales and (b) varying acquisition functions and posteriors.

Do you know why GP in input [8] is so special?

The GP there is not necessarily special; it's just that during the fitting process the estimation of the length scale parameter failed and produced a very small length scale. If you set the GP's kernel to enforce a higher lower bound on the length scale parameter, you should not see the uniform variance that you have observed.
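For example, something along these lines -- a sketch only, assuming set_gp_params forwards the kernel to the underlying sklearn GP, and with an illustrative bound value:

from sklearn.gaussian_process.kernels import Matern

# Sketch: replace the GP's kernel with one whose length scale cannot be
# optimized below 1e-1, which should prevent the near-zero length scales
# behind the uniform-variance posterior in cell [8].
optimizer.set_gp_params(kernel=Matern(nu=2.5, length_scale_bounds=(1e-1, 1e5)))
optimizer._gp.fit(optimizer._space.params, optimizer._space.target)
print(optimizer._gp.kernel_.length_scale)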

Thanks again for explaining everything in so much detail! I believe this answers all my questions and concerns. I'm new to both the math and the package, and I learned a lot from this question. I'll close this issue.