EmuKit/emukit

Bayesian optimization gives flat latent function?

iRove108 opened this issue · 10 comments

I'm learning a noisy function. I've been getting a flat latent function, and I'm not sure why.

For a self-contained working example, I'm abstracting away a run from my actual application. That is, I'm providing precomputed values for my initial design and my objective function.

Define my function and my initial design:

import numpy as np
import matplotlib.pyplot as plt

from emukit.core import ParameterSpace, ContinuousParameter
from emukit.core.loop import FixedIterationsStoppingCondition, UserFunctionWrapper
from emukit.bayesian_optimization.acquisitions import ExpectedImprovement
from emukit.bayesian_optimization.loops import BayesianOptimizationLoop
from emukit.model_wrappers import GPyModelWrapper

import GPy
from GPy.models import GPRegression

# Define my function, which loops through 30 values in order
# (values precomputed from actually running BayesOpt on my real function)
class SimFunction:
    def __init__(self):
        self.times_called = 0
        self.arr = np.array([
            151.41014033, 62.09023478, 62.52426835, 61.8066145, 153.27467322,
            61.47440947, 61.66229474, 62.23729366, 62.66330217, 61.37228135,
            63.71660287, 60.80217923, 61.55372866, 61.85417587, 61.29182696,
            62.01017402, 63.35483382, 60.78094205, 62.70740196, 63.44405674,
            62.34055014, 61.26645109, 61.18795096, 62.27117287, 62.60566014,
            60.67162767, 61.84573099, 62.86046971, 61.38240327, 62.15477646,
        ])
    def __call__(self, x):
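        # Return the next precomputed objective value, cycling back to the
        # start once all 30 values have been consumed.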
        value = self.arr[self.times_called]
        self.times_called += 1

        if self.times_called >= len(self.arr):
            self.times_called = 0
        return value

sim_function = SimFunction()
f = np.frompyfunc(sim_function, 1, 1)  # vectorize: one input, one output
parameter_space = ParameterSpace([ContinuousParameter('x1', 5, 100)])

# Define my initial design
# (values precomputed from actually running BayesOpt on my real function)
X = np.array([59.33766262, 66.87705975,  9.99274413, 91.77151485,
              15.98334223, 75.47932193, 59.4018363, 28.62138164,
              73.61994809, 18.89241105])[:, None]
Y = np.array([41.462845288004075, 45.1335386921359, 40.595514445980804,
              55.882947142107895, 19.57217258906048, 49.775813658233105,
              41.639729759946434, 25.650601902591085, 48.58427959331652,
              20.838451230479237])[:, None]

# Train GP model on initial evaluations
model_gpy = GPRegression(X, Y)
model_gpy.optimize()
model = GPyModelWrapper(model_gpy)

model_gpy.plot()
plt.show()

GP regression on initial points:
[Plot: GP regression posterior on the 10 initial design points]

Run the Bayesian optimization loop and display the final latent function:

ITER = 30

# initialize the Bayesian optimization loop
expected_improvement = ExpectedImprovement(model=model)
loop = BayesianOptimizationLoop(model=model,
                                space=parameter_space,
                                acquisition=expected_improvement,
                                batch_size=1)

# run the Bayesian optimization loop
loop.run_loop(UserFunctionWrapper(f), FixedIterationsStoppingCondition(ITER))

model_gpy.plot()
plt.show()

[Plot: latent function after the BO loop; the posterior mean is flat across the domain]

The latent function is completely flat! Its parameters also seem out of whack:

Name : GP regression
Objective : 187.15753229070074
Number of Parameters : 3
Number of Optimization Parameters : 3
Updates : True
Parameters:
  GP_regression.           |               value  |  constraints  |  priors
  rbf.variance             |   3769.474111833743  |      +ve      |        
  rbf.lengthscale          |  2762.2467570215567  |      +ve      |        
  Gaussian_noise.variance  |    590.833747684825  |      +ve      |        

However, if I fit a fresh GP regression model on the same data points as in the graph above, I get a different function, one that also looks far more reasonable:

# GP regression on experimental points, run separately
model_gpy2 = GPRegression(loop.loop_state.X, loop.loop_state.Y)
model_gpy2.optimize()
model_gpy2.plot()
plt.show()

[Plot: latent function of the separately fitted GP; a sensible, non-flat fit]

Name : GP regression
Objective : 95.81686749520499
Number of Parameters : 3
Number of Optimization Parameters : 3
Updates : True
Parameters:
  GP_regression.           |               value  |  constraints  |  priors
  rbf.variance             |   4170.330107829418  |      +ve      |        
  rbf.lengthscale          |   5.611544854612646  |      +ve      |        
  Gaussian_noise.variance  |  0.6604288598827055  |      +ve      |        

I believe the two graphs above should be identical, so I suspect the model is not being fit correctly inside the optimization loop. I wonder if this has to do with the RuntimeWarning: overflow encountered in expm1 raised from paramz/transformations.py:111.

I'd greatly appreciate any ideas on this. Thanks!

Thanks for the report! No idea what's going on, but I've seen similar behavior recently, and the same warning too. Some googling suggests this might be related to Emukit generating some very big numbers somewhere: https://stackoverflow.com/questions/40726490/overflow-error-in-pythons-numpy-exp-function

Would you be able to get a full stack trace of this warning?

For sure! I was able to get a trace of it in Colab, below. It seems to be an issue with GPy and paramz's optimize_restarts. The final warning is thrown here.

This thread looks related. Despite the claim there that it is "just a warning," I suspect it still might be causing problems here (particularly given the extreme parameter values reported above), but I'm not positive.
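For reference, one way to capture such a traceback (a sketch: it hooks warnings.showwarning before running the loop, so the call stack is printed at the point the warning is raised):

import traceback
import warnings

def show_warning_with_stack(message, category, filename, lineno, file=None, line=None):
    # Print the call stack leading to the warning, then the warning itself
    traceback.print_stack()
    print(f"{filename}:{lineno}: {category.__name__}: {message}")

warnings.showwarning = show_warning_with_stack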

  File "<ipython-input-4-46631344dbd1>", line 11, in <module>
    loop.run_loop(UserFunctionWrapper(f), FixedIterationsStoppingCondition(ITER))
  File "/usr/local/lib/python3.7/dist-packages/emukit/core/loop/outer_loop.py", line 91, in run_loop
    self._update_models()
  File "/usr/local/lib/python3.7/dist-packages/emukit/core/loop/outer_loop.py", line 104, in _update_models
    model_updater.update(self.loop_state)
  File "/usr/local/lib/python3.7/dist-packages/emukit/core/loop/model_updaters.py", line 62, in update
    self.model.optimize()
  File "/usr/local/lib/python3.7/dist-packages/emukit/model_wrappers/gpy_model_wrappers.py", line 86, in optimize
    self.model.optimize_restarts(self.n_restarts, verbose=verbose, robust=True)
  File "/usr/local/lib/python3.7/dist-packages/paramz/model.py", line 152, in optimize_restarts
    initial_parameters = self.optimizer_array.copy()
  File "/usr/local/lib/python3.7/dist-packages/paramz/core/parameter_core.py", line 87, in optimizer_array
    [np.put(self._optimizer_copy_, ind, c.finv(self.param_array[ind])) for c, ind in self.constraints.items() if c != __fixed__]
  File "/usr/local/lib/python3.7/dist-packages/paramz/core/parameter_core.py", line 87, in <listcomp>
    [np.put(self._optimizer_copy_, ind, c.finv(self.param_array[ind])) for c, ind in self.constraints.items() if c != __fixed__]
  File "/usr/local/lib/python3.7/dist-packages/paramz/transformations.py", line 111, in finv
    return np.where(f>_lim_val, f, np.log(np.expm1(f)))
 /usr/local/lib/python3.7/dist-packages/paramz/transformations.py:111: RuntimeWarning: overflow encountered in expm1

Thanks! What happens if, in your second snippet, the GPy model is wrapped in the Emukit wrapper? I suspect the issue might come from the fact that we don't just call optimize, and instead call optimize_restarts(self.n_restarts, verbose=verbose, robust=True)

At first I was thinking that too. I tried replacing the second snippet with:

# GP regression on experimental points, run separately
model_gpy2 = GPRegression(loop.loop_state.X, loop.loop_state.Y)
model_gpy2_emukit = GPyModelWrapper(model_gpy2)
model_gpy2_emukit.optimize()
model_gpy2.plot()
plt.show()

And it behaved the same as before (it fit the function properly).

I also tried creating a new GPyModelWrapper subclass that calls optimize instead, roughly like the sketch below, but the warning was still thrown, with the traceback that follows it:
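Something like this (a hypothetical sketch; GPyModelWrapperNoRestart is my own name, not part of Emukit):

from emukit.model_wrappers import GPyModelWrapper

class GPyModelWrapperNoRestart(GPyModelWrapper):
    def optimize(self, verbose: bool = False) -> None:
        # A single optimizer run instead of optimize_restarts(..., robust=True)
        self.model.optimize(messages=verbose)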

  File "/var/folders/xl/nxhg5qwd3f7b8yjnc80d334h0000gn/T/ipykernel_50332/4090956357.py", line 15, in <module>
    loop.run_loop(UserFunctionWrapper(f), FixedIterationsStoppingCondition(MAX_ITER))
  File "python3.9/site-packages/emukit/core/loop/outer_loop.py", line 91, in run_loop
    self._update_models()
  File "python3.9/site-packages/emukit/core/loop/outer_loop.py", line 104, in _update_models
    model_updater.update(self.loop_state)
  File "python3.9/site-packages/emukit/core/loop/model_updaters.py", line 62, in update
    self.model.optimize()
  File "gpy_model_wrapper_norestart.py", line 93, in optimize
    self.model.optimize()
  File "python3.9/site-packages/GPy/core/gp.py", line 675, in optimize
    ret = super(GP, self).optimize(optimizer, start, messages, max_iters, ipython_notebook, clear_after_finish, **kwargs)
  File "python3.9/site-packages/paramz/model.py", line 98, in optimize
    start = self.optimizer_array
  File "python3.9/site-packages/paramz/core/parameter_core.py", line 87, in optimizer_array
    [np.put(self._optimizer_copy_, ind, c.finv(self.param_array[ind])) for c, ind in self.constraints.items() if c != __fixed__]
  File "python3.9/site-packages/paramz/core/parameter_core.py", line 87, in <listcomp>
    [np.put(self._optimizer_copy_, ind, c.finv(self.param_array[ind])) for c, ind in self.constraints.items() if c != __fixed__]
  File "python3.9/site-packages/paramz/transformations.py", line 111, in finv
    return np.where(f>_lim_val, f, np.log(np.expm1(f)))
python3.9/site-packages/paramz/transformations.py:111: RuntimeWarning: overflow encountered in expm1

My guess is that it isn't the model wrapper that's malfunctioning but rather BayesianOptimizationLoop.

Thanks for all this effort. I'll try to carve out time to debug this properly and update here.

Ok, I think this is just a modelling issue, and the fact that the error comes from paramz hints at that. When you create the model, it's plain GPRegression from GPy with all default parameters. That includes the default initial values for the lengthscale, variance, and noise, all of which GPy sets to 1 by default, I believe. Usually this isn't a problem, but in your case that's completely out of scale with the actual inputs and outputs. When I play with different combinations of initial values for these, I get very different fits.
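For example, something along these lines (a sketch; the specific numbers are only illustrative starting points on the scale of your data, not recommended values):

model_gpy = GPRegression(X, Y)
model_gpy.rbf.lengthscale = 10.0         # on the order of the input range
model_gpy.rbf.variance = Y.var()         # on the order of the output variance
model_gpy.Gaussian_noise.variance = 1.0
model_gpy.optimize()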

The simplest thing to try would be to normalize your data to [0,1]. Setting priors might also be worth it, but it's much harder to pick good ones, and probably won't be necessary.
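For concreteness, a minimal sketch of that suggestion, reusing the X and Y arrays from the first snippet (inputs go to [0, 1] using the known bounds; here I also standardize the outputs, which serves the same purpose; the ParameterSpace would then be defined over [0, 1] as well):

X_scaled = (X - 5.0) / (100.0 - 5.0)   # input bounds were [5, 100]
Y_scaled = (Y - Y.mean()) / Y.std()    # zero mean, unit variance outputs
model_gpy = GPRegression(X_scaled, Y_scaled)
model_gpy.optimize()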

Hi, thank you so much for that, I really appreciate the help! I tried passing normalizer=Standardize() to GPRegression and it worked much better. It no longer gives any warnings either.
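Concretely, the change (a sketch, using GPy's built-in output normalizer):

from GPy.util.normalizer import Standardize

model_gpy = GPRegression(X, Y, normalizer=Standardize())
model_gpy.optimize()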

I was still wondering, though: should I expect the graph of the latent function after BayesianOptimizationLoop.run_loop to match the one after doing GP separately on (loop.loop_state.X, loop.loop_state.Y)? (as in my original question)

For example, after re-running the process with some more data, the graph displayed following BayesianOptimizationLoop.run_loop looks like:
[Plot: latent function displayed after BayesianOptimizationLoop.run_loop]

While the graph displayed after optimizing a new GPRegression model (with normalizer=Standardize() still) looks like:

[Plot: latent function after optimizing a fresh GPRegression model on the same points]

No, generally they don't have to be the same. Re-training from scratch often yields different parameter values compared to iteratively adding points one by one.
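A tiny illustration of why (a self-contained sketch, separate from the Emukit code above): the marginal likelihood is non-convex in the hyperparameters, so optimizer runs started from different random points can land in different local optima. The two prints below may or may not differ for this particular toy data.

import numpy as np
from GPy.models import GPRegression

rng = np.random.RandomState(0)
X_demo = rng.uniform(0, 10, (20, 1))
Y_demo = np.sin(X_demo) + 0.1 * rng.randn(20, 1)

for seed in (1, 2):
    np.random.seed(seed)  # optimize_restarts draws random initializations
    m = GPRegression(X_demo.copy(), Y_demo.copy())
    m.optimize_restarts(num_restarts=3, verbose=False)
    print(seed, float(m.rbf.lengthscale), float(m.log_likelihood()))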

Ah ok, thanks for the clarification! I assume the differences could come from randomization during parameter optimization (e.g., the random restarts).

Indeed