resibots/limbo

adaptive alpha in UCB

langongjin opened this issue · 17 comments

hi,

I have an issue: the value of the objective function is large (around 200) but the sigma is small (around 0.001, I think because x is bounded in (0,1)). That is, sigma contributes almost nothing in \mu(x) + \alpha \sigma(x). What can we do to solve this problem?
In addition, I want to set an adaptive alpha for this problem. As a simple example, alpha = alpha_initial * (1 - iteration/300)^2, with iteration in (1, 200). How can I implement this in limbo? Maybe we can add it in ucb.hpp, but I do not know how to access the iteration counter in ucb.hpp.

Thanks!

For the alpha, you can have a look at GP_UCB:
https://github.com/resibots/limbo/blob/master/src/limbo/acqui/gp_ucb.hpp

(and implement your own variant).
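For example, a minimal sketch of what such a variant could look like (untested; the names AdaptiveUCB and acqui_adaptive_ucb are made up, and it follows the structure of ucb.hpp/gp_ucb.hpp, assuming the acquisition constructor receives the current iteration as GP_UCB's does):

// adaptive_ucb.hpp -- sketch only, modeled on limbo/acqui/ucb.hpp and gp_ucb.hpp;
// the class and parameter names (AdaptiveUCB, acqui_adaptive_ucb) are made up.
#include <algorithm>
#include <cassert>
#include <cmath>
#include <tuple>
#include <Eigen/Core>
#include <limbo/opt/optimizer.hpp>

namespace limbo {
    namespace acqui {
        template <typename Params, typename Model>
        class AdaptiveUCB {
        public:
            // Like GP_UCB, the constructor receives the current iteration,
            // which lets us decay alpha over time:
            // alpha = alpha_initial * (1 - iteration / max_iterations)^2
            AdaptiveUCB(const Model& model, int iteration = 0) : _model(model)
            {
                double max_it = Params::acqui_adaptive_ucb::max_iterations();
                double t = std::min<double>(iteration, max_it);
                double ratio = 1.0 - t / max_it;
                _alpha = Params::acqui_adaptive_ucb::alpha_initial() * ratio * ratio;
            }

            size_t dim_in() const { return _model.dim_in(); }
            size_t dim_out() const { return _model.dim_out(); }

            template <typename AggregatorFunction>
            opt::eval_t operator()(const Eigen::VectorXd& v, const AggregatorFunction& afun, bool gradient) const
            {
                assert(!gradient);
                Eigen::VectorXd mu;
                double sigma;
                std::tie(mu, sigma) = _model.query(v);
                // same shape as UCB, but with the decayed alpha
                return opt::no_grad(afun(mu) + _alpha * std::sqrt(sigma));
            }

        protected:
            const Model& _model;
            double _alpha;
        };
    }
}

You would then add an acqui_adaptive_ucb struct to Params (with BO_PARAM entries for alpha_initial and max_iterations) and select the class with acquifun<acqui::AdaptiveUCB<Params, gp_t>> when declaring the BOptimizer.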

For the first question, can't you simply scale it in the objective function?

hi, Thank you for the answer.

For the first question, can't you simply scale it in the objective function?

For benchmark cases we could use scaling, but usually we do not know the range of the objective function in advance, so we cannot.

Hi all,
any update on the above issue?

By the way, how do I change the configuration from acqui_ucb to acqui_gpucb? I tried to add

  struct acqui_gpucb : public limbo::defaults::acqui_gpucb {
    //UCB(x) = \mu(x) + \kappa \sigma(x).
    BO_PARAM(double, delta, 0.5); // default delta = 0.1
  };

into struct Params before struct acqui_ucb, but it still uses struct acqui_ucb.
The same goes for the kernel function: how do I make limbo use kernel_squared_exp_ard instead of kernel_maternfivehalves?

Thanks!

hi,

I have an issue: the value of the objective function is large (around 200) but the sigma is small (around 0.001, I think because x is bounded in (0,1)). That is, sigma contributes almost nothing in \mu(x) + \alpha \sigma(x). What can we do to solve this problem?

I used the following code

            opt::eval_t operator()(const Eigen::VectorXd& v, const AggregatorFunction& afun, bool gradient) const
            {
                assert(!gradient);
                Eigen::VectorXd mu;
                double sigma;
                std::tie(mu, sigma) = _model.query(v); // query the GP: predicted mean and uncertainty at v
                std::cout << std::fixed << "mu = " << mu << ", sigma = " << sigma << std::endl; // debug print
                return opt::no_grad(afun(mu) + Params::acqui_ucb::alpha() * sqrt(sigma));
            }

to print the following partial output from acqui_ucb:

9 new_sample: 0.744262 0.677921 0.176501 0.861877 0.993039
mu = 46.538401, sigma = 0.010000
mu = 46.731618, sigma = 0.010000
mu = 46.306099, sigma = 0.010000
mu = 47.801962, sigma = 0.009995
mu = 46.246410, sigma = 0.010000
mu = 45.908218, sigma = 0.010000
mu = 47.936218, sigma = 0.009996
mu = 65.587253, sigma = 0.009090
mu = 46.303632, sigma = 0.010000
mu = 46.829325, sigma = 0.010000
mu = 47.607588, sigma = 0.009993
mu = 58.949446, sigma = 0.009550
.........
mu = 102.165944, sigma = 0.000092
mu = 102.165971, sigma = 0.000092
mu = 102.156710, sigma = 0.000098
mu = 102.151174, sigma = 0.000100
mu = 102.109759, sigma = 0.000137
mu = 102.072491, sigma = 0.000168
mu = 101.587535, sigma = 0.000403
mu = 101.620079, sigma = 0.000297
mu = 46.278747, sigma = 0.010000
0 new point: 0.165487 0.148744 0.870333 0.290767 0.555555 value: 104.763817 best:112.751521

Why is the sigma such a small value? And it is not good, right?

hi, could anybody give me some ideas?

hi, could anybody give me some ideas?

Hey,

The sigma can be very small for 2 reasons:

  • Your initial sigma is very small (see here on how to adapt your initial sigma). If you are not optimizing the hyper-parameters of the GP, then you should adjust your initial sigma (and/or UCB alpha) according to the variance of your signal (mu).
  • You have a lot of points: if you are not optimizing the hyper-parameters of the GP, then the more points you have, the smaller the sigma will be.

It is neither bad nor good to have such a small sigma. It depends on the iteration of the optimization and on a lot of other factors.

So overall, I would say try optimizing the hyper-parameters of the GP (you need the squared exp kernel for that) and see what you get.
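If you instead keep the Matern 5/2 kernel without hyper-parameter optimization, adjusting the initial sigma amounts to changing the kernel parameters in Params. A minimal sketch (the numeric values are hypothetical, and the parameter names noise / sigma_sq / l should be checked against the kernel headers of your limbo version):

#include <limbo/limbo.hpp>

struct Params {
    // Observation noise of the GP (hypothetical value).
    struct kernel : public limbo::defaults::kernel {
        BO_PARAM(double, noise, 1e-6);
    };
    // Prior variance of the kernel: scale it to the variance of your objective.
    // E.g. if f(x) is around 200, a prior standard deviation of ~50 corresponds
    // to sigma_sq = 2500 (hypothetical values).
    struct kernel_maternfivehalves : public limbo::defaults::kernel_maternfivehalves {
        BO_PARAM(double, sigma_sq, 2500.0);
        BO_PARAM(double, l, 0.2);
    };
    // ... the rest of your existing parameters ...
};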

hi, Thank you for the answer.

The following are the mu and sigma values at the beginning. The mu is big but the sigma is very small; that is, we would need to set a big alpha in UCB. But the sigma should not be so small at the beginning, right? I worry that something is wrong.

mu = 46.538401, sigma = 0.010000
mu = 46.731618, sigma = 0.010000
mu = 46.306099, sigma = 0.010000
mu = 47.801962, sigma = 0.009995
mu = 46.246410, sigma = 0.010000
mu = 45.908218, sigma = 0.010000
mu = 47.936218, sigma = 0.009996
......

Regarding

So overall, I would say try optimizing the hyper-parameters of the GP (you need the squared exp kernel for that) and see what you get.

How do I configure the squared exp kernel? I tried to set it up, but limbo keeps using kernel_maternfivehalves. The same goes for acqui_ucb: I tried to add the following code into struct Params before struct acqui_ucb, but it still uses struct acqui_ucb.

  struct acqui_gpucb : public limbo::defaults::acqui_gpucb {
    //UCB(x) = \mu(x) + \kappa \sigma(x).
    BO_PARAM(double, delta, 0.5); // default delta = 0.1
  };

To change the GP used by Bayesian optimization, you need to change the types passed to the optimizer, not just the Params struct. Please see the documentation here: http://www.resibots.eu/limbo/tutorials/advanced_example.html

It should look like this:

int main()
{
    using kernel_t = kernel::SquaredExpARD<Params>;

    using mean_t = MeanFWModel<Params>;

    using gp_opt_t = model::gp::KernelLFOpt<Params>;

    using gp_t = model::GP<Params, kernel_t, mean_t, gp_opt_t>;

    using acqui_t = acqui::EI<Params, gp_t>;
    using acqui_opt_t = opt::Cmaes<Params>;

    using init_t = init::RandomSampling<Params>;

    using stop_t = boost::fusion::vector<stop::MaxIterations<Params>, MinTolerance<Params>>;

    using stat_t = boost::fusion::vector<stat::ConsoleSummary<Params>, stat::Samples<Params>, stat::Observations<Params>, stat::AggregatedObservations<Params>, stat::GPAcquisitions<Params>, stat::BestAggregatedObservations<Params>, stat::GPKernelHParams<Params>>;

    bayes_opt::BOptimizer<Params, modelfun<gp_t>, acquifun<acqui_t>, acquiopt<acqui_opt_t>, initfun<init_t>, statsfun<stat_t>, stopcrit<stop_t>> boptimizer;
    // Instantiate aggregator
    DistanceToTarget<Params> aggregator({1, 1});
    boptimizer.optimize(eval_func<Params>(), aggregator);
    std::cout << "New target!" << std::endl;
    aggregator = DistanceToTarget<Params>({1.5, 1});
    // Do not forget to pass `false` as the last parameter in `optimize`,
    // so you do not reset the data in boptimizer
    // i.e. keep all the previous data points in the Gaussian Process
    boptimizer.optimize(eval_func<Params>(), aggregator, false);
    return 1;
}
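The same mechanism applies to the acquisition function: the Params entry you define is only read by the acquisition type you actually select. So, to switch to GP-UCB instead of the EI used above, a sketch (assuming gp_t is the GP type defined in the example) would be:

// At file scope: the parameter struct that GP_UCB reads.
struct Params {
    struct acqui_gpucb : public limbo::defaults::acqui_gpucb {
        BO_PARAM(double, delta, 0.5); // default delta = 0.1
    };
    // ... kernel, mean, optimizer, stop and stat parameters as before ...
};

// In main(): select the acquisition type; this is what actually switches
// the optimizer from acqui_ucb/acqui_ei to acqui_gpucb.
using acqui_t = acqui::GP_UCB<Params, gp_t>;
// ... then pass acquifun<acqui_t> to BOptimizer exactly as in the example above.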

Thanks, it works!

In addition, in GP-UCB,
we have:

double nt = std::pow(iteration, dim_in() / 2.0 + 2.0);
static constexpr double delta3 = Params::acqui_gpucb::delta() * 3;
static constexpr double pi2 = M_PI * M_PI;
_beta = std::sqrt(2.0 * std::log(nt * pi2 / delta3));
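For reference, the snippet above computes the GP-UCB exploration coefficient \kappa_t = \sqrt{2 \log(t^{d/2 + 2} \pi^2 / (3\delta))}, where t is the iteration, d = dim_in() and \delta = acqui_gpucb::delta(), so _beta corresponds to the \kappa in UCB(x) = \mu(x) + \kappa \sigma(x).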

I know this is the same as [brochu2010tutorial], but I do not understand why \kappa grows with the iteration. In my understanding, kappa should be big at the beginning to search more of the space, and then decrease gradually over iterations for more accuracy. Therefore, this does not make sense to me. What do you think about this?

By the way, I printed \mu and \sigma.
Here is part of the output at iteration 0:

GP_UCB_k: nan t: 0 d: 5 delta: 0.5  mu: 0.53475 sigma: 0.999998
mu: 0.535231 sigma: 0.999955
mu: 0.534181 sigma: 0.999884
mu: 0.535137 sigma: 0.999996
mu: 0.534304 sigma: 1
mu: 0.537689 sigma: 0.999509
mu: 0.534169 sigma: 0.999996
mu: 0.5345 sigma: 1
mu: 0.530256 sigma: 0.999674
mu: 0.535699 sigma: 0.999951
mu: 0.53435 sigma: 1
mu: 0.536222 sigma: 0.999899
mu: 0.534872 sigma: 0.999999

Here is part of the output at iteration 45:

GP_UCB_k: 6.16668 t: 45 d: 5 delta: 0.5  mu: 1.15058 sigma: 0.840473
mu: 0.607413 sigma: 0.827006
mu: 0.739215 sigma: 0.882399
mu: 0.793727 sigma: 0.989431
mu: 0.755356 sigma: 0.992473
mu: 0.804057 sigma: 0.999372
mu: 0.7978 sigma: 0.992722
mu: 0.804223 sigma: 0.997695
mu: 0.804928 sigma: 0.999824
mu: 0.804842 sigma: 0.999965
mu: 0.806929 sigma: 0.999965
mu: 0.809337 sigma: 0.999996

I am confused about why the sigma is always near 1. Is this right?
I tested UCB to check the sigma values; they vary within (0, 1).
Could you give me some ideas?
Thanks a lot!

What do you think about this?

I would recommend reading this paper to understand better GP-UCB: https://arxiv.org/pdf/0912.3995.pdf

I am confused about why the sigma is always near 1. Is this right?
I tested UCB to check the sigma values; they vary within (0, 1).
Could you give me some ideas?

There is no right or wrong. It depends on where you are searching. If the printed mu, sigma are for points far away from the ones already in the GP, then this is right. If they are close to the ones already in the GP, you have a bug somewhere.
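One quick way to check which case you are in (a sketch only; gp, samples and dim are hypothetical names for your trained limbo GP, its training inputs and the input dimension) is to query the GP at a point it has already seen and at a random point, and compare the two sigmas:

#include <iostream>
#include <tuple>
#include <Eigen/Core>

// Fragment: gp, samples and dim are placeholders for your own objects.
Eigen::VectorXd known_pt = samples[0];
Eigen::VectorXd random_pt = 0.5 * (Eigen::VectorXd::Random(dim) + Eigen::VectorXd::Ones(dim)); // roughly uniform in [0,1]^dim

Eigen::VectorXd mu;
double sigma;
std::tie(mu, sigma) = gp.query(known_pt);
std::cout << "sigma at a known sample: " << sigma << std::endl; // should be close to the noise level
std::tie(mu, sigma) = gp.query(random_pt);
std::cout << "sigma at a random point: " << sigma << std::endl; // should be noticeably larger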

Thank you very much!

There is no right or wrong. It depends on where you are searching. If the printed mu, sigma are for points far away from the ones already in the GP, then this is right. If they are close to the ones already in the GP, you have a bug somewhere.

It is always dynamic near 1; that is, sigma stays near 1 even when the points are close to the ones already in the GP. What reasons do you think could explain this problem?
Thanks!

What reasons do you think could explain this problem?

Can you send me your code to replicate the problem?

hi,
Yes, I will attach my code once I have polished it, because it is chaotic right now. Thanks a lot!

hi,
I tested more functions with different fitness ranges and domains. I found the sigma values strange: they are always in [0, 1], and most of them are close to 1, no matter how big the fitness values are. I also tested both GP-UCB and UCB; they have the same problem. Can you test functions #1 to #7 to see the sigma values for GP-UCB and UCB? In particular, the SCHWEFEL and Ellipsoid functions have a big fitness range and domain, so I think the sigma should also be big at the beginning and decrease over iterations. But it is not: it stays near 1 at the beginning, which looks as if sigma were limited to [0, 1], and that confuses me.
Here is my code; it is a CMake project.
BO.zip
Could you have a look at it for me?
Thanks a lot!

More information:
Here is the plot for the SCHWEFEL function. Note: we average every 20 points into one generation for the comparison with an EA. We ran Bayesian optimization for 400 iterations and 10 repetitions for the following figure. It does not converge to the optimum of 0.
schwefel_ma_0.2_gp_0.1.pdf
Thanks!

Here is some output from GP-UCB when testing the Schwefel function:
11 new point: GP_UCB_(sqrt(v*tau_t)): 2.64327 mu: -2122.2 sigma: 0.99049
21 new point: GP_UCB_(sqrt(v*tau_t)): 5.41766 mu: -1981.3 sigma: 0.988434
30 new point: GP_UCB_(sqrt(v*tau_t)): 5.86409 mu: -1978.6 sigma: 0.988214

We can see that \sigma and sqrt(v*tau_t) are too small compared with the big \mu. This is not what we want, right? Also, why does the sigma seem to never take a negative value? From my knowledge of BO, I thought sigma could be either positive or negative.