Seeding seems insufficient for reproducible results
bouthilx opened this issue · 3 comments
Hi there!
I've been trying to pass an RNG to get reproducible results with the default BO with GP and MCMC, but there seems to be an issue: the suggestions vary even though the RNG is in exactly the same state. Here's a minimal example based on the tutorials:
import numpy as np
from robo.fmin import bayesian_optimization

def objective_function(x):
    y = np.sin(3 * x[0]) * 4 * (x[0] - 1) * (x[0] + 2)
    return y

lower = np.array([0])
upper = np.array([6])

for i in range(10):
    results = bayesian_optimization(
        objective_function, lower, upper, n_init=3, num_iterations=4,
        rng=np.random.RandomState(1))
    print(results['X'][-1])
    print(results['y'][-1])
If I set n_init=num_iterations=3, the results are all the same, so the initial design function seems to be fine.
[3.440648986884316]
-41.51266042206307
[3.440648986884316]
-41.51266042206307
[3.440648986884316]
-41.51266042206307
[3.440648986884316]
-41.51266042206307
[3.440648986884316]
-41.51266042206307
[3.440648986884316]
-41.51266042206307
[3.440648986884316]
-41.51266042206307
[3.440648986884316]
-41.51266042206307
[3.440648986884316]
-41.51266042206307
[3.440648986884316]
-41.51266042206307
But with n_init=3 and num_iterations=4 I get wildly different first predictions, even though the RNGs are in the same state.
[3.6566122755593655]
-60.089794695135154
[5.993304371981847]
-121.98986837036851
[3.1407303782647427]
0.11387110071274331
[5.987477413556695]
-123.53696127453433
[5.98027745465736]
-125.38748777146728
[5.998170174144782]
-120.66432227076356
[5.992980141131193]
-122.0771099697352
[5.988609060862039]
-123.23995204518239
[5.974711533449306]
-126.7714049599996
[5.613264680901897]
-127.16860268902231
The issue is present with model_type='gp' as well.
[2.8674112486174606]
26.645899577939858
[2.9169600819030936]
23.527709494007958
[2.8561200176477133]
27.23894479441541
[2.874465623953853]
26.253124818087755
[2.8798995680253796]
25.93892550980833
[2.8641460456466534]
26.821907867358163
[2.8648879100301494]
26.782240283130392
[2.8621407737008595]
26.928181931974308
[2.8585025153648242]
27.11746469145211
[2.8559497082443217]
27.247553710589933
After digging into the library I realized george does not take any seed as an argument and relies on the global RNG state. See for example here.
Adding np.random.seed(1) inside the loop does solve the issue for both GP and GP_MCMC, but I would rather avoid global seeding.
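The behavior can be illustrated without RoBO at all. Below is a minimal sketch (library_sample is a hypothetical stand-in for george's sampling code, not a real RoBO or george function) showing why a caller-supplied RandomState has no effect on code that draws from NumPy's global RNG, while reseeding with np.random.seed does:

```python
import numpy as np

def library_sample():
    # Hypothetical stand-in for george: draws from NumPy's *global* RNG
    # state, ignoring any RandomState object the caller created.
    return np.random.randn()

# Passing around a fresh RandomState does not help: the global stream
# keeps advancing between calls, so results differ.
rng = np.random.RandomState(1)
a = library_sample()
b = library_sample()

# Reseeding the global state before each call restores reproducibility,
# at the cost of mutating process-wide state.
np.random.seed(1)
c = library_sample()
np.random.seed(1)
d = library_sample()
assert c == d
```

This is why passing rng=np.random.RandomState(1) to bayesian_optimization is not sufficient once the model starts sampling through george's global-state code path.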
So I have two related questions:
- Why do you use a forked version of george? Would it be easy to port any modifications to the main repo?
- Do you consider your model implementation tightly integrated with george, or would supporting another backend be fairly simple?
To answer your questions:
- The reason we used a forked version of george is that we developed our own kernels for Fabolas and MTBO. The main repo has changed quite a bit, and I do not know how much overhead it would be to adjust to it.
- While I don't think the implementation is tightly integrated with george, RoBO is actually not maintained anymore. I suggest using emukit, which has a more modular structure and contains more or less the same functionality as RoBO.
Hi Aaron!
I was planning to use emukit for PROFET (nice work by the way!), so I'd be happy to use it as well for BO.
Thanks!