probabilistic-numerics/probnum

n-dimensional integration using BQ when a dimension is a-priori fixed

neildhir opened this issue · 2 comments

This question is more about my lack of knowledge/implementation-knowledge rather than a software issue.

The problem is as follows. I would like to estimate the expectation of a variable $Y$ in a Bayesian network when another variable in that network is fixed to a specific value, a constant so that $W=c$.

We can express the expectation like so

$$ \mathbb{E} [Y \mid W = c] = \int\int_R f(B,W)dBdW = \int_B p(Y \mid W=c,B)p(B)dB$$

where $R$ is the region of integration. Now intuitively I would just substitute $W$ for $c$ in my closed-form expression but when using BQ I am not quite sure how to proceed. As ever I would like to use my data-samples to estimate this double-integral and bayesquad_from_data.

Using some previous code to essentially use all the bits that are under the hood of bayesquad_from_data an example would then look like this with data (from my simulator) $\mathcal{D} = (\mathbf{x}_i, y_i)_i$

import numpy as np
y = np.array([[-1.23497876],
       [-1.630657  ],
       [-0.10809242],
       [-1.68074297],
       [-1.32953398],
       [-1.91865033],
       [-0.64207908],
       [-0.98533389],
       [-0.18450834],
       [-0.66933601]])
# Two-dimensional input where the first dimension is fixed
x = np.array([[-3.17500000e+00, -1.17237878e-01],
       [-3.17500000e+00, -1.61918177e-03],
       [-3.17500000e+00, -1.21917698e-01],
       [-3.17500000e+00, -8.29067568e-02],
       [-3.17500000e+00, -1.53236483e-01],
       [-3.17500000e+00, -6.73372652e-02],
       [-3.17500000e+00, -4.00865329e-02],
       [-3.17500000e+00,  6.57646487e-03],
       [-3.17500000e+00, -5.60693973e-02],
       [-3.17500000e+00,  1.83573142e-03]])

where $c=-3.175$.

Setting up a BQ problem with this data.

from probnum.quad.solvers import BayesianQuadrature
from probnum.quad.solvers.belief_updates import BQStandardBeliefUpdate
from probnum.quad.solvers.stopping_criteria import ImmediateStop
from probnum.quad.integration_measures import LebesgueMeasure
from probnum.randprocs.kernels import ExpQuad

# Define the BQ method
kernel = ExpQuad(input_shape=(2,), lengthscales=[0.9273, 0.6727]) #RBF
domains = ([-3.175,-5],[0,4]) # [lower_W, lower_B], [upper_W, upper_B]
measure = LebesgueMeasure(domain=domains)  # this defines p(x) and D
noise_std = 0.05
belief_update =  BQStandardBeliefUpdate(jitter=noise_std**2, scale_estimation="mle")

bq = BayesianQuadrature(kernel=kernel, 
                        measure=measure,
                        policy=None,
                        belief_update=belief_update,
                        stopping_criterion=ImmediateStop(),
                        initial_design=None)

# integrate
F, _, info = bq.integrate(fun=None, nodes=x, fun_evals=y.ravel(), rng=None)

print(F)
print(f"Mean: {F.mean} and std: {F.std}")
print(info)

Now I am pretty sure this is wrong for two reasons:

  1. The integration nominally takes place over a 2D region which becomes a line when we fix a dimension so perhaps it should really just a be a 1D problem and kernel = ExpQuad(input_shape=(1,)) but
  2. if we do that then how to make the problem (i.e. BQ) take into account that $W=c$ and is still a 2D problem but a constrained one.

I don't know if there is a way to 'turn-off' dimensions in the `LebesgueMeasure' so that integration only takes place over the relevant dimension $B$ but whilst also maintaining the correct dimensionality of the problem (i.e. the kernel maintains an input shape of 2)? Or if the domains could be written in a clever way so that integration properly takes place w.r.t. the constant dimension?

I don't know if there is a way to 'turn-off' dimensions in the `LebesgueMeasure' so that integration only takes place over the relevant dimension but whilst also maintaining the correct dimensionality of the problem (i.e. the kernel maintains an input shape of 2)?

Currently, there is no way to marginalize a GP with the Probnum BQ code directly (i.e., building a multidimensional GP and computing the integral only over a subset of dimensions). If your $c$ value is indeed fixed, you may want to explore analytic ways to rephrase your problem as a 1D problem and see where it leads you. But I am afraid I cannot help you all too much with this bit.

Understood, thank you!