Why center E?

In this line, E is centered:

iterative_ensemble_smoother/src/iterative_ensemble_smoother/_iterative_ensemble_smoother.py

Line 116 in 24d7711

E -= E.mean(axis=1, keepdims=True)

This decreases the rank by 1.
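The rank loss can be checked numerically. Below is a minimal sketch, assuming a hypothetical 5 x 3 matrix of standard-normal draws in place of the real E (the sizes and the seed are illustrative, not taken from the library):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 5 observations (rows), 3 ensemble members (columns).
m, N = 5, 3
E = rng.standard_normal((m, N))

rank_before = np.linalg.matrix_rank(E)

# Same operation as the line quoted above: subtract each row's mean.
E_centered = E - E.mean(axis=1, keepdims=True)
rank_after = np.linalg.matrix_rank(E_centered)

print(rank_before, rank_after)
```

After centering, every row sums to zero, so the vector of ones lies in the null space of the centered matrix and one dimension is lost.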

Under equation 14 it says "E is the centered measurement-perturbation matrix whose columns are sampled from N(0, C_dd)".
Here "centered" could be interpreted either as (1) the mean of N(0, C_dd) being zero, or as (2) the rows of E being centered.

If we think of equation (15), then the sample covariance does require us to define E as mean-centered and divided by (N-1).
But don't we already know the true mean to be zero (cf. our model in eqn (7))?
If so, the best low-rank approximation to C_dd would be

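# approach 1: use the known zero mean
a ~ N(0, C_dd)  # shape (m, N), one sample per column
C_dd_bar = a @ a.T / a.shape[1]

and not

# approach 2: subtract the sample mean
a ~ N(0, C_dd)  # shape (m, N), one sample per column
C_dd_bar = subtract_mean(a) @ subtract_mean(a).T / (a.shape[1] - 1)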

Again I appeal to the case when we draw a single sample.
Approach 1 would produce a rank 1 approximation to C_dd, while approach 2 would produce a rank 0 approximation to C_dd (C_dd_bar is identically zero).
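To make the single-sample argument concrete, here is a rough sketch comparing the two approaches. The 3 x 3 diagonal `C_dd`, the seed, and the helper `draw` are assumptions chosen for illustration only; for the single-sample case the division by N-1 is skipped, since it would be a division by zero.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical true covariance, chosen only for illustration.
C_dd = np.diag([1.0, 2.0, 3.0])

def draw(n):
    """Return an (m, n) matrix whose columns are samples from N(0, C_dd)."""
    return rng.multivariate_normal(np.zeros(3), C_dd, size=n).T

# Single sample: approach 1 keeps rank 1, approach 2 collapses to zero.
a1 = draw(1)
approach_1 = a1 @ a1.T / a1.shape[1]
centered = a1 - a1.mean(axis=1, keepdims=True)
approach_2_unscaled = centered @ centered.T  # N - 1 == 0, so no division here

rank_1 = np.linalg.matrix_rank(approach_1)
rank_2 = np.linalg.matrix_rank(approach_2_unscaled)

# Many samples: both estimators approach C_dd.
a = draw(20_000)
n = a.shape[1]
est_1 = a @ a.T / n
a_c = a - a.mean(axis=1, keepdims=True)
est_2 = a_c @ a_c.T / (n - 1)

print(rank_1, rank_2)
print(np.abs(est_1 - C_dd).max(), np.abs(est_2 - C_dd).max())
```

With many samples the two estimators are nearly indistinguishable; the difference only matters for small ensembles, which is the case raised here.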

If we have a run with a single ensemble member, then E will be identically zero, and there will be no perturbation of the observations and no sampling. I would expect that:

1000 runs with 1 ensemble member (different seeds)
1 run with 1000 ensemble members

should both sample the posterior, but in our implementation the first option will always give the same answer, since E is identically zero.
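The effect on the perturbations alone can be sketched as follows. This is not the smoother itself: `perturbations` is a hypothetical helper, and it assumes C_dd is the identity with 4 observations, purely for illustration.

```python
import numpy as np

def perturbations(seed, n_members, center, m=4):
    """Hypothetical helper: draw an (m, n_members) matrix E with columns
    from N(0, I), optionally centering each row as in the quoted line."""
    rng = np.random.default_rng(seed)
    E = rng.standard_normal((m, n_members))
    if center:
        E = E - E.mean(axis=1, keepdims=True)
    return E

# 1000 single-member runs with centering: E is zero for every seed.
centered_runs = [perturbations(seed, 1, center=True) for seed in range(1000)]
all_zero = all(np.all(E == 0) for E in centered_runs)

# The same 1000 runs without centering: the pooled columns look like
# 1000 draws from N(0, I).
pooled = np.hstack([perturbations(seed, 1, center=False) for seed in range(1000)])

print(all_zero, pooled.shape, pooled.std())
```

Without centering, pooling the single-member runs gives perturbations with roughly unit spread, comparable to a single run with 1000 members.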

Thoughts on this @Blunde1 ? (we can discuss over chat. I just wanted to write it down here)

@Blunde1 wrote:

After talking to @Blunde1 and @dafeda about this, none of us found any good reason to keep the centering. It was removed in #136.