On including the intercept in the EM update
haziqj opened this issue · 1 comments
In previous versions, alpha
was included as part of the EM update, with the generalised least squares formula
alpha = (1^T Var.Y.inv 1)^{-1} (1^T Var.Y.inv y)
where 1
is a vector of ones, and just to remind ourselves that Var.Y.inv
depends on lambda
and psi
.
As it turns out, if a non-centred version of the Canonical or FBM kernel is used, then alpha
would never reach the MLE of mean(y)
. The estimation of the lambda
parameters are affected, and in turn this affects alpha
too. In a sense, the non-centering of the kernel is accounted for in the alpha
parameter.
Here is an illustration from the stackloss
dataset. We ran three different models.
> summary(mod1) #non-centred, updating alpha
Call:
iprior(formula = stack.loss ~ Air.Flow, data = stackloss)
RKHS used:
Canonical (Air.Flow)
with a single scale parameter.
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-12.20000 -1.11200 -0.06797 1.02100 8.88800
Estimate S.E. z P[|Z>z|]
(Intercept) -43.5768 5.9296 -7.349 <2e-16 ***
lambda 0.0146 0.0106 1.380 0.168
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
EM converged to within 1e-07 tolerance. No. of iterations: 19813
Standard deviation of errors: 3.997 with S.E.: 0.6321
T2 statistic: 0.9774 on ??? degrees of freedom.
Log-likelihood value: -63.14624
> summary(mod2) #non-centred, alpha=mean(y)
Call:
iprior(formula = stack.loss ~ Air.Flow, data = stackloss)
RKHS used:
Canonical (Air.Flow)
with a single scale parameter.
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-10.520 -6.524 -2.524 1.476 24.480
Estimate S.E. z P[|Z>z|]
(Intercept) 17.5238 2.1662 8.089 <2e-16 ***
lambda 0.0000 18.0729 0.000 1 #this looks wrong, lambda is 0
--- #regression curve is a flat line
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
EM converged to within 1e-07 tolerance. No. of iterations: 17
Standard deviation of errors: 9.95 with S.E.: 1.5763
T2 statistic: 6.67e-09 on ??? degrees of freedom.
Log-likelihood value: -77.99705
> summary(mod3) #centred, alpha=mean(y)
Call:
iprior(formula = stack.loss ~ Air.Flow, data = stackloss)
RKHS used:
Canonical (Air.Flow)
with a single scale parameter.
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-12.20000 -1.11300 -0.06851 1.02000 8.88700
Estimate S.E. z P[|Z>z|]
(Intercept) 17.5238 0.8717 20.102 <2e-16 ***
lambda 0.0991 0.0725 1.368 0.171
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
EM converged to within 1e-07 tolerance. No. of iterations: 239
Standard deviation of errors: 3.994 with S.E.: 0.6306
T2 statistic: 0.9876 on ??? degrees of freedom.
Log-likelihood value: -61.22966
In summary, it looks like it is 'wrong' to use a non-centred version of the kernel while using alpha=mean(y)
. However, it also looks like it is 'OK' to use non-centred version of the kernel while having alpha
update in the EM iterations. From this simple illustration, using centred and alpha=mean(y)
yields the highest MLE.
It is clear that we need to use alpha = mean(y)
as the MLE for the intercept. The contents of this conversation will be turned into a wiki page.