haziqj/iprior

On including the intercept in the EM update

haziqj opened this issue · 1 comments

In previous versions, alpha was included in the EM update via the generalised least squares formula

alpha = (1^T Var.Y.inv 1)^{-1} (1^T Var.Y.inv y)

where 1 is a vector of ones. Recall that Var.Y.inv depends on lambda and psi.
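As a numerical sketch of this update (in Python/NumPy rather than R, and with a made-up covariance matrix standing in for Var.Y, which in the model depends on lambda and psi):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
y = rng.normal(size=n)

# A made-up symmetric positive-definite matrix standing in for Var.Y.
A = rng.normal(size=(n, n))
Var_Y = A @ A.T + n * np.eye(n)

ones = np.ones(n)
Vinv_1 = np.linalg.solve(Var_Y, ones)   # Var.Y.inv %*% 1
Vinv_y = np.linalg.solve(Var_Y, y)      # Var.Y.inv %*% y

# GLS estimate: alpha = (1' Var.Y.inv 1)^{-1} (1' Var.Y.inv y)
alpha = (ones @ Vinv_y) / (ones @ Vinv_1)
print(alpha)
```

Using `solve` rather than an explicit inverse is just the usual numerically safer way of forming the two quadratic terms.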

As it turns out, if a non-centred version of the Canonical or FBM kernel is used, then alpha never reaches mean(y), the MLE of the intercept. The estimation of the lambda parameters is affected, which in turn affects alpha as well. In a sense, the non-centring of the kernel is absorbed into the alpha parameter.
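To make "centred" concrete for the Canonical (linear) kernel: centring means evaluating the kernel on mean-centred inputs, which is equivalent to double-centring the Gram matrix. A sketch (Python/NumPy; the input values are arbitrary):

```python
import numpy as np

x = np.array([62., 58., 58., 58., 50., 86., 80., 75.])  # arbitrary inputs
n = len(x)

# Non-centred canonical (linear) kernel: h(x, x') = x * x'
H = np.outer(x, x)

# Centred version: the kernel of the mean-centred inputs, equivalently
# double-centring the Gram matrix with C = I - 11'/n.
C = np.eye(n) - np.ones((n, n)) / n
H_centred = C @ H @ C

# Rows (and columns) of the centred Gram matrix sum to zero:
print(np.abs(H_centred.sum(axis=1)).max())
```

The key property is that the centred Gram matrix satisfies H 1 = 0, which is what interacts with the intercept estimate below.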

Here is an illustration from the stackloss dataset. We ran three different models.

> summary(mod1) #non-centred, updating alpha

Call:
iprior(formula = stack.loss ~ Air.Flow, data = stackloss)

RKHS used:
Canonical (Air.Flow) 
with a single scale parameter.

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-12.20000  -1.11200  -0.06797   1.02100   8.88800 

            Estimate     S.E.      z P[|Z>z|]    
(Intercept) -43.5768   5.9296 -7.349   <2e-16 ***
lambda        0.0146   0.0106  1.380    0.168    

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

EM converged to within 1e-07 tolerance. No. of iterations: 19813
Standard deviation of errors: 3.997 with S.E.: 0.6321
T2 statistic: 0.9774 on ??? degrees of freedom.
Log-likelihood value: -63.14624 
> summary(mod2) #non-centred, alpha=mean(y)

Call:
iprior(formula = stack.loss ~ Air.Flow, data = stackloss)

RKHS used:
Canonical (Air.Flow) 
with a single scale parameter.

Residuals:
   Min. 1st Qu.  Median 3rd Qu.    Max. 
-10.520  -6.524  -2.524   1.476  24.480 

            Estimate    S.E.     z P[|Z>z|]    
(Intercept)  17.5238  2.1662 8.089   <2e-16 ***
lambda        0.0000 18.0729 0.000        1     #this looks wrong, lambda is 0    
---                                             #regression curve is a flat line
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

EM converged to within 1e-07 tolerance. No. of iterations: 17
Standard deviation of errors: 9.95 with S.E.: 1.5763
T2 statistic: 6.67e-09 on ??? degrees of freedom.
Log-likelihood value: -77.99705 
> summary(mod3) #centred, alpha=mean(y)

Call:
iprior(formula = stack.loss ~ Air.Flow, data = stackloss)

RKHS used:
Canonical (Air.Flow) 
with a single scale parameter.

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-12.20000  -1.11300  -0.06851   1.02000   8.88700 

            Estimate    S.E.      z P[|Z>z|]    
(Intercept)  17.5238  0.8717 20.102   <2e-16 ***
lambda        0.0991  0.0725  1.368    0.171    

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

EM converged to within 1e-07 tolerance. No. of iterations: 239
Standard deviation of errors: 3.994 with S.E.: 0.6306
T2 statistic: 0.9876 on ??? degrees of freedom.
Log-likelihood value: -61.22966 

In summary, it appears to be 'wrong' to use a non-centred version of the kernel together with alpha = mean(y). However, it seems 'OK' to use a non-centred kernel when alpha is updated within the EM iterations. In this simple illustration, the centred kernel with alpha = mean(y) yields the highest maximised log-likelihood.

It is clear that we should fix alpha = mean(y), the MLE for the intercept, and centre the kernels accordingly. The contents of this conversation will be turned into a wiki page.
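The reason a centred kernel makes the GLS update collapse to mean(y): if H 1 = 0, then (assuming the marginal covariance has the form Var.Y = psi * lambda^2 * H^2 + psi^{-1} * I, with a single scale parameter) Var.Y 1 = psi^{-1} 1, so Var.Y.inv 1 = psi * 1 and the GLS formula reduces to the sample mean. A sketch (Python/NumPy; the lambda and psi values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
y = rng.normal(size=n)
x = rng.normal(size=n)

# Centred canonical Gram matrix, so that H @ ones == 0.
xc = x - x.mean()
H = np.outer(xc, xc)

# Assumed marginal covariance under the I-prior with one scale
# parameter (a sketch): Var.Y = psi * lambda^2 * H^2 + psi^{-1} * I.
lam, psi = 0.3, 1.7
Var_Y = psi * lam**2 * (H @ H) + (1 / psi) * np.eye(n)

ones = np.ones(n)
Vinv_1 = np.linalg.solve(Var_Y, ones)
alpha = (ones @ np.linalg.solve(Var_Y, y)) / (ones @ Vinv_1)

# Since H @ ones = 0, Var_Y @ ones = psi^{-1} * ones, hence
# Vinv_1 = psi * ones and alpha collapses to mean(y).
print(alpha - y.mean())
```

With a non-centred kernel H 1 is not zero, 1 is no longer an eigenvector of Var.Y, and the GLS update is pulled away from mean(y), which is the behaviour seen in mod1 above.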