Check note on elastic-net
davidrosenberg opened this issue · 2 comments
davidrosenberg commented
I reworked the proof from Appendix A.2 in https://web.stanford.edu/~hastie/Papers/B67.2%20(2005)%20301-320%20Zou%20&%20Hastie.pdf to use our notation. Can somebody check the following:
- Scaling -- I get an annoying sqrt(n) factor because our objective uses an average loss instead of a total loss, and I define correlation in the more standard way, as an average rather than a sum. I'd just like someone to check that I've got the scaling right.
- The original theorem requires that y be centered, but I don't think that's used in the proof. Am I missing something?
https://github.com/davidrosenberg/mlcourse/blob/gh-pages/in-prep/elastic-net-theorem.pdf
brett1479 commented
- If we multiply J by n, we remove the averaging and obtain n*lambda_2 instead of lambda_2. If you think of the data as being generated by y = Xw + noise for a fixed w, then you expect ||y|| to grow like sqrt(n). I think your formula is fine.
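As a sanity check on the sqrt(n) growth claim, here is a minimal sketch (the dimensions `n`, `d`, the noise level `sigma`, and the seed are illustrative, not from the thread): simulate y = Xw + noise with w fixed, quadruple n, and confirm ||y|| roughly doubles.

```python
import numpy as np

# Illustrative check: under y = X w + noise with w fixed,
# E||y||^2 = n * (||w||^2 + sigma^2), so ||y|| grows like sqrt(n).
rng = np.random.default_rng(0)
d, sigma = 5, 0.5
w = rng.standard_normal(d)  # fixed coefficient vector

def y_norm(n):
    X = rng.standard_normal((n, d))
    y = X @ w + sigma * rng.standard_normal(n)
    return np.linalg.norm(y)

# n grows by 4x, so the norm should grow by roughly sqrt(4) = 2x.
r = y_norm(40000) / y_norm(10000)
print(r)
```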
- Note that the minimization problem is unchanged if we project y onto the column space of X. Also note that the vector of all ones is orthogonal to the column space of X, since X is centered. Thus centering y has no effect on the minimizing w, but it does improve the bound. The best improvement of this sort comes from replacing y with its orthogonal projection onto the column space of X.
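A quick numerical illustration of the centering point, under stated assumptions: the column-centered design, the penalty values `lam1`/`lam2`, and the small ISTA (proximal gradient) solver below are all illustrative choices, not the course's code. Because the columns of X are centered, X^T applied to the all-ones direction vanishes, so subtracting the mean from y should leave the elastic-net minimizer unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 8
X = rng.standard_normal((n, d))
X = X - X.mean(axis=0)  # center the columns, so X^T 1 = 0
# give y a nonzero mean so that centering it actually changes y
y = X @ rng.standard_normal(d) + rng.standard_normal(n) + 3.0

def elastic_net(X, y, lam1=0.1, lam2=0.1, iters=5000):
    """Minimize (1/(2n))||y - Xw||^2 + lam1*||w||_1 + lam2*||w||^2 via ISTA."""
    n = len(y)
    # Lipschitz constant of the smooth part's gradient
    L = np.linalg.norm(X, 2) ** 2 / n + 2 * lam2
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n + 2 * lam2 * w
        z = w - grad / L
        w = np.sign(z) * np.maximum(np.abs(z) - lam1 / L, 0.0)  # soft-threshold
    return w

w_raw = elastic_net(X, y)
w_centered = elastic_net(X, y - y.mean())
gap = np.max(np.abs(w_raw - w_centered))
print(gap)  # essentially zero: centering y does not move the minimizer
```

The objective values at the minimizer do differ (centering shrinks ||y||, which is what tightens the bound), but the argmin is identical.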
davidrosenberg commented
Thanks! Great points!