Check note on elastic-net
davidrosenberg opened this issue · 2 comments
davidrosenberg commented
I reworked the proof from Appendix A.2 in https://web.stanford.edu/~hastie/Papers/B67.2%20(2005)%20301-320%20Zou%20&%20Hastie.pdf to use our notation. Can somebody check the following:
- Scaling -- I get an annoying sqrt(n) factor because our objective uses an average loss instead of a total loss, and I define correlation in the more standard way, as an average rather than a sum. I'd just like someone to check that I've got the scaling right.
- The original theorem requires that y be centered, but I don't think that's used in the proof. Am I missing something?
https://github.com/davidrosenberg/mlcourse/blob/gh-pages/in-prep/elastic-net-theorem.pdf
brett1479 commented
- If we multiply J by n, we remove the averaging and obtain n*lambda_2 instead of lambda_2. If you think of the data as being generated by y = Xw + noise for a fixed w, then you expect ||y|| to grow like sqrt(n). I think your formula is fine.
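As a sanity check on the sqrt(n) growth claim, here is a minimal sketch (the dimensions `n`, `d`, the noise level `sigma`, and the seed are illustrative, not from the thread): simulate y = Xw + noise with w fixed, quadruple n, and confirm ||y|| roughly doubles.

```python
import numpy as np

# Illustrative check: under y = X w + noise with w fixed,
# E||y||^2 = n * (||w||^2 + sigma^2), so ||y|| grows like sqrt(n).
rng = np.random.default_rng(0)
d, sigma = 5, 0.5
w = rng.standard_normal(d)  # fixed coefficient vector

def y_norm(n):
    X = rng.standard_normal((n, d))
    y = X @ w + sigma * rng.standard_normal(n)
    return np.linalg.norm(y)

# n grows by 4x, so the norm should grow by roughly sqrt(4) = 2x.
r = y_norm(40000) / y_norm(10000)
print(r)
```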
- Note that the minimization problem is unchanged if we project y onto the column space of X. Also note that the vector of all ones is orthogonal to the column space of X, since X is centered. Thus centering y has no effect on the minimizing w, but it does improve the bound. The best improvement of this sort comes from replacing y with its orthogonal projection onto the column space of X.
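A quick numerical illustration of the centering point, under stated assumptions: the column-centered design, the penalty values `lam1`/`lam2`, and the small ISTA (proximal gradient) solver below are all illustrative choices, not the course's code. Because the columns of X are centered, X^T applied to the all-ones direction vanishes, so subtracting the mean from y should leave the elastic-net minimizer unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 8
X = rng.standard_normal((n, d))
X = X - X.mean(axis=0)  # center the columns, so X^T 1 = 0
# give y a nonzero mean so that centering it actually changes y
y = X @ rng.standard_normal(d) + rng.standard_normal(n) + 3.0

def elastic_net(X, y, lam1=0.1, lam2=0.1, iters=5000):
    """Minimize (1/(2n))||y - Xw||^2 + lam1*||w||_1 + lam2*||w||^2 via ISTA."""
    n = len(y)
    # Lipschitz constant of the smooth part's gradient
    L = np.linalg.norm(X, 2) ** 2 / n + 2 * lam2
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n + 2 * lam2 * w
        z = w - grad / L
        w = np.sign(z) * np.maximum(np.abs(z) - lam1 / L, 0.0)  # soft-threshold
    return w

w_raw = elastic_net(X, y)
w_centered = elastic_net(X, y - y.mean())
gap = np.max(np.abs(w_raw - w_centered))
print(gap)  # essentially zero: centering y does not move the minimizer
```

The objective values at the minimizer do differ (centering shrinks ||y||, which is what tightens the bound), but the argmin is identical.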
davidrosenberg commented
Thanks! Great points!