Gradients of the noisy loss w.r.t. parameter \theta
qingerVT opened this issue · 3 comments
qingerVT commented
The basic procedure sounds like this:
- (a) set \epsilon = 0, compute gradients of the weighted loss, and update \theta
- (b) evaluate the new \theta on a clean validation set
- (c) set the negative gradients of the validation loss w.r.t. \epsilon as the new \epsilon
- (d) use the new \epsilon to reweight the noisy data and update the parameters again

My question: when \epsilon = 0, isn't the derivative of the loss w.r.t. \theta also 0? Doesn't that mean we don't actually update \theta in (a)?
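For reference, here is a minimal, self-contained PyTorch sketch of steps (a)-(d). The linear model, synthetic data, and hyperparameters below are illustrative assumptions, not taken from any particular implementation:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, lr = 5, 0.1
w = torch.randn(d, 1, requires_grad=True)                  # parameter \theta
x_noisy, y_noisy = torch.randn(32, d), torch.randn(32, 1)  # noisy training batch
x_val, y_val = torch.randn(16, d), torch.randn(16, 1)      # clean validation batch

# (a) \epsilon = 0 but requires grad: the "update" below is virtual --
# numerically \theta does not move, but \hat\theta becomes a function of \epsilon
eps = torch.zeros(32, requires_grad=True)
loss_i = F.mse_loss(x_noisy @ w, y_noisy, reduction='none').squeeze(1)
g = torch.autograd.grad((eps * loss_i).sum(), w, create_graph=True)[0]
w_hat = w - lr * g                                         # == w numerically at eps = 0

# (b) evaluate the virtually updated \theta on the clean validation set
val_loss = F.mse_loss(x_val @ w_hat, y_val)

# (c) the gradient of the validation loss w.r.t. \epsilon is generally nonzero
# even though eps == 0; keep only examples whose gradients help validation
eps_grad = torch.autograd.grad(val_loss, eps)[0]
weights = torch.clamp(-eps_grad, min=0)
weights = weights / weights.sum().clamp(min=1e-8)

# (d) reweight the noisy batch and take a real update step on \theta
loss_i = F.mse_loss(x_noisy @ w, y_noisy, reduction='none').squeeze(1)
g_final = torch.autograd.grad((weights * loss_i).sum(), w)[0]
with torch.no_grad():
    w -= lr * g_final
```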
qingerVT commented
Sorry, got it!
haoshuang0223 commented
@qingerVT So what is the answer? I've also been confused about this recently.
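For later readers, a sketch of the usual resolution, assuming the standard learning-to-reweight objective where the weighted training loss is \sum_i \epsilon_i \ell_i(\theta): the update in (a) is a virtual one. At \epsilon = 0 the weighted loss and its gradient w.r.t. \theta are both exactly 0, so \theta indeed does not move numerically. The point is that the virtually updated parameter

\hat\theta(\epsilon) = \theta - \alpha \sum_i \epsilon_i \nabla_\theta \ell_i(\theta)

is a function of \epsilon, so the clean validation loss in (b) has a generally nonzero gradient w.r.t. \epsilon:

\frac{d L_{val}(\hat\theta(\epsilon))}{d \epsilon_i} \Big|_{\epsilon=0} = -\alpha \, \nabla_\theta \ell_i(\theta) \cdot \nabla_\theta L_{val}(\theta)

i.e. (up to -\alpha) the dot product between example i's training gradient and the validation gradient. Step (c) therefore upweights exactly those noisy examples whose gradients point in the same direction as the clean validation gradient, and the real parameter update happens in (d).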