Two questions about the computation of the Zen-score
carabnuu opened this issue · 5 comments
Hi, the work is excellent! I'm curious and have two questions about the computation of the Zen-score.
- In the original paper you compute the Frobenius norm, while in the code in `compute_zen_score.py` the Zen-score is computed as `torch.abs(output - mixup_output)`.
- I don't understand why we take the differential in this step (step 4 in Algorithm 1). The original paper says 'This step replaces the gradient of x with the finite differential ∆ to avoid backward-propagation.' Can you give more explanation?
Thanks a lot!!
Hi carabnuu,
Thanks for the feedback!
- The gradient norm can be approximated by a numerical differential, so we can use norm(f(x1) - f(x2)) / norm(x1 - x2) in place of the gradient norm.
- Actually, you can use any norm function. In our code we use the L1-norm (abs) because it is faster to compute. Feel free to use the L2-norm as in the paper, or any Lp-norm you like.
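Here is a minimal sketch of this finite-difference approximation in PyTorch. The helper name `approx_zen_score`, the perturbation scale `epsilon`, and the toy network below are illustrative assumptions, not the exact code in `compute_zen_score.py`:

```python
import torch
import torch.nn as nn

def approx_zen_score(model: nn.Module, x: torch.Tensor, epsilon: float = 1e-2) -> float:
    # Approximate the gradient norm at x with a finite differential,
    # avoiding backward-propagation entirely.
    with torch.no_grad():
        delta = epsilon * torch.randn_like(x)   # small Gaussian perturbation
        output = model(x)
        mixup_output = model(x + delta)
        # L1-norm of the output difference; any Lp-norm works here.
        return torch.abs(output - mixup_output).sum().item()

# Usage with a toy network:
net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 8, 3, padding=1))
x = torch.randn(4, 3, 32, 32)
print(approx_zen_score(net, x))
```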
Thanks for your answer! That really helps a lot!
But I still have a question about your first answer, regarding step 4 in Algorithm 1.
This step uses a numerical differential to replace the gradient with respect to the input x. But in both the expression in the paper and `torch.abs(output - mixup_output)` in the code, I can only see Lp-norm(f(x1) - f(x2)); the denominator norm(x1 - x2) is missing. I don't understand why. Maybe I'm stuck on a simple question.
Thanks again!
norm(x1 - x2) is nearly a constant because x2 = x1 + epsilon, where epsilon is a random Gaussian perturbation. In high-dimensional space, the norm of a random Gaussian vector is nearly constant.
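A quick empirical check of this concentration effect (the dimensions and sample count below are arbitrary, for illustration only):

```python
import torch

# For x ~ N(0, I_d), the L2-norm concentrates around sqrt(d) as d grows,
# so norm(x1 - x2) is nearly the same constant across random draws.
for d in (10, 1_000, 10_000):
    norms = torch.randn(1000, d).norm(dim=1)
    print(f"d={d:>6}: mean={norms.mean():.2f}, std/mean={norms.std() / norms.mean():.4f}")
```

The relative spread (std/mean) shrinks as the dimension grows, which is why the denominator can be dropped up to a constant factor.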
Thank you for your detailed reply !