Feature Normalization in the Signal Propagation Experiment
Hi, could I check the logic behind this line in the signal propagation experiment file? I assume it implements the normalization in the last equation on page 20 of the paper. Firstly, the denominator of that term doesn't make sense to me since
I'm not sure if this is the intended formulation, but I guess the aforementioned line could be updated to normalize by `out.abs().sum(dim=1, keepdims=True)`, which would correspond to the summands
If so, I can verify the implementation and create a PR.
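
For concreteness, here is a minimal sketch of the normalization I have in mind. It assumes `out` is a 2D tensor with one row per node and one column per feature. The helper name `l1_normalize_rows` and the `eps` guard are my own additions for illustration and are not taken from the repository.

```python
import torch


def l1_normalize_rows(out: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Divide each row of `out` by the sum of absolute values along dim 1.

    Hypothetical helper for illustration only; the name and the `eps`
    guard are assumptions, not part of the existing code.
    """
    # Per-row L1 norm, kept as a column of shape (N, 1) so it broadcasts
    # against `out` of shape (N, D). `keepdim` is the documented spelling.
    denom = out.abs().sum(dim=1, keepdim=True)
    # Guard against all-zero rows to avoid division by zero (assumption).
    return out / denom.clamp(min=eps)


out = torch.tensor([[1.0, -3.0], [2.0, 2.0], [0.0, 0.0]])
normed = l1_normalize_rows(out)
row_l1 = normed.abs().sum(dim=1)  # L1 norm of each normalized row
```

With this, every non-zero row has unit L1 norm after normalization, and an all-zero row stays zero instead of producing NaNs.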