LIL Implementation
sk-s-hub opened this issue · 1 comment
sk-s-hub commented
Line 119 in ec914ca
The implementation of LIL differs from what is described in the paper, and I am a bit confused about that. If we follow this implementation, then the mean we are taking is not actually a division by the len(nt) matrix.
dheerajrajagopal commented
Yes, that is correct. We have been trying different variations (sum vs. average of hidden states) and checking that the changes don't affect performance. This is a new change we tested that did not affect the results and yet made the code cleaner, so we decided to keep it. If you find that average vs. sum does make a difference in your experiments, please feel free to reopen the issue and/or send a PR.
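
For anyone comparing the two variants, here is a minimal sketch of how sum pooling and average pooling over hidden states differ. This is not the repository's code: `sum_pool`, `mean_pool`, and the toy `states` are hypothetical names and values chosen only for illustration.

```python
def sum_pool(hidden_states):
    """Element-wise sum of a list of equal-length hidden vectors."""
    return [sum(dims) for dims in zip(*hidden_states)]


def mean_pool(hidden_states):
    """Element-wise mean: the sum divided by the number of vectors."""
    n = len(hidden_states)
    return [s / n for s in sum_pool(hidden_states)]


# Three hypothetical 2-dimensional hidden states (illustrative values only).
states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

pooled_sum = sum_pool(states)    # [9.0, 12.0]
pooled_mean = mean_pool(states)  # [3.0, 4.0]
```

The two results differ only by the factor `len(hidden_states)`. When that count is the same for every example, the difference is a constant rescaling; when it varies across examples, the sum grows with the count while the mean does not, which is the case worth checking in your own experiments.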