question about shared filters
kashif opened this issue · 2 comments
Sharing filters across timescales doesn't obviously sound like an advantage. Any intuition on why it is better? Do you have an ablation on this? Thanks for any insights!
Hi @kashif, I personally think it is an elegant design since it is directly guided by wavelet-based multiresolution analysis theory, and we get parameter efficiency for free. On the other hand, I don't think it affects performance much: I ran some preliminary experiments earlier that relaxed the filter sharing (I will paste the results here later).
@kashif FYI, I tested the untied version of the MultiresLayer on the sequential CIFAR and long ListOps experiments. Every other setting is kept the same as in the paper, except that different timescales use different filters. The results are:
| | untied | tied (same as in the paper) |
|---|---|---|
| scifar | 92.16% | 93.15% |
| long listops | 61.85% | 62.75% |
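
For anyone following along, here is a rough sketch of what "tied" versus "untied" means in terms of parameters. It is not the repository's `MultiresLayer`: it assumes a wavelet-style cascade of causal dilated convolutions with a low-pass filter `h0` and a high-pass filter `h1` per level, and every function name below (`causal_dilated_conv`, `multires_decompose`, `make_filters`, `count_unique_params`) is hypothetical, written in plain Python only to make the counting concrete.

```python
import random


def causal_dilated_conv(x, h, dilation):
    """y[t] = sum_k h[k] * x[t - k * dilation], treating samples before t=0 as zero."""
    return [
        sum(
            h[k] * x[t - k * dilation]
            for k in range(len(h))
            if t - k * dilation >= 0
        )
        for t in range(len(x))
    ]


def multires_decompose(x, h0_per_level, h1_per_level):
    """Assumed cascade: at level j (dilation 2**j) the high-pass output is kept
    as a detail signal and the low-pass output is passed on to the next level."""
    approx = x
    details = []
    for level, (h0, h1) in enumerate(zip(h0_per_level, h1_per_level)):
        dilation = 2 ** level
        details.append(causal_dilated_conv(approx, h1, dilation))
        approx = causal_dilated_conv(approx, h0, dilation)
    return approx, details


def make_filters(depth, kernel_size, tied, seed=0):
    """Return (h0_per_level, h1_per_level) with randomly initialised taps."""
    rng = random.Random(seed)

    def new_filter():
        return [rng.uniform(-1.0, 1.0) for _ in range(kernel_size)]

    if tied:
        # One (h0, h1) pair; every level refers to the same two lists.
        h0, h1 = new_filter(), new_filter()
        return [h0] * depth, [h1] * depth
    # Untied: a separate (h0, h1) pair for each level.
    return (
        [new_filter() for _ in range(depth)],
        [new_filter() for _ in range(depth)],
    )


def count_unique_params(h0_per_level, h1_per_level):
    """Count filter taps, counting a list shared between levels only once."""
    unique = {id(h): len(h) for h in h0_per_level + h1_per_level}
    return sum(unique.values())


if __name__ == "__main__":
    depth, kernel_size = 4, 2
    tied = make_filters(depth, kernel_size, tied=True)
    untied = make_filters(depth, kernel_size, tied=False)
    print("tied params:  ", count_unique_params(*tied))
    print("untied params:", count_unique_params(*untied))
```

Under these assumptions the tied variant has `2 * kernel_size` filter taps regardless of depth, while the untied one has `2 * kernel_size * depth`, so the untied runs in the table above spend more filter parameters per layer.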