Why is sigmoid used in the ASHP module?
happyamyhope commented
Hi, thanks for sharing this work. I would like to know why sigmoid is used in the ASHP module rather than tanh or another activation function. Looking forward to your reply!
yun-liu commented
The sigmoid function is a common choice in attention mechanisms because it squeezes the attention values into the range (0, 1).
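
To illustrate the range argument, here is a minimal sketch in plain Python. It is not the ASHP code from this repository: the `gate` helper and the sample scores are hypothetical, and the only point is to compare the output ranges of sigmoid and tanh when they are used as multiplicative attention weights.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gate(feature: float, score: float) -> float:
    """Hypothetical attention gate: scale a feature by a weight in (0, 1)."""
    return feature * sigmoid(score)


# Hypothetical raw attention scores, chosen only for illustration.
scores = [-4.0, -1.0, 0.0, 1.0, 4.0]

# Sigmoid keeps every weight strictly between 0 and 1.
sigmoid_weights = [sigmoid(s) for s in scores]

# Tanh maps into (-1, 1), so negative scores yield negative weights.
tanh_weights = [math.tanh(s) for s in scores]

print("sigmoid:", [round(w, 3) for w in sigmoid_weights])
print("tanh:   ", [round(w, 3) for w in tanh_weights])
```

With sigmoid, a weight can be read as "how much of this feature to keep", and multiplying by it never flips the sign of the feature. A tanh weight can be negative, which would invert the feature instead of just attenuating it.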