why num_head=1?
jianing-sun opened this issue · 1 comment
jianing-sun commented
For the multi-head attention module, why did you set num_head=1 in the args in main.py? Then it is not using the multi-head structure of the attention block, is it?
Thanks,
kang205 commented
Please refer to the paper. We found that multi-head doesn't improve the performance in our case.
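For context, a minimal numpy sketch (not the repo's TensorFlow code; per-head projection matrices are omitted for brevity) showing why num_heads=1 makes the multi-head module collapse to plain single-head attention: with one head there is no dimension split, so the "multi-head" output equals ordinary scaled dot-product attention.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Standard attention: softmax(Q K^T / sqrt(d)) V
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def multi_head_attention(Q, K, V, num_heads):
    # Split the model dimension into num_heads chunks, attend per head,
    # then concatenate (learned projections omitted in this sketch).
    d = Q.shape[-1]
    assert d % num_heads == 0
    head_dim = d // num_heads
    outs = []
    for h in range(num_heads):
        sl = slice(h * head_dim, (h + 1) * head_dim)
        outs.append(scaled_dot_product_attention(Q[:, sl], K[:, sl], V[:, sl]))
    return np.concatenate(outs, axis=-1)

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 8)) for _ in range(3))
# With num_heads=1, the "multi-head" module is just single-head attention.
single = scaled_dot_product_attention(Q, K, V)
multi1 = multi_head_attention(Q, K, V, num_heads=1)
print(np.allclose(single, multi1))  # True
```

So setting num_head=1 simply runs one full-width attention head; the architecture is unchanged apart from skipping the head split.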