Why are encoder heads so important when training?
songhat commented
I found that training with or without the encoder has a large impact on the final result, but the paper does not explain why. Could you share some details, or point me to references I can learn from?