What are your thoughts on Layer normalization vs. Batch normalization?
Alobal opened this issue · 0 comments
Alobal commented
Hi,
PCT uses Batch normalization instead of the Layer normalization used by the original Transformer.
I wonder what your reasoning was for choosing Batch normalization over Layer normalization in PCT?
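For context, the two differ in which axes they normalize over. Below is a minimal NumPy sketch of that difference for point-cloud features of shape `(batch, channels, num_points)`; the shapes and epsilon are illustrative assumptions, not PCT's actual implementation:

```python
import numpy as np

# Toy point-cloud features: (batch, channels, num_points)
x = np.random.randn(4, 8, 16)
eps = 1e-5  # small constant for numerical stability (assumed value)

# BatchNorm1d-style (as in PCT): normalize each channel
# over the batch and point dimensions
bn_mean = x.mean(axis=(0, 2), keepdims=True)   # shape (1, 8, 1)
bn_std = x.std(axis=(0, 2), keepdims=True)
x_bn = (x - bn_mean) / (bn_std + eps)

# LayerNorm-style (as in the original Transformer): normalize each
# point's feature vector over the channel dimension
ln_mean = x.mean(axis=1, keepdims=True)        # shape (4, 1, 16)
ln_std = x.std(axis=1, keepdims=True)
x_ln = (x - ln_mean) / (ln_std + eps)
```

So BatchNorm shares statistics across all points in the batch per channel, while LayerNorm computes statistics independently for each point.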