Question about the comparison with other common data augmentations in Table 1
Shuwen27 opened this issue · 2 comments
First and foremost, thank you for your outstanding work and the fantastic code repository!
I have a question regarding Table 1. When evaluating other common data augmentation techniques such as cropping, word deletion, and replacement, did you also use distinct dropout masks for the two sentences in each pair? In other words, do the reported results correspond to a model that combines, for instance, word deletion with different dropout masks for each pair? If not, how do you ensure that both sentences in a pair use the same dropout mask when applying these augmentation techniques?
Thank you!
Hi,
In all the other augmentation experiments, unless otherwise specified, dropout is also applied, since it is part of standard Transformer training.
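To illustrate what that implies, here is a minimal toy sketch in plain Python, not the repository's code. It assumes standard inverted dropout and uses a hypothetical stand-in `encode` function (token length as a "feature") in place of a real Transformer; `delete_one_word` is likewise a made-up helper. The point it tries to show is that each encoder pass in training mode samples its own dropout mask, so a pair built with word deletion still goes through two independent masks.

```python
import random

def dropout(vec, p, rng):
    """Inverted dropout: zero each element with probability p, scale survivors by 1/(1-p)."""
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in vec]

def delete_one_word(tokens, rng):
    """Toy word-deletion augmentation (hypothetical helper): drop one random token."""
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]

def encode(tokens, p, rng):
    """Hypothetical stand-in encoder: one toy feature per token, followed by dropout."""
    features = [float(len(t)) for t in tokens]
    return dropout(features, p, rng)

rng = random.Random(0)
sentence = "two dogs are running in the park".split()

# Dropout-only pair: the same input is encoded twice, and each pass
# samples its own mask from the shared random stream.
view_a = encode(sentence, 0.1, rng)
view_b = encode(sentence, 0.1, rng)

# Augmented pair: one side is modified by word deletion, and both sides
# still pass through dropout with independently sampled masks.
aug_a = encode(sentence, 0.1, rng)
aug_b = encode(delete_one_word(sentence, rng), 0.1, rng)

print(view_a)
print(view_b)
print(aug_a)
print(aug_b)
```

Under these assumptions, forcing identical masks for both sides of a pair would take extra machinery (for example, reseeding before each pass); nothing in the reply above suggests that was done.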
Thank you for your quick response! :)