yxuansu/TaCL

Is there a bug with the teacher model?

Nipi64310 opened this issue · 1 comment

Hi, @yxuansu.
Thank you for sharing code and data.

https://github.com/yxuansu/TaCL/blob/d92e47cfa3c24d9b674423a01b3e4216a6b62891/pretraining/bert_contrastive.py#L66
https://github.com/yxuansu/TaCL/blob/d92e47cfa3c24d9b674423a01b3e4216a6b62891/pretraining/train.py#L76

In the code above, the teacher model never calls eval(), so dropout stays active and the same input can produce different teacher outputs across forward passes. Is there something wrong with my understanding? If not, have you compared results with the teacher model in eval() mode?
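As a minimal sketch of the concern (a hypothetical snippet, not code from the repo, assuming a standard `bert-base-uncased` teacher), two forward passes over the same input disagree while dropout is active, and agree once the model is switched to eval mode:

```python
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
teacher = BertModel.from_pretrained('bert-base-uncased')

inputs = tokenizer("the same input sentence", return_tensors='pt')

teacher.train()  # dropout enabled, as when the training loop puts the whole module in train mode
with torch.no_grad():
    out_a = teacher(**inputs).last_hidden_state
    out_b = teacher(**inputs).last_hidden_state
print(torch.allclose(out_a, out_b))  # False: dropout masks differ between passes

teacher.eval()  # dropout disabled
with torch.no_grad():
    out_c = teacher(**inputs).last_hidden_state
    out_d = teacher(**inputs).last_hidden_state
print(torch.allclose(out_c, out_d))  # True: teacher representations are deterministic
```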

Hi,

Thank you for your interest in our work. I think you are right; we should add eval() to the teacher model to disable dropout. That said, I do not think this is a severe issue, since pre-training large language models always involves a high level of stochasticity. Once I have time, I will re-run the experiment with the teacher in eval() mode and see what results (hopefully better?) we get.
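For reference, a minimal sketch of what such a fix might look like (an assumption about the change, not the repo's actual patch; `freeze_teacher` is a hypothetical helper name):

```python
import torch


def freeze_teacher(teacher: torch.nn.Module) -> torch.nn.Module:
    # Put the teacher in eval mode so dropout behaves deterministically
    # and the same input always yields the same target representation.
    teacher.eval()
    # The teacher only provides fixed targets for the contrastive loss,
    # so its weights should not receive gradients.
    for p in teacher.parameters():
        p.requires_grad = False
    return teacher
```

One caveat: if the training loop calls .train() on a wrapper module that contains the teacher, the teacher would be flipped back to train mode, so one would either exclude the teacher from that call or re-apply eval() at each step.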