About pre-training the DAE
wangwang110 opened this issue · 1 comments
wangwang110 commented
I see you generated 9 copies of noised data from the One Billion Word corpus. Do you use all of this data for pre-training? Also, the paper says "We set Λ = 3 when we train the denoising auto-encoder, and set Λ = [1, 1.8] when we train GEC models", but in the code Λ = 1.3. Which gives better performance?
zhawe01 commented
- We generate 9 pieces of noised data, so the noised training data for each epoch is unique.
- The Λ value during pre-training is not that important.
- When fine-tuning with labeled data, "Λ = 1.3" is a good choice.
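A minimal sketch of the "unique noise per epoch" idea: generate several independently noised copies of each clean sentence up front, then have epoch *e* train on copy `e % n_copies`. The `add_noise` function below only does random token dropping for illustration; it is a hypothetical stand-in, not the repo's actual noising pipeline (which uses more noise types).

```python
import random

def add_noise(tokens, drop_prob=0.1, seed=None):
    # Randomly drop tokens to corrupt a clean sentence.
    # (One simple noise type; a real pipeline would also do
    # substitutions, swaps, insertions, etc.)
    rng = random.Random(seed)
    noised = [t for t in tokens if rng.random() > drop_prob]
    return noised or tokens  # never return an empty sentence

def make_noised_copies(sentence, n_copies=9):
    # Pre-generate n_copies independently noised versions of the
    # same clean sentence; epoch e trains on copy e % n_copies,
    # so each of the 9 epochs sees a different corruption.
    return [add_noise(sentence.split(), seed=i) for i in range(n_copies)]

copies = make_noised_copies("the quick brown fox jumps over the lazy dog")
print(len(copies))  # 9
```

Because each copy is seeded differently, the model never sees the exact same corrupted input twice across the 9 pre-training epochs, which is the point of generating 9 pieces of data rather than re-noising one piece on the fly.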