About pre-training the DAE
wangwang110 opened this issue · 1 comments
wangwang110 commented
I see you generated 9 copies of noised data from the One Billion Word corpus. Do you use all of this data for pre-training? Also, the paper says "We set Λ = 3 when we train the denoising auto-encoder, and set Λ = [1, 1.8] when we train GEC models", but in the code Λ = 1.3. Which gives better performance?
zhawe01 commented
- We generate 9 pieces of noised data, so the noised training data for each epoch is unique.
- The Λ value during pre-training is not that important.
- When fine-tuning with labeled data, "Λ = 1.3" is a good choice.
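A minimal sketch of the "unique noise per epoch" idea: generate several independently noised copies of each clean sentence up front, then have epoch *e* train on copy `e % n_copies`. The `add_noise` function below only does random token dropping for illustration; it is a hypothetical stand-in, not the repo's actual noising pipeline (which uses more noise types).

```python
import random

def add_noise(tokens, drop_prob=0.1, seed=None):
    # Randomly drop tokens to corrupt a clean sentence.
    # (One simple noise type; a real pipeline would also do
    # substitutions, swaps, insertions, etc.)
    rng = random.Random(seed)
    noised = [t for t in tokens if rng.random() > drop_prob]
    return noised or tokens  # never return an empty sentence

def make_noised_copies(sentence, n_copies=9):
    # Pre-generate n_copies independently noised versions of the
    # same clean sentence; epoch e trains on copy e % n_copies,
    # so each of the 9 epochs sees a different corruption.
    return [add_noise(sentence.split(), seed=i) for i in range(n_copies)]

copies = make_noised_copies("the quick brown fox jumps over the lazy dog")
print(len(copies))  # 9
```

Because each copy is seeded differently, the model never sees the exact same corrupted input twice across the 9 pre-training epochs, which is the point of generating 9 pieces of data rather than re-noising one piece on the fly.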