luofuli/DualRL

Can I use the original sentence to initialize the dual_training?

Karlguo opened this issue · 8 comments

Hi, your work is great and impressed me a lot.
I'm trying to use your work for Chinese style transfer, but the Del_Retr initialization didn't work well. Can I use the sentence itself as the pseudo-parallel data? Thank you.

If you use x -> x as pseudo-parallel data to pre-train the model, the model will just learn to copy. Thus I recommend that you use the original sentence with some noise added as input: for example, delete some words, add some words, and permute some words, as in the sketch below.
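Here is a minimal sketch (not from the DualRL repo) of the kind of word-level noising described above: random deletion, duplication, and local permutation. The function name and noise rates are illustrative assumptions.

```python
import random

def add_noise(tokens, drop_prob=0.1, dup_prob=0.1, shuffle_dist=3):
    """Return a noised copy of a tokenized sentence (illustrative only)."""
    # Randomly delete some words.
    kept = [t for t in tokens if random.random() > drop_prob]
    if not kept:  # never return an empty sentence
        kept = tokens[:]
    # "Add" words by duplicating a few existing ones in place.
    noised = []
    for t in kept:
        noised.append(t)
        if random.random() < dup_prob:
            noised.append(t)
    # Slightly permute word order: sort by position plus a small random offset.
    keys = [i + random.uniform(0, shuffle_dist) for i in range(len(noised))]
    noised = [t for _, t in sorted(zip(keys, noised))]
    return noised

print(" ".join(add_noise("the food was really great".split())))
```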

I have tried using x' (noised sentence) -> x (original sentence) as pseudo-parallel data. It works well, especially for content preservation! Good luck to you.

@luofuli Thank you, I got it.

Note: The noised sentence x' (lower quality) should be the input, not the output (ground truth); our experiments validated that this matters.
What you actually need to do is put lines of the form x'\tx\n into the files of the tsf-template dir. That is to say, the noised sentence x' should be the first column!
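A minimal sketch of preparing such a file, assuming one sentence per line in a plain-text input; the file names here are illustrative, not the repo's actual ones. The key point is the column order: noised x' first, original x second, tab-separated, one pair per line.

```python
# Assumes the add_noise() helper sketched earlier in this thread.
with open("original.txt", encoding="utf-8") as f:
    originals = [line.strip() for line in f if line.strip()]

# Hypothetical output path inside the tsf-template dir.
with open("tsf-template/train.pseudo", "w", encoding="utf-8") as out:
    for x in originals:
        x_noised = " ".join(add_noise(x.split()))
        out.write(f"{x_noised}\t{x}\n")  # x' in the first column, x in the second
```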

OK, I'll do that, thank you for the answer!

I am reopening this issue in case someone else runs into the same problem.

Could you do some analysis of this situation? I don't understand: you designed a bidirectional RL model, so why does changing the order of the corpora give a better result? Thanks a lot~

Do you mean why using x' -> x as pseudo-parallel corpora achieves better results than x -> x'? @antdlx
The reason is that x' is a style-transferred sentence of x produced by simple methods, e.g., template-based methods or even adding some noise to x. That is to say, x' is of low quality and may not be fluent. Therefore, if you treat x' as the output ground truth of the model, the decoder will learn to generate sentences of lower quality. When you feed the disfluent sentence x' as input, the encoder is also affected; however, the role of the encoder is to extract important information, while the role of the decoder is to generate sentences. In other words, the decoder plays a more important role and has a more direct influence on the generated sentences. Therefore, we believe that x -> x' does more damage to the decoder than x' -> x does to the encoder.

You can refer to some papers about unsupervised machine translation. I think the idea of back-translation can help you better understand my explanation above.

I get this, thanks a lot! :D