cross-lingual transfer in dialog generation
Chinese dialogs in the movie domain: Chinese_corpus/train.txt, Chinese_corpus/dev.txt, Chinese_corpus/test.txt. The train/dev/test sizes are 500/50/500.
English dialogs in the movie domain: English_corpus/train.txt, English_corpus/dev.txt. The train/dev sizes are 400k/20k.
Chinese test dialogs in the music/book/tech domains: other_domains/music.test.txt, other_domains/book.test.txt, other_domains/tech.test.txt. The sizes are 500/500/500.
data format
Each line is one sample; the context and the response are separated by '<SEP>'.
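
A minimal Python sketch of reading this format is given below. The helper names parse_line and load_samples are hypothetical and not part of this repository. The sketch assumes the files are UTF-8 encoded, and that if a line contains more than one '<SEP>' (for example a multi-turn context), the text after the last '<SEP>' is the response. Neither assumption is stated above, so check them against the actual files.

```python
from typing import List, Tuple

SEP = "<SEP>"


def parse_line(line: str) -> Tuple[str, str]:
    """Split one sample into (context, response).

    Assumption: the response is the text after the LAST '<SEP>' on the line.
    """
    context, sep, response = line.rstrip("\n").rpartition(SEP)
    if not sep:
        raise ValueError("no '<SEP>' found in line: %r" % line)
    return context.strip(), response.strip()


def load_samples(path: str) -> List[Tuple[str, str]]:
    """Read a corpus file with one sample per line, skipping blank lines.

    Assumption: the file is UTF-8 encoded.
    """
    samples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                samples.append(parse_line(line))
    return samples
```

For example, load_samples("Chinese_corpus/train.txt") would return a list of (context, response) pairs, one per non-empty line.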