[Enhancement] Add sequence-level distillation to NMT training
sxjscience opened this issue
Description
Add sequence-level distillation to NMT training. In other words, we draw samples from the teacher model with beam search and train the student model on the generated samples.
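A minimal, framework-free sketch of the data-generation step, assuming the teacher exposes a function that returns next-token log-probabilities given a source sentence and the target prefix decoded so far. All names here (`beam_search`, `build_distilled_corpus`, `copy_teacher`, `TeacherFn`) are hypothetical and not part of any existing NMT toolkit. The toy "copy" teacher only stands in for a trained model, and length normalization is omitted for brevity.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

BOS, EOS = "<bos>", "<eos>"

# Hypothetical teacher interface: (source tokens, target prefix) -> {next token: log-prob}.
TeacherFn = Callable[[Sequence[str], Sequence[str]], Dict[str, float]]


def beam_search(teacher: TeacherFn, src: Sequence[str], beam_size: int = 4,
                max_len: int = 20) -> List[str]:
    """Return the highest-scoring hypothesis under the teacher, without BOS/EOS."""
    beams: List[Tuple[float, List[str]]] = [(0.0, [BOS])]
    finished: List[Tuple[float, List[str]]] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every candidate next token.
        candidates = []
        for score, hyp in beams:
            for tok, logp in teacher(src, hyp).items():
                candidates.append((score + logp, hyp + [tok]))
        # Keep only the best `beam_size` expansions; completed ones leave the beam.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, hyp in candidates[:beam_size]:
            (finished if hyp[-1] == EOS else beams).append((score, hyp))
        if not beams:
            break
    # Prefer completed hypotheses; fall back to live ones if none reached EOS.
    pool = finished or beams
    best = max(pool, key=lambda c: c[0])[1]
    return [t for t in best if t not in (BOS, EOS)]


def build_distilled_corpus(teacher: TeacherFn, sources: Sequence[Sequence[str]],
                           beam_size: int = 4) -> List[Tuple[List[str], List[str]]]:
    """Pair each source sentence with the teacher's beam-search output."""
    return [(list(src), beam_search(teacher, src, beam_size)) for src in sources]


def copy_teacher(src: Sequence[str], prefix: Sequence[str]) -> Dict[str, float]:
    """Toy stand-in teacher: strongly prefers copying the next source token, then EOS."""
    pos = len(prefix) - 1  # target tokens emitted so far (prefix starts with BOS)
    target = src[pos] if pos < len(src) else EOS
    vocab = set(src) | {EOS}
    other = math.log(0.1 / max(len(vocab) - 1, 1))
    return {tok: (math.log(0.9) if tok == target else other) for tok in vocab}


if __name__ == "__main__":
    corpus = build_distilled_corpus(copy_teacher, [["hallo", "welt"], ["guten", "tag"]])
    for src, tgt in corpus:
        print(src, "->", tgt)
```

In this sketch the student would then be trained with the usual token-level cross-entropy loss on the `(source, teacher output)` pairs returned by `build_distilled_corpus`, in place of the reference translations. Keeping only the single best beam hypothesis per source is one choice; keeping several hypotheses per source is a possible variant.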