Best model can only be saved based on BLEU score
abhinavkashyap opened this issue · 1 comments
Hi,
In `train.py`, line 142, the model is saved if `cur_metric > best_metric`. This is useful if BLEU is being used to decide whether to save the best model or not. However, the Delete Retrieve Generate paper minimizes the reconstruction loss (autoencoder), if I am not wrong. Although `dev_loss` can be tracked in the code, saving the model based on `dev_loss` is not possible.
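A minimal sketch of how the comparison could be made direction-aware, so that either BLEU (higher is better) or dev loss (lower is better) can drive checkpoint saving. The helper `is_improvement` and its `mode` argument are hypothetical names used for illustration and are not part of the repository; only `cur_metric` and `best_metric` come from `train.py`.

```python
def is_improvement(cur_metric, best_metric, mode="max"):
    """Return True if cur_metric beats best_metric.

    mode="max" suits metrics such as BLEU; mode="min" suits losses.
    (Hypothetical helper, not taken from the repository.)
    """
    if best_metric is None:  # nothing saved yet, so the first value always wins
        return True
    if mode == "max":
        return cur_metric > best_metric
    if mode == "min":
        return cur_metric < best_metric
    raise ValueError(f"unknown mode: {mode!r}")


# Toy run with made-up dev losses, one per epoch.
best_metric = None
saved_epochs = []
for epoch, dev_loss in enumerate([2.1, 1.7, 1.9, 1.4]):
    if is_improvement(dev_loss, best_metric, mode="min"):
        best_metric = dev_loss
        saved_epochs.append(epoch)  # the real script would save the model here
```

With `mode="max"` the behaviour reduces to the existing `cur_metric > best_metric` check.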
Also, the README shows a graph of the BLEU score increasing during training. Since Delete Retrieve and Generate works on non-parallel data, shouldn't we minimize the loss function? I am also wondering about the BLEU score reported by this repository (it is 7.5 according to Table 4 in the paper).
Thanks for reaching out!
(1) Happy to accept the change if you want to submit a pull request where saving is based on dev loss instead of dev BLEU, but in practice I think this is a detail that doesn't affect things very much (models with low dev loss will also have high BLEU).
(2) Nice observation! This is something other people have brought up. Two things may explain the difference:
- My script just runs in one direction (e.g. pos => neg). Maybe running the model in both directions (pos => neg, neg => pos) and then averaging the BLEU would get closer to their results.
- The implementation of BLEU that the original paper used has bugs in it and does not report correct BLEU scores. For example, it disagrees with multi-bleu.perl, which is a canonical implementation of BLEU. If you use their script on our outputs you get something more similar (I think around 7.6), but again, their script might not be producing correct BLEU scores.
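The first point above (score each direction, then average) can be sketched roughly as follows. The BLEU here is a simplified corpus-level version (single reference, whitespace tokenization, no smoothing) written only to keep the example self-contained; it is not a replacement for multi-bleu.perl and is not the scoring code of this repository or of the paper. The function names `corpus_bleu` and `average_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale (one reference per line)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped counts: a hypothesis n-gram is credited at most as
            # often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:  # without smoothing, any zero precision gives BLEU 0
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


def average_bleu(pos2neg_hyps, pos2neg_refs, neg2pos_hyps, neg2pos_refs):
    """Mean of the BLEU scores for the two transfer directions."""
    return (corpus_bleu(pos2neg_hyps, pos2neg_refs)
            + corpus_bleu(neg2pos_hyps, neg2pos_refs)) / 2
```

Averaging the two per-direction scores is one reading of "averaging the BLEU"; pooling all sentences and scoring once would be another, and the two generally give different numbers.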