Best model can only be saved based on BLEU score

Question

abhinavkashyap opened this issue 4 years ago · 1 comments

Hi,

In train.py, line 142, the model is saved if cur_metric > best_metric. This works when BLEU is used to decide whether to save the best model. However, the Delete, Retrieve, Generate paper minimizes the reconstruction loss (autoencoder), if I am not wrong. Although dev_loss can be tracked in the code, saving the model based on dev_loss is not possible, because a lower loss is better and the comparison only saves when the metric increases.
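To illustrate what I mean, here is a rough sketch of a check that works for either metric. The names is_improvement, initial_best, and higher_is_better are hypothetical and not taken from train.py, and the dev_loss values are made up for the example.

```python
import operator


def is_improvement(cur_metric, best_metric, higher_is_better):
    """Return True if cur_metric beats best_metric in the chosen direction.

    Hypothetical helper, not part of train.py: BLEU should go up,
    while a loss such as dev_loss should go down.
    """
    compare = operator.gt if higher_is_better else operator.lt
    return compare(cur_metric, best_metric)


def initial_best(higher_is_better):
    """Starting value that any real metric value will improve on."""
    return float("-inf") if higher_is_better else float("inf")


# Example: tracking dev_loss (lower is better). Values are illustrative only.
higher_is_better = False
best_metric = initial_best(higher_is_better)
saved_epochs = []
for epoch, dev_loss in enumerate([2.3, 1.9, 2.0, 1.7]):
    if is_improvement(dev_loss, best_metric, higher_is_better):
        best_metric = dev_loss
        # A real training script would write the checkpoint here.
        saved_epochs.append(epoch)
```

With higher_is_better set to True, the same check reduces to the existing cur_metric > best_metric comparison, so BLEU-based saving would keep working unchanged.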

Also, the README shows a graph of the BLEU score increasing during training. Since Delete, Retrieve, Generate works on non-parallel data, shouldn't we minimize the loss function? I am also wondering about the BLEU score obtained by this repository (it is 7.5 according to Table 4 in the paper).

Answer 1 · 2020-09-24T14:34:37.000Z

Thanks for reaching out!

(1) Happy to accept the change if you want to submit a pull request where saving is based on dev loss instead of dev BLEU, but in practice I think this is a detail that doesn't affect things very much (models with low dev loss will also have high BLEU).

(2) Nice observation! This is something other people have brought up, and there are two likely explanations:

(a) My script just runs in one direction (e.g. pos => neg). Running the model in both directions (pos => neg, neg => pos) and then averaging the BLEU might get closer to their results.

(b) The implementation of BLEU that the original paper used has bugs in it and does not report correct BLEU scores. For example, it disagrees with multi-bleu.perl, which is a canonical implementation of BLEU. If you use their script on our outputs you get something more similar (I think ~7.6), but again their script might not be producing correct BLEU scores.