rpryzant/delete_retrieve_generate

Best model can only be saved based on BLEU score

abhinavkashyap opened this issue · 1 comments

Hi,

In train.py, line 142, the model is saved if cur_metric > best_metric. This works when BLEU is used to decide whether to save the best model. However, the Delete, Retrieve, Generate paper minimizes the reconstruction loss (autoencoder), if I am not wrong. Although dev_loss can be tracked in the code, saving the model based on dev_loss is not possible, because a lower loss is better and the comparison only saves when the metric increases.

Also, the README shows a graph of the BLEU score increasing during training. Since Delete, Retrieve, Generate works on non-parallel data, shouldn't we minimize the loss function instead? I am also wondering what BLEU score this repository achieves (the paper reports 7.5 in Table 4).

Thanks for reaching out!

(1) Happy to accept the change if you want to submit a pull request where saving is based on dev loss instead of dev BLEU, but in practice I think this is a detail that doesn't affect things very much (models with low dev loss will also tend to have high BLEU).
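
For reference, a minimal sketch of what such a change might look like. This is not the code in train.py: `initial_best`, `is_improvement`, `best_epoch`, and the `mode` argument are hypothetical names used only for illustration. The idea is to make the direction of the comparison configurable, so that BLEU is maximized and dev loss is minimized.

```python
import math


def initial_best(mode):
    """Starting value that any real measurement will beat."""
    if mode == "max":   # e.g. BLEU: higher is better
        return -math.inf
    if mode == "min":   # e.g. dev loss: lower is better
        return math.inf
    raise ValueError(f"unknown mode: {mode!r}")


def is_improvement(cur_metric, best_metric, mode):
    """True if cur_metric is strictly better than best_metric."""
    if mode == "max":
        return cur_metric > best_metric
    if mode == "min":
        return cur_metric < best_metric
    raise ValueError(f"unknown mode: {mode!r}")


def best_epoch(history, mode):
    """Index of the last epoch at which a checkpoint would be saved."""
    best = initial_best(mode)
    best_idx = None
    for i, value in enumerate(history):
        if is_improvement(value, best, mode):
            best, best_idx = value, i  # a real loop would save the model here
    return best_idx
```

In a training loop, the existing `cur_metric > best_metric` check would be replaced by a call like `is_improvement(cur_metric, best_metric, mode)`, with `mode` chosen according to whether BLEU or dev loss is being tracked.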

(2) Nice observation! This is something other people have brought up, and I can think of two possible reasons for the gap:

  • My script just runs in one direction (e.g. pos => neg). Running the model in both directions (pos => neg, neg => pos) and then averaging the BLEU scores might get closer to their results.
  • The implementation of BLEU that the original paper used has bugs in it and does not report correct BLEU scores. For example, it disagrees with multi-bleu.perl, which is a canonical implementation of BLEU. If you use their script on our outputs you get something more similar (I think ~7.6), but again, their script might not be producing correct BLEU scores.
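
The averaging mentioned in the first bullet could be sketched as follows. `average_bleu` is a hypothetical helper, not part of this repository, and it assumes the per-direction BLEU scores have already been computed with whichever BLEU script you trust. Note that a plain mean of two corpus-level scores is an assumption about how the numbers would be combined.

```python
def average_bleu(scores_by_direction):
    """Unweighted mean of per-direction BLEU scores.

    scores_by_direction maps a direction label (e.g. "pos=>neg")
    to a BLEU score that was computed separately for that direction.
    """
    if not scores_by_direction:
        raise ValueError("need at least one direction")
    return sum(scores_by_direction.values()) / len(scores_by_direction)


# Placeholder values for illustration only; these are not measured results.
example = {"pos=>neg": 10.0, "neg=>pos": 20.0}
mean_score = average_bleu(example)
```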