bills867
Dataset
Paper - arXiv, code
Evaluation metric - ROUGE
Good overview of intuition for the metric
Paper conclusion helpful for more in-depth discussion
Why BLEU is undesirable
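
As a rough illustration of the intuition behind ROUGE, here is a minimal sketch of ROUGE-N computed from n-gram overlap between a reference summary and a candidate summary. It assumes plain whitespace tokenization and lowercasing, with no stemming or stopword handling, so its scores will not match an official ROUGE toolkit exactly. The function names `ngrams` and `rouge_n` are illustrative and do not come from the dataset's code. ROUGE is usually reported with an emphasis on recall (how much of the reference the candidate covers), whereas BLEU is built around precision, which is one reason the two behave differently for summarization.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """Simplified ROUGE-N: clipped n-gram overlap between two strings.

    Assumption: whitespace tokenization and lowercasing only.
    """
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)

    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    ref_total = sum(ref.values())
    cand_total = sum(cand.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    # Made-up sentences, for illustration only.
    reference = "the bill amends the tax code"
    candidate = "the bill amends tax law"
    print(rouge_n(reference, candidate, n=1))
    print(rouge_n(reference, candidate, n=2))
```

For reported results, an established ROUGE implementation is the safer choice, since tokenization and stemming choices change the scores.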