princeton-nlp/AutoCompressors

Held-out perplexity question

broalantaps opened this issue · 3 comments

Hi, this is exciting work! Thanks for your contribution. I have a point of confusion:

The paper mentions that perplexity is calculated on the held-out last 2048 tokens. But it seems that you calculate the NLL over the entire sequence during the evaluation phase instead of fixing it to the last 2048 tokens. I would really appreciate a reply, thanks!

Thanks for your question! During evaluation, all metrics are computed with the compute_loss() function in subset_trainer.py. The loss on each segment is logged individually under the substep_{substep}-seg{i}-nll metric. When evaluating with different segments, you should compare the losses on the last segment for each configuration.
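As a rough illustration (the metric dict below is hypothetical; only the substep_{substep}-seg{i}-nll naming pattern comes from compute_loss() in subset_trainer.py), picking the last segment's NLL and exponentiating it gives the held-out perplexity:

```python
import math

# Hypothetical logged metrics following the substep_{substep}-seg{i}-nll
# pattern; the actual values here are illustrative only.
eval_metrics = {
    "substep_0-seg0-nll": 2.91,
    "substep_0-seg1-nll": 2.74,
    "substep_0-seg2-nll": 2.63,
}

# Find the highest segment index (the held-out segment) and report
# perplexity as exp(nll) for that segment.
last_seg = max(int(k.split("seg")[1].split("-")[0]) for k in eval_metrics)
held_out_nll = eval_metrics[f"substep_0-seg{last_seg}-nll"]
print(f"held-out perplexity: {math.exp(held_out_nll):.2f}")
```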

Does this answer your question? Otherwise let me know more details of the issue you're facing.

Thanks for your answer! So you took the last substep_{substep}-seg{i}-nll and reported the perplexity from that, right?

Yes, that's right.