With flash_attn enabled, I am unable to reproduce the paper's results for llama-7B-32k-longlora. The paper reports a perplexity (ppl) of 7.8 at a sequence length (seq_len) of 4096, but I get 9.8 using your eval_distributed.py.
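For reference, this is a minimal sketch of how I understand fixed-length perplexity at seq_len 4096 to be computed; the checkpoint id, input file, and non-overlapping chunking here are assumptions for illustration, not the actual logic of eval_distributed.py:

```python
# Minimal perplexity-evaluation sketch (assumed checkpoint id and input file;
# not the repository's eval_distributed.py).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Yukang/Llama-2-7b-longlora-32k"  # assumed HF checkpoint id
SEQ_LEN = 4096                                 # evaluation context length


def perplexity(model, tokenizer, text, seq_len=SEQ_LEN, device="cuda"):
    # Tokenize the whole corpus once, then score non-overlapping seq_len chunks.
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    losses = []
    for start in range(0, ids.size(0) - seq_len, seq_len):
        chunk = ids[start:start + seq_len].unsqueeze(0).to(device)
        with torch.no_grad():
            # labels == inputs: the model shifts them internally for next-token loss.
            losses.append(model(chunk, labels=chunk).loss)
    # All chunks are the same length, so the mean per-chunk loss is the
    # mean per-token loss; exponentiate to get perplexity.
    return torch.exp(torch.stack(losses).mean()).item()


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, torch_dtype=torch.float16, device_map="cuda"
    ).eval()
    with open("test.txt") as f:  # assumed plain-text evaluation file
        text = f.read()
    print(f"ppl @ seq_len={SEQ_LEN}: {perplexity(model, tokenizer, text):.2f}")
```

If eval_distributed.py uses a different chunking/stride or evaluation corpus than this sketch, that alone could explain part of the gap, so it would help to know the exact command and data you expect for the 7.8 number.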