Reproduction Bias
SpectrezZ opened this issue · 4 comments
First of all, thanks to you all for the stimulating insights! I have a question: when I try to follow the coding logic and reproduce the results with Megatron-LM, I see some discrepancies in the final results. Specifically, the speedup of Eagle1 generally aligns with what is reported in the paper, but the performance of Eagle2 is unsatisfactory, as it can't reach a 5.x speedup (only about 0.5 higher than Eagle1). Are there any tricks or details in the implementation that you could share (like other inference optimizations)?
We haven't tried Megatron-LM before. But according to our experience, the speedup ratio depends on several factors, such as target model types, sizes, tasks (coding problem is faster than others), datasets, etc.
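Since the speedup ratio is the quantity being compared, it may help to pin down how it is typically computed: the wall-clock tokens-per-second of speculative decoding divided by that of vanilla autoregressive decoding on the same model, task, and hardware. Below is a minimal, hedged sketch of that calculation; the function names and the example numbers are illustrative only and are not taken from the EAGLE paper or code.

```python
# Illustrative sketch of how a decoding speedup ratio is usually measured.
# The helper names and the timing numbers below are hypothetical examples,
# not values from EAGLE or Megatron-LM.

def tokens_per_second(num_tokens: int, elapsed_seconds: float) -> float:
    """Throughput of one decoding run in tokens/s."""
    return num_tokens / elapsed_seconds

def speedup_ratio(baseline_tps: float, speculative_tps: float) -> float:
    """Wall-clock speedup of speculative decoding over the baseline."""
    return speculative_tps / baseline_tps

# Hypothetical measurements: same prompt set, 256 generated tokens each.
baseline = tokens_per_second(256, 10.0)     # vanilla autoregressive decoding
speculative = tokens_per_second(256, 4.0)   # with a draft model accepted in bursts

print(speedup_ratio(baseline, speculative))  # prints 2.5
```

Because both runs must share the task mix, batch size, and hardware, differences in any of those (e.g. coding prompts accepting longer draft bursts than open-ended chat) shift the measured ratio, which is consistent with the variation described above.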
Sorry, by the way, when are the evaluation results for Qwen2 and LLaMA3 expected to be released?
LLaMA3's evaluation results have been released in the EAGLE2 paper. Qwen2's speedup ratio is similar to that of LLaMA3 (we have no plan to include it in the paper, as Qwen2 was trained after our paper was written). You are welcome to test them yourself, as all checkpoints have been released.
Thank you, Prof. Zhang, for patiently answering my questions. 🙏