Infini-AI-Lab/MagicDec

KV Loading Time


Thank you very much for your interesting work and contribution to the open source community.

How is the KV loading time in your paper calculated, and how is it separated from the other components of inference time?

Hi! Thank you for your attention to our work.

We estimated the time cost of each component during LLM inference based on LLM-Viewer, with some modifications to improve the accuracy of our estimations. We apologize for not citing this in the paper and will update the reference accordingly.
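For readers unfamiliar with this kind of estimate, below is a minimal sketch of a roofline-style breakdown of one decoding step, the style of analytical model that LLM-Viewer is built around. It is not the code used for the paper and does not include the modifications mentioned above. The function names, the `Hardware` class, the FLOP-counting rules, and all numbers are assumptions made for illustration, and activation traffic is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    """Hypothetical device description (illustrative values only)."""
    mem_bandwidth: float  # bytes per second
    peak_flops: float     # floating-point operations per second


def kv_cache_bytes(batch, seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes read from the KV cache in one decoding step (keys and values)."""
    return 2 * batch * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem


def estimate_decode_step(hw, n_params, batch, seq_len, n_layers,
                         n_heads, n_kv_heads, head_dim, bytes_per_elem=2):
    """Analytical per-component time estimate for one decoding step.

    Assumptions: every weight is read once per step, the whole KV cache is
    read once per step, and activation load/store is ignored.
    """
    param_bytes = n_params * bytes_per_elem
    kv_bytes = kv_cache_bytes(batch, seq_len, n_layers, n_kv_heads,
                              head_dim, bytes_per_elem)

    # Linear layers: about 2 FLOPs (multiply + add) per weight per token.
    linear_flops = 2 * n_params * batch
    # Attention: QK^T and AV each cost 2 * seq_len * head_dim FLOPs
    # per query head, per layer, per sequence.
    attn_flops = 4 * batch * seq_len * n_layers * n_heads * head_dim

    param_load_s = param_bytes / hw.mem_bandwidth
    kv_load_s = kv_bytes / hw.mem_bandwidth
    compute_s = (linear_flops + attn_flops) / hw.peak_flops

    return {
        "param_load_s": param_load_s,
        "kv_load_s": kv_load_s,
        "compute_s": compute_s,
        # Roofline: the step is bound by the slower of memory and compute.
        "step_s": max(param_load_s + kv_load_s, compute_s),
    }


# Placeholder numbers, not measurements of any real device or model.
hw = Hardware(mem_bandwidth=2.0e12, peak_flops=3.0e14)
est = estimate_decode_step(hw, n_params=7e9, batch=32, seq_len=8192,
                           n_layers=32, n_heads=32, n_kv_heads=32, head_dim=128)
for name, seconds in est.items():
    print(f"{name}: {seconds * 1e3:.3f} ms")
```

In a model like this, "KV loading time" is simply KV cache bytes divided by memory bandwidth, so it is separated from the other components by construction rather than by timing individual kernels.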

Thanks for the reference.

Please forgive my ignorance, but I could not find where LLM-Viewer calculates the time for each component.

If it's convenient, could you provide more specific details? For example:

  1. Is the time measured with the `time` package?
  2. Does KV cache loading time refer only to the cache update in attention, or does it also include the FlashAttention computation?
  3. How are the times for parameter load, activation load and store, and compute calculated?

Thank you again for your open source contribution.