Questions about the intermediate tensor buffers design
Dazz993 opened this issue · 0 comments
Dazz993 commented
Hi Team! Really nice work!
While reading the code, I got a little confused about the design choices around the intermediate tensor buffers.
- Could you explain the purpose of `cache_home`, `cache_read_buf`, and `cache_write_buf`? I am wondering why we need multiple buffers instead of a single one.
- I noticed that for the KV cache there are `cache_home`, `cache_read_buf`, and `cache_write_buf`, but for the hidden states there is only `self.hidden`. Could you explain the reason for this difference?
- Additionally, I am curious why there is no need for a CUDA stream for loading and storing the hidden states.
My basic understanding: when loading the cache, the tensor is copied from `cache_home` to `cache_read_buf`, and when storing, the tensor is copied from `cache_write_buf` back to `cache_home`. But I don't really understand why we cannot just read and modify the cache in a single buffer.
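To make my mental model concrete, here is a minimal sketch of how I picture the load/store path, assuming PyTorch with a dedicated CUDA copy stream. The names, shapes, and helper functions (`load_cache`, `store_cache`) are just my illustration, not the actual FlexGen code:

```python
import torch

# Hypothetical stand-ins for FlexGen's buffers: a per-layer KV cache kept
# in its "home" location (here pinned CPU memory), plus GPU-side staging
# buffers for reading and writing.
num_layers, cache_shape = 4, (2, 1024)
cache_home = [torch.zeros(cache_shape, pin_memory=True) for _ in range(num_layers)]
cache_read_buf = torch.empty(cache_shape, device="cuda")
cache_write_buf = torch.empty(cache_shape, device="cuda")

copy_stream = torch.cuda.Stream()

def load_cache(i):
    # Prefetch layer i's cache to the GPU on a side stream, so the copy
    # can overlap with compute running on the default stream.
    with torch.cuda.stream(copy_stream):
        cache_read_buf.copy_(cache_home[i], non_blocking=True)

def store_cache(i):
    # Drain layer i's updated cache back to its home location, again off
    # the compute stream's critical path.
    with torch.cuda.stream(copy_stream):
        cache_home[i].copy_(cache_write_buf, non_blocking=True)
```

If this picture is right, then separate read and write buffers would let the prefetch for the next layer overlap with the write-back for the current one, which a single shared buffer could not do safely; but please correct me if I'm off.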
This confusion may stem from some deliberate design or necessity in the implementation, or simply from my not understanding the code well enough. I'm very much looking forward to your answers, thanks in advance!