bytedance/lightseq

Question: About construction of total_cache_k, total_cache_v in Transformer

Kangmo opened this issue · 4 comments

In lightseq/csrc/models/transformer.cu,
Should cache_k_out and cache_v_out call set_ancestor? Otherwise, why not remove the unused variables cache_k_out and cache_v_out?

Transformer::Transformer(...) {
  ...
  for (auto iter : dec_layer_vec) {
    Variable *cache_k = new Variable("cache_k");
    Variable *cache_v = new Variable("cache_v");
    std::tuple<Variable *, Variable *, Variable *> dec_outs =
        (*iter)(dec_emb, total_enc_kv, pad_mask, cache_k, cache_v);
    dec_emb = std::get<0>(dec_outs);
    Variable *cache_k_out = std::get<1>(dec_outs);
    Variable *cache_v_out = std::get<2>(dec_outs);

    cache_k->set_ancestor(total_cache_k, cache_size * dec_layer_idx);
    cache_v->set_ancestor(total_cache_v, cache_size * dec_layer_idx);
    dec_layer_idx++;
  }

> cache_k->set_ancestor(total_cache_k, cache_size * dec_layer_idx);

Sorry, this is by design in the new architecture, which uses a fixed idiom to manage GPU memory sharing.

The set_ancestor function assigns cache_k to a contiguous segment inside total_cache_k. In the specific case you point out here, cache_k_out can indeed be removed.

I'll fix this detail in my next commit.

thank you for the confirmation, hexisyztem!