arc-research-lab/CHARM

About MM size in paper and demo

sato-imo opened this issue · 1 comment

Could you please describe the input sizes of the original models, especially for ViT and BERT?
And how do you organize the matrices to form the MxK and KxN matrices shown in Table 5?
[Screenshot of Table 5 from the paper]

  • For BERT, we set batch_size = 6, Sequence_len = 512, Embed_dim = 1024, Heads = 16, MLP_ratio = 4.

  • For ViT, we set batch_size = 48, Sequence_len = 64, Embed_dim = 1024, Heads = 16, MLP_ratio = 4.

  1. For the QKV and projection layers, M = (batch_size * Sequence_len), N = Embed_dim, K = Embed_dim.
  2. For the MLP layers, N (first layer) or K (second layer) is further multiplied by MLP_ratio.
  3. For the batch dot (multi-head attention), M = Sequence_len, N (K) = Embed_dim // Heads, K (N) = Sequence_len; the batch dot size is batch_size * Heads. See the sketch after this list.
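To make the shape rules concrete, here is a minimal Python sketch (not from the CHARM repo; the function and key names are my own) that derives the (M, K, N) GEMM shapes from the hyperparameters above:

```python
# Minimal sketch: derive the GEMM shapes (M, K, N) implied by the rules above.
# Function and dictionary key names are illustrative, not from the CHARM code.

def gemm_shapes(batch_size, seq_len, embed_dim, heads, mlp_ratio):
    shapes = {}
    m = batch_size * seq_len
    # 1. QKV and projection layers: (M x K) * (K x N) with N = K = embed_dim
    shapes["qkv_proj"] = (m, embed_dim, embed_dim)
    # 2. MLP layers: N (first FC) or K (second FC) is scaled by mlp_ratio
    shapes["mlp_fc1"] = (m, embed_dim, embed_dim * mlp_ratio)
    shapes["mlp_fc2"] = (m, embed_dim * mlp_ratio, embed_dim)
    # 3. Batch dot (multi-head attention): per-head GEMMs repeated batch_size * heads times
    head_dim = embed_dim // heads
    shapes["attn_qk"] = (seq_len, head_dim, seq_len)   # Q @ K^T
    shapes["attn_av"] = (seq_len, seq_len, head_dim)   # softmax(QK^T) @ V
    shapes["batch_dot_count"] = batch_size * heads
    return shapes

# BERT setting from this thread
print(gemm_shapes(batch_size=6, seq_len=512, embed_dim=1024, heads=16, mlp_ratio=4))
# ViT setting from this thread
print(gemm_shapes(batch_size=48, seq_len=64, embed_dim=1024, heads=16, mlp_ratio=4))
```

For the BERT setting, for example, this gives 3072 x 1024 x 1024 for the QKV/projection GEMMs, 3072 x 1024 x 4096 and 3072 x 4096 x 1024 for the MLP GEMMs, and 512 x 64 x 512 per-head attention GEMMs repeated 6 * 16 = 96 times.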

Hope this will help!