arc-research-lab/CHARM

About MM size in paper and demo

sato-imo opened this issue · 1 comment

Could you please describe the input sizes of the original models, especially for ViT and BERT?
And how do you organize the matrices to form the MxK and KxN matrices shown in Table 5?
[Screenshot of Table 5 from the paper]

  • For BERT, we set batch_size = 6, Sequence_len = 512, Embed_dim = 1024, Heads = 16, MLP_ratio = 4.

  • For ViT, we set batch_size = 48, Sequence_len = 64, Embed_dim = 1024, Heads = 16, MLP_ratio = 4.

  1. For the QKV and projection layers, M = (batch_size * Sequence_len), N = Embed_dim, K = Embed_dim.
  2. For the MLP layers, N (first layer) or K (second layer) is further multiplied by MLP_ratio.
  3. For the batch dot (multi-head attention), M = Sequence_len, N (K) = Embed_dim // Heads, K (N) = Sequence_len; the batch dot size is batch_size * Heads. See the sketch after this list.
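To make the shape rules concrete, here is a minimal Python sketch (not from the CHARM repo; the function and key names are my own) that derives the (M, K, N) GEMM shapes from the hyperparameters above:

```python
# Minimal sketch: derive the GEMM shapes (M, K, N) implied by the rules above.
# Function and dictionary key names are illustrative, not from the CHARM code.

def gemm_shapes(batch_size, seq_len, embed_dim, heads, mlp_ratio):
    shapes = {}
    m = batch_size * seq_len
    # 1. QKV and projection layers: (M x K) * (K x N) with N = K = embed_dim
    shapes["qkv_proj"] = (m, embed_dim, embed_dim)
    # 2. MLP layers: N (first FC) or K (second FC) is scaled by mlp_ratio
    shapes["mlp_fc1"] = (m, embed_dim, embed_dim * mlp_ratio)
    shapes["mlp_fc2"] = (m, embed_dim * mlp_ratio, embed_dim)
    # 3. Batch dot (multi-head attention): per-head GEMMs repeated batch_size * heads times
    head_dim = embed_dim // heads
    shapes["attn_qk"] = (seq_len, head_dim, seq_len)   # Q @ K^T
    shapes["attn_av"] = (seq_len, seq_len, head_dim)   # softmax(QK^T) @ V
    shapes["batch_dot_count"] = batch_size * heads
    return shapes

# BERT setting from this thread
print(gemm_shapes(batch_size=6, seq_len=512, embed_dim=1024, heads=16, mlp_ratio=4))
# ViT setting from this thread
print(gemm_shapes(batch_size=48, seq_len=64, embed_dim=1024, heads=16, mlp_ratio=4))
```

For the BERT setting, for example, this gives 3072 x 1024 x 1024 for the QKV/projection GEMMs, 3072 x 1024 x 4096 and 3072 x 4096 x 1024 for the MLP GEMMs, and 512 x 64 x 512 per-head attention GEMMs repeated 6 * 16 = 96 times.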

Hope this will help!