About MM size in paper and demo
JinmingZhuang commented
- For BERT, we set batch_size = 6, Sequence_len = 512, Embed_dim = 1024, Heads = 16, MLP_ratio = 4.
- For ViT, we set batch_size = 48, Sequence_len = 64, Embed_dim = 1024, Heads = 16, MLP_ratio = 4.
- For the QKV and projection layers, M = batch_size * Sequence_len, N = Embed_dim, K = Embed_dim.
- For the MLP layers, N or K is further multiplied by MLP_ratio: the first MLP layer has N = Embed_dim * MLP_ratio, and the second has K = Embed_dim * MLP_ratio.
- For the batched matmuls (batch dot) in multi-head attention, M = Sequence_len, N (or K) = Embed_dim // Heads, K (or N) = Sequence_len, and the batch-dot size is batch_size * Heads. The sketch after this list enumerates all of these shapes.
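
To make the arithmetic concrete, here is a minimal Python sketch (not from the repo; the function name `gemm_shapes` is made up) that enumerates the (M, N, K, batch) tuples implied by the settings above:

```python
# Hypothetical helper, not part of the repo: enumerate the GEMM shapes
# of one transformer block from the hyperparameters listed above.

def gemm_shapes(batch_size, seq_len, embed_dim, heads, mlp_ratio):
    """Return {layer: (M, N, K, batch)} for one transformer block."""
    M = batch_size * seq_len        # tokens processed per projection GEMM
    head_dim = embed_dim // heads   # per-head feature size
    return {
        # QKV and output-projection layers: plain GEMMs over all tokens
        "qkv/proj": (M, embed_dim, embed_dim, 1),
        # MLP layers: N or K is scaled by mlp_ratio
        "mlp_fc1":  (M, embed_dim * mlp_ratio, embed_dim, 1),
        "mlp_fc2":  (M, embed_dim, embed_dim * mlp_ratio, 1),
        # Batched matmuls inside multi-head attention
        "q @ k^T":  (seq_len, seq_len, head_dim, batch_size * heads),
        "attn @ v": (seq_len, head_dim, seq_len, batch_size * heads),
    }

# BERT: batch_size = 6, Sequence_len = 512  ->  M = 3072 for qkv/proj
print(gemm_shapes(6, 512, 1024, 16, 4))
# ViT: batch_size = 48, Sequence_len = 64   ->  M = 3072 as well
print(gemm_shapes(48, 64, 1024, 16, 4))
```

Note that with these settings both workloads end up with the same M = 3072 for the projection and MLP GEMMs; they differ only in the batched attention matmuls.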
Hope this will help!