About the calculation of overhead.
znsoftm opened this issue · 4 comments
znsoftm commented
For the BERT model, the overhead is calculated as:
model_mem_req += (5 + 16 * n_layer) * 256; // object overhead
Can anyone explain what this means? I assume 5 is the number of extra tensors and 16 is the number of tensors in each layer, but what is the 256 for?
Is it the size of the ggml_tensor struct? The actual size is 208 bytes, so is 256 the rounded-up size?
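To make the arithmetic concrete, here is a minimal sketch of the formula quoted above with the constants given names. The names (EXTRA_TENSORS, TENSORS_PER_LAYER, BYTES_PER_TENSOR, object_overhead) are hypothetical and only reflect my reading of the formula; they do not come from the code base.

```c
#include <stddef.h>

/* Hypothetical names; the values mirror the quoted formula. */
enum {
    EXTRA_TENSORS     = 5,   /* assumed: model-wide tensors outside any layer */
    TENSORS_PER_LAYER = 16,  /* assumed: tensors created for each layer */
    BYTES_PER_TENSOR  = 256  /* assumed: bookkeeping budget per tensor */
};

/* Object-overhead term, in bytes, for a model with n_layer layers. */
static size_t object_overhead(size_t n_layer) {
    return (EXTRA_TENSORS + TENSORS_PER_LAYER * n_layer) * BYTES_PER_TENSOR;
}
```

For example, with 12 layers (a value picked only for illustration) this budgets for 5 + 16 * 12 = 197 tensors, i.e. 197 * 256 = 50432 bytes.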
skeskinen commented
My memory is a little hazy on this subject.
Like you said, 5 should be the extra model-wide tensors that are not tied to any layer. I think I tried a smaller number than 256 for the size, but it crashed with OOM.
Probably the real size of C structs is always rounded up to the next power of 2?
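A small sketch to check that guess: as far as the C language goes, sizeof is padded to a multiple of the struct's alignment, not to a power of two, and the 208 reported above is itself not a power of two. The struct and the round_up helper below are hypothetical and are not the real ggml types. A plausible reading, which is an assumption on my part, is that 256 budgets for the 208-byte struct plus some extra per-tensor bookkeeping inside the context.

```c
#include <stddef.h>

/* Hypothetical struct for illustration, not the real ggml_tensor. */
struct example {
    double d;       /* forces the struct's alignment up to that of double */
    char   tag[5];  /* leaves the struct with trailing padding */
};

/* Rounds n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}
```

With this helper, round_up(208, 16) stays at 208, while round_up(208, 256) gives 256, so the compiler alone would not turn 208 into 256.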
znsoftm commented
Thanks for your answer :)
znsoftm commented
I have tested with the latest ggml, and the 256 has to be changed to 512. I do not understand why :(