skeskinen/bert.cpp

Issue reading ggml-model-q4_0.bin

thijse opened this issue · 4 comments

thijse commented

First of all, thanks for your work! I'm having trouble loading the weights from the 'ggml-model-q4_0.bin' model (Windows, Visual Studio 2022).

On the second iteration of

    while (true)
    {
        int32_t n_dims;
        int32_t length;
        int32_t ftype;

        // per-tensor header: number of dimensions, name length, data type
        fin.read(reinterpret_cast<char *>(&n_dims), sizeof(n_dims));
        fin.read(reinterpret_cast<char *>(&length), sizeof(length));
        fin.read(reinterpret_cast<char *>(&ftype), sizeof(ftype));

all three parameters contain nonsense values.
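A bounds check right after those three reads would catch a desynced stream at the first implausible header instead of crashing later. A minimal sketch of such a check, assuming this model's tensors are all 1- or 2-dimensional and that the per-tensor ftype follows the usual 0 = f32, 1 = f16, 2 = q4_0, 3 = q4_1 convention of this file format:

    #include <cstdint>

    // Debugging aid (not part of bert.cpp): reject per-tensor headers whose
    // fields fall outside what this file format can legitimately contain.
    static bool header_looks_sane(int32_t n_dims, int32_t length, int32_t ftype)
    {
        return n_dims >= 1 && n_dims <= 2      // vectors or matrices only
            && length >= 1 && length <= 256    // tensor name length in bytes
            && ftype  >= 0 && ftype  <= 3;     // f32 / f16 / q4_0 / q4_1
    }

Calling this right after the three fin.read calls, and logging fin.tellg() when it fails, pins down exactly where the stream drifts out of sync.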

Also, if I look at the output on my machine:

bert_load_from_file: loading model from 'D:/GitHub/LLM/bert.cpp/models/all-MiniLM-L6-v2/ggml-model-q4_0.bin' - please wait ...
bert_load_from_file: n_vocab = 30522
bert_load_from_file: n_max_tokens   = 512
bert_load_from_file: n_embd  = 384
bert_load_from_file: n_intermediate  = 1536
bert_load_from_file: n_head  = 12
bert_load_from_file: n_layer = 6
bert_load_from_file: f16     = 2
bert_load_from_file: ggml ctx size =  12.26 MB
bert_load_from_file:

The ggml ctx size (12.26 MB) is not the same as in your build example.

Did the data model change? I would be grateful if you could help me debug this.

skeskinen commented

Hi,

I have not tested this myself on Windows, as I don't have a Windows dev environment.

Here is how it looks for me:

~/bert.cpp/build$ du ../models/all-MiniLM-L6-v2/ggml-model-q4_0.bin 
14196	../models/all-MiniLM-L6-v2/ggml-model-q4_0.bin
~/bert.cpp/build$ md5sum ../models/all-MiniLM-L6-v2/ggml-model-q4_0.bin 
e2476234b52c82fe31031528f3306c9b  ../models/all-MiniLM-L6-v2/ggml-model-q4_0.bin

~/bert.cpp/build$ ./bin/server -m ../models/all-MiniLM-L6-v2/ggml-model-q4_0.bin 
bert_load_from_file: loading model from '../models/all-MiniLM-L6-v2/ggml-model-q4_0.bin' - please wait ...
bert_load_from_file: n_vocab = 30522
bert_load_from_file: n_max_tokens   = 512
bert_load_from_file: n_embd  = 384
bert_load_from_file: n_intermediate  = 1536
bert_load_from_file: n_head  = 12
bert_load_from_file: n_layer = 6
bert_load_from_file: f16     = 2
bert_load_from_file: ggml ctx size =  13.57 MB
bert_load_from_file: ............ done
bert_load_from_file: model size =    13.55 MB / num tensors = 101
bert_load_from_file: mem_per_token 452 KB, mem_per_input 248 MB
Server running on port 8080 with 6 threads
Waiting for a client

Also, can you check the ggml version?

~/bert.cpp$ git submodule
 1a5d5f331de1d3c7ace40d86fe2373021a42f9ce ggml (heads/master-85-g1a5d5f3)

My hunch is that either:

  1. the string handling at bert.cpp:590 is platform-specific and breaks down on Windows, or

  2. it's trying to load the wrong number of bytes when reading the tensor data.
    I.e. in
    fin.read(reinterpret_cast<char *>(tensor->data), ggml_nbytes(tensor));
    on line bert.cpp:660, ggml_nbytes returns a different number for some platform-specific reason (one way to cross-check this is sketched below).
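For hunch 2, one cross-check is to recompute the expected byte count from the dims read out of the file header and compare it with ggml_nbytes of the tensor that was built into the ggml context; a mismatch means the loader reads the wrong number of bytes and desyncs the stream for every tensor after it. A sketch, assuming the header values (ne_file, n_dims), the tensor name, and the resolved ggml type (wtype) from the loading loop are in scope:

    #include <cstdint>
    #include <cstdio>
    #include "ggml.h"

    // Compare the byte count implied by the file header (dims + declared type)
    // against what ggml allocated for the corresponding context tensor.
    static bool sizes_agree(const struct ggml_tensor * tensor, const char * name,
                            const int32_t ne_file[2], int n_dims, enum ggml_type wtype)
    {
        int64_t nelem_file = 1;
        for (int i = 0; i < n_dims; ++i) {
            nelem_file *= ne_file[i];
        }
        const size_t expect = nelem_file / ggml_blck_size(wtype) * ggml_type_size(wtype);
        const size_t actual = ggml_nbytes(tensor);
        if (expect != actual) {
            fprintf(stderr, "size mismatch for '%s': file implies %zu bytes, ctx tensor has %zu\n",
                    name, expect, actual);
        }
        return expect == actual;
    }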

thijse commented

Hi,

Thanks for your help! The file seems in order:

D:\GitHub\LLM\bert.cpp\models\all-MiniLM-L6-v2>md5sum.exe ggml-model-q4_0.bin
e2476234b52c82fe31031528f3306c9b *ggml-model-q4_0.bin

and

D:\GitHub\LLM\bert.cpp>git submodule
 1a5d5f331de1d3c7ace40d86fe2373021a42f9ce ggml (1a5d5f3)

The string handling at line 590 at least loads a seemingly correct string: embeddings.word_embeddings.weight, 33 characters.

ggml_nbytes(tensor) returns 6592752; I'm not sure if that is correct, but it is at least divisible by 8.
fin.read(reinterpret_cast<char *>(tensor->data), ggml_nbytes(tensor)); is indeed the last read before the end of the loop, and afterwards things go boom, so my guess is that this is the culprit. Is there any way I can check the integrity of the tensor?
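For what it's worth, 6592752 is exactly the size you'd expect for embeddings.word_embeddings.weight, assuming q4_0 at this ggml revision packs 32 weights into an 18-byte block (an fp16 scale plus 16 nibble bytes):

    30522 * 384   = 11720448 weights
    11720448 / 32 =   366264 blocks
    366264 * 18   =  6592752 bytes

As for checking integrity: a plain byte sum over the loaded data would let two machines confirm they read identical bytes for the same tensor. A minimal sketch (not part of bert.cpp):

    #include <cstdint>
    #include <cstdio>
    #include "ggml.h"

    // Sum the raw tensor bytes so two builds can diff their results.
    static void print_byte_sum(const struct ggml_tensor * tensor)
    {
        const uint8_t * p = (const uint8_t *) tensor->data;
        uint64_t sum = 0;
        for (size_t i = 0; i < ggml_nbytes(tensor); ++i) {
            sum += p[i];
        }
        printf("byte sum = %llu over %zu bytes\n",
               (unsigned long long) sum, ggml_nbytes(tensor));
    }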

skeskinen commented

I think I found the issue on my end.
Can you re-download the quantized model and try again?

Thanks!

thijse commented

Yes! I need some free time to test the network, but the model is now loading fine! Thanks a lot!