nomic-ai/pygpt4all

invalid model file (bad magic [got 0x67676d66 want 0x67676a74])

qaiwiz opened this issue · 13 comments

qaiwiz commented

I am working on Linux Debian 11. After pip install and downloading the most recent model (gpt4all-lora-quantized-ggml.bin), I tried to run the example but I get the following error:

./gpt4all-lora-quantized-ggml.bin: invalid model file (bad magic [got 0x67676d66 want 0x67676a74])
you most likely need to regenerate your ggml files
the benefit is you'll get 10-100x faster load times
see ggerganov/llama.cpp#91
use convert-pth-to-ggml.py to regenerate from original pth
use migrate-ggml-2023-03-30-pr613.py if you deleted originals
llama_init_from_file: failed to load model
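
For reference, the two magic values in that error are just the first four bytes of the file, read as a little-endian integer. A minimal sketch (the path and decoding are illustrative; the constants come straight from the message above) to check what a downloaded file actually contains:

# Sketch: inspect the first 4 bytes ("magic") of a downloaded model file.
import struct

with open("./gpt4all-lora-quantized-ggml.bin", "rb") as f:  # path taken from the error above
    magic_bytes = f.read(4)

magic = struct.unpack("<I", magic_bytes)[0]
print(f"first bytes: {magic_bytes!r}, magic: {magic:#010x}")
# 0x67676d66 -> file starts with b'fmgg' ("ggmf" reversed, the older versioned ggml format)
# 0x67676a74 -> file starts with b'tjgg' ("ggjt" reversed, the format the loader expects)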

I tried this: pyllamacpp-convert-gpt4all ./gpt4all-lora-quantized-ggml.bin ./llama_tokenizer ./gpt4all-converted.bin but I am not sure where the tokenizer is stored!

@qaiwiz you should download the tokenizer as well (it's a small file), please see #5

qaiwiz commented

@abdeladim-s thanks, I just came to post that one has to download the tokenizer, as you pointed out (#5). I actually did, but then I get:
File "/root/env39/bin/pyllamacpp-convert-gpt4all", line 8, in
sys.exit(main())
File "/root/env39/lib/python3.9/site-packages/pyllamacpp/scripts/convert_gpt4all.py", line 19, in main
convert_one_file(args.gpt4all_model, tokenizer)
File "/root/env39/lib/python3.9/site-packages/pyllamacpp/scripts/convert.py", line 92, in convert_one_file
write_header(f_out, read_header(f_in))
File "/root/env39/lib/python3.9/site-packages/pyllamacpp/scripts/convert.py", line 34, in write_header
raise Exception('Invalid file magic. Must be an old style ggml file.')

What does it mean by old style file? I actually downloaded the most recent model .bin file from that link ([gpt4all-lora-quantized-ggml.bin] 05-Apr-2023 13:07 4G). Now I am wondering how I should fix this to get the model working.

qaiwiz commented

I couldn't get it to work, so I downloaded the already-converted model instead: https://huggingface.co/LLukas22/gpt4all-lora-quantized-ggjt. I am trying this on my server with 2 cores and 8 GB of RAM (I know that is at the limit), and I tried to bring down the temperature and ease up some of the parameters, yet it is stalling! Typically, how fast should I expect this to run on such a server?

# Load the model
from pyllamacpp.model import Model

model = Model(ggml_model="ggjt-model.bin", n_ctx=2000)

# Generate
prompt = "User: How are you doing?\nBot:"

result = model.generate(prompt, n_predict=50, temp=0, top_k=3, top_p=0.95,
                        repeat_last_n=64, repeat_penalty=1.1)

Is there any hyperparameter I can change to make it run faster?

@qaiwiz the spec you are using is very low; you should have a quad-core CPU at least.
Also, if the CPU you are using does not have AVX acceleration, it will be worse.
You won't get much speed even if you change the hyper-parameters.
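
If it helps, a quick way to confirm the core count and AVX flags on a Linux box before tweaking anything (standard library only, just a sketch):

# Sketch: check core count and AVX support on Linux via /proc/cpuinfo.
import os

print("cores:", os.cpu_count())

with open("/proc/cpuinfo") as f:
    flags = next(line.split() for line in f if line.startswith("flags"))

print("AVX: ", "avx" in flags)
print("AVX2:", "avx2" in flags)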

qaiwiz commented

Here is the system config:
system_info: n_threads = 2 / 2 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.000000, top_k = 3, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 2000, n_batch = 8, n_predict = 50, n_keep = 0

qaiwiz commented

Here is the output:

llama_print_timings: load time = 71340.45 ms
llama_print_timings: sample time = 299.64 ms / 55 runs ( 5.45 ms per run)
llama_print_timings: prompt eval time = 292639.93 ms / 36 tokens ( 8128.89 ms per token)
llama_print_timings: eval time = 2361021.55 ms / 52 runs (45404.26 ms per run)
llama_print_timings: total time = 2812682.00 ms

result
' User: How are you doing?\nBot:\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01'
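
Plugging those numbers in, that is roughly 45 seconds per generated token (about 0.02 tokens/second), and the whole run took about 47 minutes; a quick back-of-the-envelope from the printed timings:

# Back-of-the-envelope from the llama_print_timings output above.
eval_ms, eval_runs = 2361021.55, 52        # eval time / runs
prompt_ms, prompt_tokens = 292639.93, 36   # prompt eval time / tokens
total_ms = 2812682.00

print(f"~{eval_ms / eval_runs / 1000:.1f} s per generated token")     # ~45.4
print(f"~{eval_runs / (eval_ms / 1000):.3f} tokens per second")       # ~0.022
print(f"~{prompt_ms / prompt_tokens / 1000:.1f} s per prompt token")  # ~8.1
print(f"total: ~{total_ms / 60000:.0f} minutes")                      # ~47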

qaiwiz commented

I couldn't get it to work, so I downloaded the already-converted model instead: https://huggingface.co/LLukas22/gpt4all-lora-quantized-ggjt.

Guys, did you bork the llama again?

Checking discussions database...
llama_model_load: loading model from './models/gpt4all-lora-quantized-ggml.bin' - please wait ...
./models/gpt4all-lora-quantized-ggml.bin: invalid model file (bad magic [got 0x6e756f46 want 0x67676a74])
        you most likely need to regenerate your ggml files
        the benefit is you'll get 10-100x faster load times
        see https://github.com/ggerganov/llama.cpp/issues/91
        use convert-pth-to-ggml.py to regenerate from original pth
        use migrate-ggml-2023-03-30-pr613.py if you deleted originals
llama_init_from_file: failed to load model
Chatbot created successfully
 * Serving Flask app 'GPT4All-WebUI'

It was working until I did a git pull today. So, what's going on? How do you convert to the right magic? We (GPT4ALL-UI) just recently converted all the models and uploaded them to HF, but now they are dead...

Issue: ParisNeo/lollms-webui#96
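
One hedged observation about that error: decoding the reported magic suggests the problem may be the file itself rather than the loader (a guess from the bytes, not a confirmed diagnosis):

# The loader read 0x6e756f46 as the first 4 bytes of the "model" file. Decoded:
import struct
print(struct.pack("<I", 0x6e756f46))  # b'Foun'
# A valid ggjt model starts with the bytes b'tjgg' (magic 0x67676a74). A file that
# begins with "Foun..." looks like plain text (for example a "Found" redirect page
# saved in place of the model), so it's worth checking the file size and its first
# bytes before trying to reconvert anything.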

@andzejsp I am facing the same issue as well :/ I just tried it now with the latest model and it doesn't work.

In my case it's working with the ggml-vicuna-13b-4bit-rev1.bin model, not sure why the other model died...

@andzejsp can you give me a download link to it, if you have one, so I can try it?

https://github.com/nomic-ai/gpt4all-ui#supported-models

@andzejsp We didn't touch anything; we haven't pushed any updates for a week now. You can take a look at the commit history.
Please make sure you are doing the right thing!!