nomic-ai/pygpt4all

can't mlock because it's not supported on this system

Naugustogi opened this issue · 5 comments

Using Windows. On the other hand, the base llama.cpp works fine and keeps the model in RAM.
Loading the model each time I use it is annoying.

@Naugustogi, mlock should be supported. Do you get any errors?

By the way, loading the model does not take much time! It is almost instant now. Why is that annoying?


I'm using pyllamacpp v1.0.6.
It crashes when I use use_mlock=True,
also using f16_kv=1.
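For reference, here is a minimal sketch of how those flags could be passed when constructing the model with pyllamacpp. The model path is a placeholder, and the exact keyword names (ggml_model, use_mlock, f16_kv) may differ between pyllamacpp versions, so treat this as an illustration rather than the library's exact API.

```python
# Sketch only (not verified against v1.0.6): load a ggml model with pyllamacpp
# and pass the llama context parameters discussed above.
from pyllamacpp.model import Model

model = Model(
    ggml_model="./models/gpt4-x-alpaca-13b.ggml.bin",  # placeholder path
    n_ctx=512,
    use_mlock=True,  # ask the OS to keep the model pages resident in RAM
    f16_kv=True,     # store the KV cache in fp16 to reduce memory use
)

# Generate a short completion; argument names may vary by version.
print(model.generate("Hello, my name is", n_predict=32))
```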

Also
llama_print_timings: load time = 69042.31 ms
llama_print_timings: sample time = 14.22 ms / 33 runs ( 0.43 ms per run)
llama_print_timings: prompt eval time = 60306.69 ms / 108 tokens ( 558.40 ms per token)
llama_print_timings: eval time = 18046.62 ms / 32 runs ( 563.96 ms per run)
llama_print_timings: total time = 87104.24 ms

Which is way too slow, I think.
The model is GPT4 x Alpaca 13B.
Using 16 GB RAM, Intel Core i5-7400.

It definitely works faster if I use the base llama.cpp; I get around 4 tokens/s.

@Naugustogi I think that error is coming from the ggml library.
Everything is working normally on my side.
Could you please try to build it from source?

I am unable to rebuild and have to rely on other people's uploads. You can close this issue if you want. For now I'll have to wait for speed improvements. Loading the model and keeping it in RAM works fine; it just takes a bit of time in my case. The initial problem wasn't mlock: I simply mistook the loading time for the model generation time.

@Naugustogi Why can't you rebuild? If you managed to run llama.cpp, the process is straightforward: you only need CMake, then run pip install from the GitHub repo!
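For anyone else hitting this, a rough sketch of the build-from-source steps being suggested (the repository URL is a placeholder; substitute the actual repo you are installing, and make sure a C/C++ compiler is available, same as for building llama.cpp):

```sh
# Sketch only: build the Python bindings from source instead of using a prebuilt wheel.
pip install cmake                    # or install CMake system-wide
pip install git+<repo-url>           # replace <repo-url> with the project's GitHub URL
```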