can't mlock because it's not supported on this system
Naugustogi opened this issue · 5 comments
I'm using Windows. By contrast, llama.cpp works fine and keeps the model in RAM.
Having to load the model each time I use it is annoying.
@Naugustogi, `mlock` should be supported. Do you get any errors?
By the way, loading the model does not take much time; it is almost instant now. Why is that annoying?
I'm using pyllamacpp v1.0.6.
It crashes when I use `use_mlock=True`. I'm also using `f16_kv=1`.
Also, here are my timings:
llama_print_timings: load time = 69042.31 ms
llama_print_timings: sample time = 14.22 ms / 33 runs ( 0.43 ms per run)
llama_print_timings: prompt eval time = 60306.69 ms / 108 tokens ( 558.40 ms per token)
llama_print_timings: eval time = 18046.62 ms / 32 runs ( 563.96 ms per run)
llama_print_timings: total time = 87104.24 ms
That is way too slow, I think.
The model is gpt4-x-alpaca 13B.
I'm using 16 GB RAM and an Intel Core i5-7400.
It is definitely faster if I use the base llama.cpp, where I get around 4 tokens/s.
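For comparison, the per-token figures in the timings above can be converted to tokens per second with a quick back-of-the-envelope calculation (the numbers are taken directly from the log):

```python
# Convert the llama_print_timings figures above into tokens/second.
prompt_eval_ms_per_token = 60306.69 / 108  # prompt eval: 60306.69 ms over 108 tokens
eval_ms_per_run = 18046.62 / 32            # eval: 18046.62 ms over 32 runs

prompt_tps = 1000.0 / prompt_eval_ms_per_token
eval_tps = 1000.0 / eval_ms_per_run

print(f"prompt eval: {prompt_tps:.2f} tokens/s")  # roughly 1.79 tokens/s
print(f"eval:        {eval_tps:.2f} tokens/s")    # roughly 1.77 tokens/s
```

Both figures are well under the ~4 tokens/s reported with base llama.cpp on the same hardware.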
@Naugustogi I think that error is coming from the ggml library.
Everything is working normally on my side.
Could you please try to build it from source?
I am unable to rebuild and have to rely on other people's uploads. You can close this issue if you want. For now I have to wait for speed improvements. Loading the model and keeping it in RAM is fine; it just takes a bit of time in my case. The initial problem wasn't mlock: I simply mistook the loading time for the model generation time.
@Naugustogi Why can't you rebuild? If you succeeded in running llama.cpp,
then the process is straightforward: you only need `cmake`,
then run `pip install` from the GitHub repo!
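The steps above amount to something like the following. This is a sketch only: the repository URL is an assumption, so substitute the actual pyllamacpp repo you are installing from, and make sure `cmake` and a C/C++ compiler are on your PATH first.

```shell
# Hypothetical build-from-source steps (repo URL is an assumption):
git clone --recursive https://github.com/nomic-ai/pyllamacpp
cd pyllamacpp
# Builds the native extension with cmake and installs the Python package.
pip install .
```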