can't run the starcoder-ggml.bin
breakenknife opened this issue · 2 comments
breakenknife commented
./main -m models/bigcode/starcoder-ggml.bin -p "def fibonnaci(" --top_k 0 --top_p 0.95 --temp 0.2
main: seed = 1685609262
starcoder_model_load: loading model from 'models/bigcode/starcoder-ggml.bin'
starcoder_model_load: n_vocab = 49152
starcoder_model_load: n_ctx = 8192
starcoder_model_load: n_embd = 6144
starcoder_model_load: n_head = 48
starcoder_model_load: n_layer = 40
starcoder_model_load: ftype = 1
starcoder_model_load: qntvr = 0
starcoder_model_load: ggml ctx size = 51276.47 MB
GGML_ASSERT: ggml.c:3874: ctx->mem_buffer != NULL
Aborted (core dumped)
Hi, my machine has 38 GB of memory. It can run StarCoder with the quantized starcoder-ggml-q4_1.bin, but not with the non-quantized starcoder-ggml.bin. Is this because there is not enough memory?
ChaoticByte commented
When running inference, the whole model is loaded into memory. If your starcoder-ggml.bin file is larger than your memory, then: yes :)
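For the log in the question, the check can be done by hand: the loader asks for a `ggml ctx size` of 51276.47 MB, while 38 GB is roughly 38912 MB. Below is a minimal Python sketch of that comparison. The helper names `parse_ctx_size_mb` and `fits_in_ram` are made up for illustration and are not part of ggml, and the sketch assumes 1 GB = 1024 MB.

```python
import re

def parse_ctx_size_mb(log_text):
    """Return the 'ggml ctx size' value in MB from loader output, or None."""
    match = re.search(r"ggml ctx size\s*=\s*([0-9.]+)\s*MB", log_text)
    return float(match.group(1)) if match else None

def fits_in_ram(required_mb, ram_gb):
    """Compare the requested buffer with total RAM (assumes 1 GB = 1024 MB)."""
    return required_mb <= ram_gb * 1024

# Line copied from the log in the question.
log_line = "starcoder_model_load: ggml ctx size = 51276.47 MB"
required = parse_ctx_size_mb(log_line)
print(required, fits_in_ram(required, 38))  # prints: 51276.47 False
```

This only compares against total RAM; memory already used by other processes makes the real limit lower.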
NouamaneTazi commented
Also, some devices allow offloading some memory to disk (warning: inference will be slower). So you can always try running a model, and if that fails, you know that lack of memory is the issue.
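On Linux, one way to see how much RAM plus disk-backed swap is available is to read `/proc/meminfo`. The Python sketch below parses text in that format. The helper names are hypothetical and the sample values are illustrative, not taken from the poster's machine. Whether a large allocation actually succeeds also depends on the OS's overcommit settings, so treat the total as a rough upper bound.

```python
def parse_meminfo_kb(text):
    """Parse '/proc/meminfo'-style lines into a dict of key -> value in kB."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values

def ram_plus_swap_mb(meminfo):
    """Total RAM plus total swap in MB (1 MB = 1024 kB)."""
    return (meminfo.get("MemTotal", 0) + meminfo.get("SwapTotal", 0)) / 1024

# Illustrative sample: 38 GB of RAM and 8 GB of swap (assumed values).
sample = "MemTotal:       39845888 kB\nSwapTotal:       8388608 kB\n"
print(ram_plus_swap_mb(parse_meminfo_kb(sample)))

# On a real Linux machine, the same helpers can read the live file:
# with open("/proc/meminfo") as f:
#     print(ram_plus_swap_mb(parse_meminfo_kb(f.read())))
```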