intel/neural-speed

Running Q4_K_M gguf models: unrecognized tensor type 12

shg8 opened this issue · 1 comments

shg8 commented
Welcome to use the llama on the ITREX! 
AVX:1 AVX2:1 AVX512F:0 AVX_VNNI:1 AVX512_VNNI:0 AMX_INT8:0 AMX_BF16:0 AVX512_BF16:0 AVX512_FP16:0
Loading the bin file with GGUF format...
main: seed  = 1712361979
model.cpp: loading model from /models/llama-2-7b.Q4_K_S.gguf
error loading model: unrecognized tensor type 12

model_init_from_file: failed to load model

I got this error when trying to load the Q4_K_M and Q4_K_S quantized models for Llama-2-7B-GGUF. I would appreciate it if support for these formats could be added.

@shg8 Thanks for using Neural Speed.

We don't currently support Qx_K_M and Qx_K_S models. Sorry about that. We will discuss and evaluate this task.

Thanks again.
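
For readers hitting the same error: the `12` in `unrecognized tensor type 12` appears to be the tensor type ID stored in the GGUF file. In ggml's `ggml_type` enum, ID 12 is `Q4_K`, one of the k-quant formats that Q4_K_M and Q4_K_S files contain. The sketch below is only an illustration for decoding such IDs. The table is copied from the ggml enum, the helper names `tensor_type_name` and `is_k_quant` are hypothetical, and none of this is Neural Speed code.

```python
# Tensor type IDs as defined in ggml's ggml_type enum.
# IDs 4 and 5 are unused (those formats were removed from ggml).
GGML_TYPE_NAMES = {
    0: "F32",
    1: "F16",
    2: "Q4_0",
    3: "Q4_1",
    6: "Q5_0",
    7: "Q5_1",
    8: "Q8_0",
    9: "Q8_1",
    10: "Q2_K",
    11: "Q3_K",
    12: "Q4_K",
    13: "Q5_K",
    14: "Q6_K",
    15: "Q8_K",
}


def tensor_type_name(type_id: int) -> str:
    """Return the ggml name for a tensor type ID, or a placeholder if unlisted."""
    return GGML_TYPE_NAMES.get(type_id, f"unknown({type_id})")


def is_k_quant(type_id: int) -> bool:
    """True if the ID refers to one of the k-quant formats (names ending in _K)."""
    return tensor_type_name(type_id).endswith("_K")


if __name__ == "__main__":
    type_id = 12  # the ID reported in the error message above
    print(f"tensor type {type_id}: {tensor_type_name(type_id)}, "
          f"k-quant: {is_k_quant(type_id)}")
```

A loader that rejects k-quant types would be expected to fail on the first such tensor it reads, which is consistent with the log above. Until support is added, a model quantized with a non-k format such as Q4_0 may be a workaround, assuming the loader accepts that type.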