intel/neural-speed
An innovative library for efficient LLM inference via low-bit quantization
C++ · Apache-2.0
Issues
Once upon a time, a little NE_ASSERT: /root/w0/workspace/neuralspeed-wheel-build/nlp_repo/neural_speed/core/ne_layers.c:2651: ne_nelements(a) == ne0 * ne1 * ne2
#326 opened by zwx109473 - 17
Is it supported with batch size > 1?
#269 opened by QuPengfei - 0
BF16 Compute DType on AVX512 ISA
#308 opened by Alavandar08 - 1
Yi-6B model failed to evaluate
#314 opened by jedcheng - 8
Bestla kernels: understanding and benchmarking
#289 opened by Alavandar08 - 1
What's the difference from IPEX-LLM?
#290 opened by manfye - 1
Performance on Xeon Scalable
#284 opened by regmibijay - 1
Add support for phi3-vision
#268 opened by bil-ash - 2
Loading checkpoint shards takes too long
#251 opened by irjawais - 2
Is tensor parallelism supported by neural speed?
#220 opened by zhangnju - 3
AssertionError: Fail to convert pytorch model
#194 opened by anthony-intel - 3
Distributing tensors across NUMA nodes
#207 opened by shg8 - 1
Feature request: JSON mode output
#204 opened by eliranwong - 2
heap-buffer-overflow while packing weight
#167 opened by yufenglee - 13
Performance Gap between Neural Speed Matmul Operator and Llama.cpp Operator
#174 opened by aciddelgado - 8
Modifying the model's hyperparameters
#124 opened by benjamin27315k - 1
Error: Unable to install.
#257 opened by Ujjawal-K-Panchal - 1
Source build from release tar file?
#258 opened by hpcpony - 4
Add support for phi-3-mini-128k model
#238 opened by bil-ash - 1
SYCL support?
#191 opened by rahulunair - 4
Garbled characters with beam search
#215 opened by jiafuzha - 2
I wish for a simpler way to run the model
#230 opened by kolinfluence - 1
I saw how beautiful this repo is, in terms of parallelism / NUMA stuff, etc.
#231 opened by kolinfluence - 2
Issue in whisper inference from pre-converted gguf
#203 opened by bil-ash - 4
Question about Thread pool and GEMV
#221 opened by chenhongyu2048 - 5
Huge performance difference in "Transformer-like" usage and "llama.cpp-like" usage
#205 opened by Ankur-singh - 1
Running Q4_K_M gguf models: unrecognized tensor type 12
#206 opened by shg8 - 1
Baseline example not working
#193 opened by anthony-intel - 3
Neural Speed compilation failing in ORT
#188 opened by sunnyshu-intel - 3
[Feature request] Add nllb support
#99 opened by bil-ash - 2
Can't load Qwen after Qwen2 support
#161 opened by kunger97 - 7
Error loading model when use qwen gguf model
#96 opened by kunger97 - 3
Error during `pip install .`
#111 opened by dellamuradario - 2
Documentation for whisper inference
#104 opened by bil-ash - 3
Error running inference
#91 opened by RachelShalom - 2
Can't run inference on Llama2 through GGUF
#88 opened by ZJkyle - 1
Is Qwen supported?
#77 opened by kunger97 - 3
Build failure when building the executable
#74 opened by aahouzi - 1
AVX_VNNI Numeric Bug?
#32 opened by parvizmp