intel-analytics/ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Python · Apache-2.0
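Many of the issues below concern the library's HuggingFace-style loading path. For context, a minimal sketch of 4-bit inference with ipex-llm, assuming the `ipex_llm.transformers` wrapper and an XPU-enabled PyTorch install as described in the project's quickstart (the model name and prompt are illustrative, not from this page):

```python
# Minimal sketch: INT4 inference with ipex-llm on an Intel GPU.
# Assumes ipex-llm and an Intel XPU build of PyTorch are installed;
# drop the .to("xpu") calls to run on CPU instead.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # drop-in HF-style API

model_path = "meta-llama/Llama-2-7b-chat-hf"  # illustrative model id

# load_in_4bit=True quantizes the weights to 4-bit on load
model = AutoModelForCausalLM.from_pretrained(
    model_path, load_in_4bit=True, trust_remote_code=True
)
model = model.to("xpu")  # move the quantized model to the Intel GPU

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
inputs = tokenizer("What is SYCL?", return_tensors="pt").to("xpu")

with torch.inference_mode():
    output = model.generate(inputs.input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```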
Issues
SYCL error: CHECK_TRY_ERROR(dpct::gemm_batch( *g_sycl_handles[g_main_device], oneapi::mkl::transpose::trans
#10845 opened · 3 comments
Update Ollama Version to 0.1.32
#10837 opened · 2 comments
How to use BigDL to calculate MSE values
#10831 opened · 4 comments
ollama no GPU - Intel Arc A750 Windows and Linux
#10823 opened · 26 comments
[Feature Request] IPEX-LLM + Axolotl Docker Image
#10821 opened · 4 comments
Llama-CPP Install Issue - Windows
#10820 opened · 1 comment
chatglm3-6b with fp8, 1k input, 512 output, and batch 64 fails in the all-in-one benchmark tool
#10818 opened · 2 comments
Llama 3 cannot be loaded with load_low_bit
#10816 opened · 1 comment
Llama 3 generation does not stop
#10815 opened · 3 comments
How does the llama.cpp backend work?
#10803 opened · 3 comments
streamlit on iGPU - RuntimeError: Native API failed. Native API returns: -999 (Unknown PI error)
#10778 opened · 1 comment
deepspeed_optimize_model_gpu Qwen/Qwen-7B-Chat
#10763 opened · 1 comment
MiniGPT4-Video support
#10759 opened · 1 comment
Feature Request: RoSA and QRoSA
#10755 opened · 3 comments
GPTQ inference issue on Intel GPU
#10754 opened · 2 comments
Saving of low-bit models for later loading?
#10729 opened · 3 comments
Llama-2-7b-chat-hf produces wrong output on CPU
#10724 opened · 1 comment
The current vLLM Docker image can't support the Qwen 1.5 7B model because of the transformers version
#10684 opened · 4 comments
Strange CPU performance curve when using ChatGLM3 to infer on inputs of tens of thousands of tokens
#10683 opened · 1 comment
starcoder2 optimization results
#10680 opened · 1 comment
Benchmarking Chatglm3-6B on Xeon SPR: Forward() expected at most 5 arguments but received 6
#10674 opened · 7 comments
Mistral hack in vLLM no longer needed
#10667 opened · 4 comments
Self-Speculative Decoding at lower precisions?
#10666 opened · 1 comment
TypeError: invalidInputError() missing 1 required positional argument: 'errMsg' with vLLM serving
#10661 opened · 12 comments
starcoder2-3B model rest token latency
#10607 opened · 9 comments