intel-analytics/ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., a local PC with iGPU and NPU, or discrete GPUs such as Arc, Flex, and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
Python · Apache-2.0
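The description above maps to a HuggingFace-style Python API. The sketch below is a minimal, unverified example of that flow; the model id and prompt are placeholders, and the keyword names should be checked against the ipex-llm documentation for your installed version.

```python
# Minimal sketch of the HuggingFace-style flow described above.
# The model id and prompt are placeholders; verify keyword names
# against the ipex-llm docs for your installed version.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model id

# Load with low-bit (INT4) optimization, then move to the Intel GPU ("xpu").
model = AutoModelForCausalLM.from_pretrained(
    model_path, load_in_4bit=True, trust_remote_code=True
).to("xpu")

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
inputs = tokenizer("What does IPEX-LLM do?", return_tensors="pt").to("xpu")

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```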
Issues
Error: llama runner process has terminated: error loading model: No device of requested type available
#12420 opened by fanlessfan - 8
Update to Ollama 0.4.0
#12370 opened by Matthww - 1
'AutoModel' object has no attribute 'config' when using Speech_Paraformer-Large on NPU
#12412 opened by fanyhchn - 1
Update Ollama with IPEX-LLM to a newer version
#12411 opened by NikosDi - 33
Brave Leo AI using Ollama and Intel GPU
#12248 opened by NikosDi - 4
Path of models using Ollama with IPEX-LLM (Windows)
#12403 opened by NikosDi - 2
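For context on the model-path question above: upstream Ollama resolves its model store through the OLLAMA_MODELS environment variable, and the IPEX-LLM build is assumed here to follow the same convention. A hypothetical launch sketch; the path is a placeholder.

```python
# Hypothetical sketch: point Ollama at a custom model directory on Windows
# via OLLAMA_MODELS (an upstream Ollama environment variable) before
# starting the server. The directory path is a placeholder.
import os
import subprocess

env = dict(os.environ, OLLAMA_MODELS=r"D:\ollama\models")  # placeholder path
subprocess.Popen(["ollama", "serve"], env=env)
```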
Using both iGPU and CPU together
#12373 opened by fanlessfan - 12
Container cannot see Arc GPU
#12372 opened by robertvazan - 3
Llama-3.2 11B Vision not working with latest IPEX-LLM (vLLM version 0.6.2)
#12391 opened by HumerousGorgon - 4
Assertion error when using IPEX with PyTorch
#12385 opened by piDack - 6
Could not use SFT Trainer in qlora_finetuning.py
#12356 opened by shungyantham - 4
`ollama run minicpm-v` runs on CPU
#12257 opened by juan-OY - 2
Docker - llama.cpp scripts / init-llama-cpp
#12379 opened by easyfab - 5
Can't run Ollama in a Docker container with iGPU on Linux
#12363 opened by user7z - 1
Performance problem with InternVL image embedding using ggml.dll
#12376 opened by cjsdurj - 3
ipex-llm-cpp-xpu container
#12364 opened by user7z - 1
ValueError: If `eos_token_id` is defined, make sure that `pad_token_id` is defined
#12371 opened by fanlessfan - 3
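The ValueError above is a generic Hugging Face transformers check rather than anything IPEX-LLM-specific: generate() needs a pad_token_id once eos_token_id is set. A common workaround (not verified against issue #12371) is to reuse the EOS token as padding:

```python
# Common transformers workaround (not verified against issue #12371):
# define a pad token by reusing the EOS token before calling generate().
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder: any causal LM shipped without a pad token
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

tokenizer.pad_token = tokenizer.eos_token  # pad_token_id is now defined

inputs = tokenizer("Hello", return_tensors="pt")
output = model.generate(
    **inputs, max_new_tokens=8, pad_token_id=tokenizer.eos_token_id
)
print(tokenizer.decode(output[0]))
```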
Ollama fails to run the embedding model mxbai-embed-large
#12348 opened by feiyu11859661 - 1
How to check GPU memory consumption by IPEX on Linux?
#12315 opened by acane77 - 1
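One answer to the question above, sketched under the assumption of an XPU-enabled PyTorch build: IPEX registers a torch.xpu namespace whose memory counters mirror torch.cuda, so they can be queried from Python (verify the names for your version).

```python
# Sketch: query XPU memory counters from Python. Assumes an XPU-enabled
# PyTorch build; the torch.xpu.* names mirror torch.cuda and should be
# verified against your installed version.
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  registers "xpu"

x = torch.empty(1024, 1024, device="xpu")  # allocate something to measure

print(f"allocated: {torch.xpu.memory_allocated() / 1024**2:.1f} MiB")
print(f"reserved:  {torch.xpu.memory_reserved() / 1024**2:.1f} MiB")
```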
Running chatglm3-6b with the ipex_llm harness on an A770 raises: property 'pad_token' of 'ChatGLMTokenizer' object has no setter
#12335 opened by tao-ov - 0
Questions about ParallelTable and ParallelCriterion
#12278 opened by clare-cn - 2
ipex-llm xpu version doesn't work on Lunar Lake
#12268 opened by HoppeDeng - 1
IPEX-LLM fails to load the Qwen2.5 7B model
#12273 opened by HoppeDeng - 4
llama.cpp crashes running k-quants with Intel Arc 140V Xe2 iGPU
#12318 opened by lhl - 1
After installing with ipex-llm-ollama-installer-20240918.exe, invoking start.bat in the install folder from another exe fails with missing-DLL errors
#12334 opened by dayskk - 1
[ipex-llm] A significant accuracy deviation between ipex-llm 2.2.0b1 and 2.1.0b20240515 when running the CodeGeeX model
#12294 opened by johnysh - 4
Running the harness on an A770 raises RuntimeError: unsupported dtype, only fp32 and fp16 are supported
#12304 opened by tao-ov - 6
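A generic mitigation for the dtype error above (not verified against issue #12304) is to make sure the checkpoint is loaded in fp16 rather than bf16 or fp64 before the harness runs:

```python
# Generic mitigation sketch (not verified against issue #12304): force the
# checkpoint to fp16 at load time so every weight is in a supported dtype.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "THUDM/chatglm3-6b",        # the model named in the neighboring reports
    torch_dtype=torch.float16,  # avoid bf16/fp64 weights on XPU
    trust_remote_code=True,
)
```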
Error running the harness on an A770
#12290 opened by tao-ov - 3
Questions about performance gap between benchmark scripts and llama-bench from ipex-llm[cpp]
#12280 opened by acane77 - 36
Slow text generation on dual Arc A770s with vLLM
#12190 opened by HumerousGorgon - 5
Can't run llama model on Intel GPU on Linux platforms
#12222 opened by acane77 - 0
[NPU] Slow Token Generation with Latest NPU Driver 32.0.100.3053 on LNL 226V series
#12266 opened by climh - 1
llama.cpp generation incoherent (always <eos>); which driver version for Ubuntu 22.04.5?
#12258 opened by ultoris - 0
LLM benchmark for chatglm3-6b fails to run properly
#12208 opened by vincent-wsz - 1
Inference hangs on LNL iGPU with large input prompts.
#12158 opened by aahouzi - 2
Qwen2 deployment via Ollama fails
#12210 opened by vincent-wsz - 2
Output format improvement
#12205 opened by HoppeDeng - 4
Unsupported SPIR-V version
#12168 opened by Pablou2902 - 4
Ollama fails to load model onto A380 GPU
#12172 opened by GamerSocke