intel-analytics/ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max). A PyTorch LLM library that seamlessly integrates with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, etc.
Python · Apache-2.0
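The description above advertises seamless HuggingFace integration with low-bit quantization. A minimal sketch of how loading a checkpoint with ipex-llm's 4-bit weight quantization typically looks (the model path and the `"xpu"` device move are illustrative assumptions, not taken from this page):

```python
# Minimal sketch, assuming ipex-llm's drop-in HuggingFace-style API
# (ipex_llm.transformers.AutoModelForCausalLM with load_in_4bit=True).
# The checkpoint path is a placeholder for any HF causal-LM checkpoint.

def load_quantized(model_path: str):
    """Load a checkpoint with ipex-llm 4-bit weight-only quantization."""
    # Imports are deferred so the sketch can be inspected on a machine
    # without ipex-llm installed; they execute at call time.
    from ipex_llm.transformers import AutoModelForCausalLM
    from transformers import AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        load_in_4bit=True,       # INT4 weight-only quantization on load
        trust_remote_code=True,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    # ipex-llm also provides model.save_low_bit(path) /
    # AutoModelForCausalLM.load_low_bit(path) to persist the quantized
    # weights (the subject of issue #11058 below).
    return model, tokenizer


if __name__ == "__main__":
    model, tokenizer = load_quantized("meta-llama/Llama-2-7b-chat-hf")
    model = model.to("xpu")  # move to an Intel GPU; omit for CPU-only inference
```

On CPU-only machines the `to("xpu")` step is simply skipped; the quantized model runs through the usual `model.generate(...)` path.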
Issues
Unable to save quantized model
#11058 opened by vmadananth - 9
Docker on Windows vllm serving issue
#11029 opened by ktjylsj - 0
vLLM offline_inference.py failed to run on CPU inference
#11056 opened by eugeooi - 1
Not able to profile LLAMA2 on iGFX (Windows)
#10956 opened by vmadananth - 1
all-in-one tool for chatglm3-6b: 2nd-token latency at batch size 1 is larger than at batch size 2
#10992 opened by Fred-cell - 2
ipex-llm version 0510 has a regression compared to 0430, especially for BS=16/32 and 8k input
#10994 opened by Fred-cell - 3
all-in-one benchmark with Baichuan2-13B OOM
#11005 opened by kevin-t-tang - 1
Weights of LlamaForCausalLM were not initialized from the model checkpoint at meta-llama/Meta-Llama-3-8B-Instruct?
#11052 opened by JamieVC - 1
MTL Windows Qwen-VL AttributeError: 'QWenAttention' object has no attribute 'position_ids'
#11006 opened by juan-OY - 0
ChatGLM run error on MTL iGPU
#11012 opened by aitss2017 - 3
failed to run piqa test with sym_int4 precision by harness
#10961 opened by aoke79 - 12
Failing to run ipex-llm ollama on Intel Arc A770
#10995 opened by dan9070 - 1
failed to run truthfulqa_mc1 by harness
#11015 opened by aoke79 - 4
Performance drop for neural-chat 7b with new ipex-llm repo (2.5.0b20240425) vLLM serving
#10924 opened by Vasud-ha - 4
all-in-one tool for ChatGLM3-6b: next token latency with BS=16 is slower than before
#10993 opened by Fred-cell - 1
Support both Llama2 and stablelm/Zephyr-3B
#11002 opened by Quallyjiang - 6
Docker image (intelanalytics/ipex-llm-xpu): Documentation stated I would need to disable iGPU to use A770. When will you fix this issue since disabling iGPU is problematic?
#10940 opened by sungkim11 - 4
Run Qwen1.5-1.8B on MTL iGPU with llama.cpp using ipex-llm backend and get no results
#10989 opened by violet17 - 2
Can you provide a llama.dll build guide?
#10875 opened by KiwiHana - 2
Phi-3 model performance on MeteorLake GPU
#10947 opened by bopeng1234 - 5
[bug] LLAMA3-8B produces incorrect output
#10974 opened by intelyoungway - 1
Main memory continually declines with ipex-llm during local LLM inference on Intel Arc GPU
#10949 opened by sunyijin - 4
Fastchat serving embeddings?
#10915 opened by lnguyen - 1
Crash when using llama.dll
#10952 opened by season-studio - 1
MTL 165H ubuntu22.04 can't benchmark qwen/Qwen-7B-Chat
#10936 opened by taotao1-1 - 5
Feature request: Support fp16 with self-speculative decoding on XPU in ipex_llm.serving.fastchat.ipex_llm_worker
#10905 opened by brosenfi - 7
IPEX-LLM on Intel Max Series 1100 for inference libintel-ext-pt-gpu.so: undefined symbol: _ZNK5torch8autograd4Node4nameB5cxx11Ev
#10941 opened by shailesh837 - 4
Cannot find GPU on Linux system
#10935 opened by K-Alex13 - 0
Stable version release requested for Arc GPU
#10950 opened by Fred-cell - 2
SpeechT5 on XPU (Intel Arc GPU 770) takes 8 seconds while CPU takes 3 seconds?
#10942 opened by shailesh837 - 1
Unable to run inference in Linux environment
#10921 opened by K-Alex13 - 2
IndexError: list index out of range when ipex_fp16_gpu test_api is used in all-in-one
#10914 opened by Kpeacef - 1
phi-3-mini support
#10913 opened by aoke79 - 1
Issue with saving and loading low bit BLIP-2 model
#10892 opened by wayfeng - 1
How to use BigDL to implement early stopping
#10853 opened by gdg1212 - 0
Improve First Token Latency for multi-GPU projects (by flash attention or alternative)
#10897 opened by moutainriver - 2
wav2lip issue - Intel ARC on Ubuntu
#10850 opened by maxkim-kr - 3
RuntimeError: "fused_dropout" not implemented for 'Byte' when running trl ppo finetuning
#10854 opened by Jasonzzt - 1
GPU hangs when switching between Llama2 and Llama3 on Arc 770
#10852 opened by moutainriver