intel-analytics/ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Python · Apache-2.0
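Many of the issues below concern the library's HuggingFace-style loading path. For context, a minimal sketch of 4-bit inference with ipex-llm, assuming the `ipex_llm.transformers` wrapper and an XPU-enabled PyTorch install as described in the project's quickstart (the model name and prompt are illustrative, not from this page):

```python
# Minimal sketch: INT4 inference with ipex-llm on an Intel GPU.
# Assumes ipex-llm and an Intel XPU build of PyTorch are installed;
# drop the .to("xpu") calls to run on CPU instead.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # drop-in HF-style API

model_path = "meta-llama/Llama-2-7b-chat-hf"  # illustrative model id

# load_in_4bit=True quantizes the weights to 4-bit on load
model = AutoModelForCausalLM.from_pretrained(
    model_path, load_in_4bit=True, trust_remote_code=True
)
model = model.to("xpu")  # move the quantized model to the Intel GPU

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
inputs = tokenizer("What is SYCL?", return_tensors="pt").to("xpu")

with torch.inference_mode():
    output = model.generate(inputs.input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```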
Issues
SYCL error: CHECK_TRY_ERROR(dpct::gemm_batch( *g_sycl_handles[g_main_device], oneapi::mkl::transpose::trans
#10845 opened · 3 comments
Update Ollama Version to 0.1.32
#10837 opened · 2 comments
How to use BigDL to calculate MSE values
#10831 opened · 4 comments
ollama no GPU - Intel Arc A750 Windows and Linux
#10823 opened · 26 comments
[Feature Request] IPEX-LLM + Axolotl Docker Image
#10821 opened · 4 comments
Llama-CPP Install Issue - Windows
#10820 opened · 1 comment
chatglm3-6b with fp8, 1k input, 512 output, and batch 64 fails in the all-in-one benchmark tool
#10818 opened · 2 comments
Llama 3 cannot be loaded with load_low_bit
#10816 opened · 1 comment
Llama 3 generation does not stop
#10815 opened · 3 comments
How does the llama.cpp backend work?
#10803 opened · 3 comments
streamlit on iGPU - RuntimeError: Native API failed. Native API returns: -999 (Unknown PI error)
#10778 opened · 1 comment
deepspeed_optimize_model_gpu Qwen/Qwen-7B-Chat
#10763 opened · 1 comment
MiniGPT4-Video support
#10759 opened · 1 comment
Feature Request: RoSA and QRoSA
#10755 opened · 3 comments
GPTQ inference issue on Intel GPU
#10754 opened · 2 comments
Saving of low-bit models for later loading?
#10729 opened · 3 comments
Llama-2-7b-chat-hf produces wrong output on CPU
#10724 opened · 1 comment
The current vLLM Docker image can't support the Qwen 1.5 7B model because of the transformers version
#10684 opened · 4 comments
Strange CPU performance curve when using ChatGLM3 to infer on inputs of tens of thousands of tokens
#10683 opened · 1 comment
starcoder2 optimization results
#10680 opened · 1 comment
Benchmarking Chatglm3-6B on Xeon SPR: Forward() expected at most 5 arguments but received 6
#10674 opened · 7 comments
Mistral hack in vLLM no longer needed
#10667 opened · 4 comments
Self-Speculative Decoding at lower precisions?
#10666 opened · 1 comment
TypeError: invalidInputError() missing 1 required positional argument: 'errMsg' with vLLM serving
#10661 opened · 12 comments
starcoder2-3B model rest token latency
#10607 opened · 9 comments