Issues
Server chat/completion API fails - coroutine object not callable in llama_proxy
#1857 opened by PurnaChandraPanda - 1
Add an option to enable --runtime-repack in llama.cpp
#1860 opened by ekcrisp - 1
server: chat completions returns wrong logprobs model
#1787 opened by domdomegg - 1
Release the GIL
#1836 opened by simonw - 5
Add reranking support
#1794 opened by donguyen32 - 2
Intel GPU not enabled when using -DLLAVA_BUILD=OFF
#1851 opened by dnoliver - 0
With Intel GPU on Windows, llama_perf_context_print reports invalid performance metrics
#1853 opened by dnoliver - 1
Error when building wheels on Linux: `FileNotFoundError: [Errno 2] No such file or directory: 'ninja'`
#1839 opened by sabaimran - 2
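For the `ninja` build failure above, one possible workaround (an assumption, not a confirmed fix from the issue: the error suggests the build backend cannot find the `ninja` generator on `PATH`) is to install the build tools into the same environment before rebuilding the wheel:

```shell
# Install the ninja generator (and cmake) that the wheel build invokes.
pip install ninja cmake

# Rebuild the wheel from source without reusing a cached failed build.
pip install --no-cache-dir llama-cpp-python
```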
Mistral-instruct not using system prompt.
#1832 opened by AkiraRy - 3
[INFORMATION REQUEST] Is it possible to build for GPU enabled target on non-GPU host?
#1841 opened by m-o-leary - 0
Request for prebuilt CUDA wheels for newer version
#1824 opened by XJF2332 - 0
Error when updating to 0.3.2
#1837 opened by paoloski97 - 2
Add support for Qwen2-VL
#1811 opened by PredyDaddy - 1
Update related llama.cpp to support Intel AMX instruction
#1827 opened by nai-kon - 1
Setting seed to -1 (random) or using default LLAMA_DEFAULT_SEED generates a deterministic reply chain
#1809 opened by m-from-space - 1
Updated from 0.2.90 to 0.3.2 and now my GPU won't load
#1835 opened by rookiemann - 1
save logits section in eval() sets dtype to np32 apparently unconditionally?
#1829 opened by robbiemu - 3
Installed everything, but speed on a 3090 is lower than on an industrial GPU. Seems like CUDA is not working.
#1815 opened by lukaLLM - 15
llama-cpp-python 0.3.1 didn't use GPU
#1785 opened by artyomboyko - 0
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='nul' mode='w' encoding='cp932'>
#1828 opened by AkiraRy - 0
llama-server not using GPU
#1826 opened by RakshitAralimatti - 4
Specify GPU Selection (e.g., CUDA:0, CUDA:1)
#1816 opened by RakshitAralimatti - 2
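For the GPU-selection request above (#1816), no dedicated llama-cpp-python flag is confirmed in this list; a common workaround sketch, assuming a CUDA build, is to restrict device visibility with the standard `CUDA_VISIBLE_DEVICES` environment variable before the model is loaded (the model path and `Llama` call below are illustrative only):

```python
import os

# Must be set before any CUDA context is created in this process.
# "1" exposes only the second physical GPU; the process then sees it as device 0.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# Hypothetical usage afterwards (placeholder model path):
# from llama_cpp import Llama
# llm = Llama(model_path="model.gguf", n_gpu_layers=-1)
```

Setting the variable in the shell before launching Python achieves the same effect.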
Prebuilt CUDA wheels not working
#1822 opened by mjwweb - 3
llama-cpp-python not using GPU on Google Colab
#1780 opened by AnirudhJM24 - 2
AttributeError: function 'llama_sampler_init_tail_free' not found after compiling llama.cpp with hipBLAS
#1818 opened by Micromanner - 2
Unable to Use GPU with llama-cpp-python on Jetson Orin
#1779 opened by watchstep - 2
top_p = 1 causes deterministic outputs
#1797 opened by oobabooga - 0
Unable to pip install
#1804 opened by chinthasaicharan - 0
Low-level examples broken after [feat: Update sampling API for llama.cpp (#1742)]
#1803 opened by mite51 - 0
Long Context Generation Crashes Google Colab Instance
#1792 opened by kazunator - 1
Can't install with Vulkan support in Ubuntu 24.04
#1789 opened by wannaphong - 0
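For the Vulkan install problem above (#1789), a hedged sketch of the usual source-build route: llama-cpp-python forwards `CMAKE_ARGS` to the underlying llama.cpp build, and llama.cpp exposes a Vulkan backend toggle. Whether this resolves the Ubuntu 24.04 failure is not confirmed here; the Vulkan SDK/driver packages must already be present.

```shell
# Build from source with the llama.cpp Vulkan backend enabled.
CMAKE_ARGS="-DGGML_VULKAN=on" pip install --no-cache-dir llama-cpp-python
```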
_logger.py: KeyError: 5 [bugfix] [patch]
#1778 opened by themanyone - 2
Speculative decoding gives weird results in v. 0.3
#1770 opened by mobeetle - 2
Missing async llm call
#1774 opened by ivanstepanovftw - 0
[FEAT]: TLS Certificate Support
#1768 opened by isgallagher