Issues
Misc. bug: The KV cache is sometimes truncated incorrectly when making v1/chat/completions API calls
#11970 opened by vnicolici - 3
Misc. bug: convert_hf_to_gguf failed
#11991 opened by JSXGQ - 9
Regression. Unable to run any model. CRASH!!!
#12075 opened by acbits - 6
Worse performance on newer Linux kernel version
#12086 opened by cxxr2 - 3
Compile bug: How to build llama.android example with -DGGML_VULKAN=ON through android studio.
#12085 opened by gaykawadpk - 1
Eval bug: MUSA error: operation not supported
#12077 opened by yeungtuzi - 0
Eval bug: getting assertion error when trying to use a gguf quantized model at inference "GGML_ASSERT(n_outputs_enc > 0 && "call llama_encode() first") failed"
#12080 opened by Vedapani0402 - 6
Misc. bug: Concurrency Limitation: Only 6 Inferences Run Simultaneously When Setting `--parallel` > 6
#12013 opened by karanotsingyu - 5
Eval bug: TikTokenTokenizer has no attribute vocab
#12044 opened by zhanghui-china - 15
Eval bug: Several models producing gibberish
#12012 opened by iamangus - 2
Eval bug: granite-vision-3.1-2b-preview ERROR:hf-to-gguf:Model LlavaNextForConditionalGeneration is not supported
#12053 opened by gnusupport - 5
Eval bug: llama.cpp:8910: GGML_ASSERT(strcmp(embd->name, "result_norm") == 0 && "missing result_output tensor") failed
#12074 opened by 79154gb - 2
Eval bug: std::filesystem::__cxx11::filesystem_error
#11962 opened by gnusupport - 6
Eval bug: CANNOT LINK EXECUTABLE "./llama-cli": library "libomp.so" not found: needed by main executable
#11979 opened by Krallbe68 - 1
Misc. bug: ggml-backend.cpp:746: pre-allocated tensor (cache_k_l0 (view) (copy of cache_k_l0 (view))) in a buffer (Vulkan0) that cannot run the operation (CPY)
#12045 opened by simonchen - 2
Eval bug: GGML_ASSERT(hparams.n_embd_head_k % ggml_blck_size(type_k) == 0) failed
#12033 opened by AbdullahMPrograms - 6
Misc. bug: cannot scroll to right side when input too long
#12054 opened by gnusupport - 2
Misc. bug: --no-context-shift OR --context-shift ?
#12038 opened by simonchen - 0
Compile bug: llama.cpp-b4749/ggml/src/ggml-cpu/ggml-cpu-quants.c:5141:26: error: initialization of ‘uint32_t *’ {aka ‘unsigned int *’} from incompatible pointer type ‘const uint8_t (*)[12]’ {aka ‘const unsigned char (*)[12]’} [-Wincompatible-pointer-types]
#12050 opened by Arniiiii - 3
Eval bug: context shift is disabled
#11974 opened by deific - 2
Eval bug: CPU usage is abnormal when running deepseek-r1-671B-Q4_0 weights on an Atlas 800T A2 NPU device.
#11966 opened by woshidahunzi1 - 2
llama-cli misbehaving (changed?)
#12036 opened by 0wwafa - 4
Feature Request: encoding_image_with_clip takes a very long time when running minicpmv inference
#11941 opened by EnzhiZhou - 2
Misc. bug: add tool_calls id in response in server
#11992 opened by henryclw - 4
Eval bug: unknown pre-tokenizer type: 'deepseek-r1-qwen'
#12021 opened by wr131 - 8
Misc. bug: llama-run segmentation fault
#12022 opened by benoitf - 0
[Feature]: SOC_VERSION ascend310b1 is not supported
#11978 opened by Cikaros - 2
Add option to build CUDA backend without Flash attention
#11946 opened by slaren - 3
[CANN] Compile bug: no matching function for call to 'CastIntrinsicsImpl' Ascend NPU issues specific to Ascend NPUs
#12010 opened by Cikaros - 0
GGML to GGUF conversion fails: quantized tensor bytes per row (5120) is not a multiple of Q2_K type size (84)
#11976 opened by chokoon123 - 2
Maybe it would be better to have a diagram showing how llama.cpp processes inferences
#11967 opened by yinuu - 1
Misc. bug: `json_schema` under `response_format` is not working on OpenAI compatible API endpoint `v1/chat/completions`
#11988 opened by henryclw - 0
Feature Request: add Kernel level verbose option
#11985 opened by 0400H - 1
Eval bug: Unexpected empty grammar stack after accepting piece: <|tool_calls_begin|> on DeepSeek-R1-Distill-Qwen-32B
#11938 opened by chgjin - 0
Misc. bug: Sporadic MUL_MAT Failures in test-backend-ops for Nvidia backend
#11972 opened by ShanoToni - 0
Eval bug: llama.cpp Incorrectly Parses and Reports sprintf Calls in C++ Code
#11951 opened by perdubug - 0
Misc. bug: hipGraph causes a crash in hipGraphDestroy
#11949 opened by IMbackK - 0
Feature Request: dynamic speculation (i.e. dynamic draft-max)
#11933 opened by fredlas