Issues
Make -DLLAMA_HIP_UMA a dynamic setting.
#7145 opened by sebastian-philipp - 10
Can't run the program
#7181 opened by mike2003 - 1
BF16 prompt processing has half the performance of F16 and F32 on AMD Ryzen Embedded V3000 (Zen 3)
#7182 opened by lemmi - 0
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 149, got 147
#7157 opened by YathenStianbase - 5
Support for Consistency Large Language Models?
#7168 opened by unoexperto - 0
LLaVA-NeXT-Video-34B
#7201 opened by mirek190 - 2
[SYCL] Implement Flash attention.
#7141 opened by qnixsynapse - 0
Add support for mistral Dutch and Armenian models: Tweeties/tweety-7b-dutch-v24a and Tweeties/tweety-7b-armenian-v24a
#7170 opened by JohnClaw - 2
Train For Language Translation
#7178 opened by nichellehouston - 2
Third-party applications are overwhelmingly slow for subsequent prompt evaluation compared to examples/main and examples/server
#7185 opened by khimaros - 2
Server 'penalize_nl' parameter defaults to False?
#7136 opened by AayushG159 - 0
ggml-cuda.cu:1278: to_fp32_cuda != nullptr
#7211 opened by a-downing - 3
error: implicit declaration of function ‘vld1q_s8_x4’; did you mean ‘vld1q_s8_x2’?
#7147 opened by CaptainOfHacks - 6
ggml-cuda.so is 90mb with -arch=all
#7156 opened by jart - 12
Native Intel IPEX-LLM Support
#7190 opened by iamhumanipromise - 15
convert-hf-to-gguf-update.py breaks
#7207 opened by CrispStrobe - 1
Server builds successfully but fails at runtime with `ggml_cuda_init: failed to initialize CUDA: unknown error`
#7218 opened by wzhgithub - 6
Compilation error using HIP SDK on Windows
#7242 opened by lastrosade - 5
convert-hf-to-gguf.py breaks on phi-2
#7219 opened by CrispStrobe - 3
llama.cpp --prompt-cache-all: more than a year has passed and it is still not fully implemented
#7179 opened by mirek190 - 2
Should we add an autolabeler for PR?
#7174 opened by mofosyne - 3
Support request - Google MADLAD400-10B
#7238 opened by nekiee13 - 2
Impact of bf16 on Llama 3 8B perplexity?
#7148 opened by jim-plus - 1
Is Infini-attention support possible?
#7213 opened by sdmorrey - 4
Gibberish response from server and main exits on M1 macstudio ultra with gpu (cpu ok)
#7159 opened by jrozentur - 1
NKVO argument leads to huge compute buffers in full Cublas offload on a heterogeneous dual GPU config.
#7217 opened by Nexesenex - 8
repeatability problem with CUDA backend
#7228 opened by steampunque - 8
Build error at server.cpp: undefined reference to `json_schema_to_grammar`
#7189 opened by jarviszeng-zjc - 0
Token generation speed reduces after GPU offloading
#7244 opened by alexmjames - 1
Add metadata override and also generate dynamic default filename when converting gguf
#7165 opened by mofosyne - 3
Server api not functioning with frontends
#7231 opened by wooooyeahhhh - 0
CMakeLists bug in BLAS
#7227 opened by hpcpony - 0
An error occurred while converting Sakura-14B-Qwen2beta-v0.10pre0 to gguf
#7236 opened by lingyezhixing - 3
bf16 GGUF fails with GGML_ASSERT on CUDA
#7223 opened by ddh0 - 9
Assertion failure on quantization of Meta-Llama-3-70B-Instruct from f16 to various quantization types.
#7215 opened by tigran123 - 1
How to make the examples?
#7220 opened by Zibri - 3
Abort in example server (/completions route) given string-type system_prompt
#7152 opened by justinsteven - 8
quantize: command not found
#7196 opened by userandpass - 1
Expanding Swift Package Functionality
#7186 opened by spprichard - 6
Messy CUDA graph error output on mixtral/MoE models
#7175 opened by CISC - 3
[Server] JSON outputs are not being enforced according to the JSON Schema.
#7149 opened by remixer-dec