Issues
Support official llama.cpp Docker images.
#7506 opened - 2
WizardLM-2-7B gets cut off while writing.
#7501 opened - 2
Trying to understand token sampling better
#7498 opened - 2
Running on Android with CPU only is very slow; one image takes 5 minutes. How can this be optimized?
#7494 opened - 0
Click wrong
#7493 opened - 5
CUDA graphs break quantized K cache
#7492 opened - 0
Error during compilation (make/cmake)
#7490 opened - 4
Completion of error handling
#7489 opened - 5
tokenization: double EOS tokens
#7484 opened - 4
Is it possible to create a DLL for main.cpp?
#7460 opened - 3
How to convert Microsoft/trocr to ggml format
#7453 opened - 7
Build fails with `ggml-vulkan.cpp:6880:80: error: cannot convert ‘ggml_tensor*’ to ‘float’`
#7446 opened - 11
FR: Phi-3-vision-128k-instruct implementation
#7444 opened - 3
Performance Regression Observed in llama.cpp
#7443 opened - 4
Phi 3 medium/small support
#7439 opened - 1
Still not working with Meta-Llama-3-8B-Instruct
#7437 opened - 1
[server] phi-3 uses <|endoftext|> instead of <|end|> when applying chat template in /chat/completions
#7432 opened - 0
Why was convert-lora-to-ggml.py removed?
#7429 opened - 1
Please don't hoard memory.
#7428 opened - 3
b2950 broke RPC mode
#7427 opened - 1
Add idefics2 support
#7417 opened - 2
Track allocated buffers in rpc-server
#7407 opened - 3
Support for Bunny VLM (SigLip + Phi-3)
#7404 opened - 1
CI failing on main branch
#7403 opened - 6
RPC + Flash attention generation bug
#7401 opened - 10
Speed up ROCm AMD Unified Memory Architecture
#7399 opened - 1
KeyError: 'model.layers.0.attention.wo.weight'
#7396 opened - 4
Possible (very serious) bug: chat templates that use the '<s>' token get a space added after it
#7390 opened - 2