Issues
Support official llama.cpp Docker images.
#7506 opened - 2
WizardLM-2-7B gets cut off while writing.
#7501 opened - 2
Trying to understand token sampling better
#7498 opened - 2
Running on Android with CPU only is very slow; one image takes 5 minutes. How can this be optimized?
#7494 opened - 0
Click wrong
#7493 opened - 5
CUDA graphs break quantized K cache
#7492 opened - 0
Error during compilation (make/cmake)
#7490 opened - 4
Completion of error handling
#7489 opened - 5
tokenization: double EOS tokens
#7484 opened - 4
Is it possible to create a DLL for main.cpp?
#7460 opened - 3
How to convert Microsoft/trocr to ggml format
#7453 opened - 7
Build fails with `ggml-vulkan.cpp:6880:80: error: cannot convert ‘ggml_tensor*’ to ‘float’`
#7446 opened - 11
FR: Phi-3-vision-128k-instruct implementation
#7444 opened - 3
Performance Regression Observed in llama.cpp
#7443 opened - 4
Phi 3 medium/small support
#7439 opened - 1
Still not working with Meta-Llama-3-8B-Instruct
#7437 opened - 1
[server] phi-3 uses <|endoftext|> instead of <|end|> when applying chat template in /chat/completions
#7432 opened - 0
Why was convert-lora-to-ggml.py removed?
#7429 opened - 1
Please don't hoard memory.
#7428 opened - 3
b2950 broke RPC mode
#7427 opened - 1
Add idefics2 support
#7417 opened - 2
Track allocated buffers in rpc-server
#7407 opened - 3
Support for Bunny VLM (SigLip + Phi-3)
#7404 opened - 1
CI failing on main branch
#7403 opened - 6
RPC + Flash attention generation bug
#7401 opened - 10
Speed up ROCm AMD Unified Memory Architecture
#7399 opened - 1
KeyError: 'model.layers.0.attention.wo.weight'
#7396 opened - 4
Possible (very serious) bug: chat templates that use the '<s>' token get a space added after it
#7390 opened - 2