Issues
Different results between llama_tokenize and the original Python transformers tokenizer
#7384 opened by Liufeiran123 - 6
ggml_validate_row_data finding NaN value for IQ4_NL
#7311 opened by bartowski1182 - 3
Custom `seed` values ignored by `llama.cpp HTTP server`
#7381 opened by mirekphd - 3
[Android/Termux] Significantly higher RAM usage with Vulkan compared to CPU only
#7351 opened by egeoz - 4
llama : save downloaded models to local cache
#7252 opened by ggerganov - 1
Why does the server-cuda container consume CPU time?
#7377 opened by wencan - 0
convert-hf-to-gguf.py fails PR #7234
#7380 opened by LostRuins - 0
Can I handle multiple images in the same context?
#7364 opened by Eriter555 - 1
Funny response with LLaMa 3 8B
#7367 opened by Sewlell - 0
bf16 problem
#7365 opened by Zibri - 0
[SYCL] include shared libs in sycl release
#7361 opened by gfody - 1
Description of "-t N" option for server is inaccurate
#7355 opened by tigran123 - 0
Need help on building shared libraries on Windows machine for Android x86_64 (emulator)
#7357 opened by cmpktheo - 4
Improve and expand Wikipedia article about llama.cpp
#7294 opened by fffelix-jan - 1
Possible performance boost with 2-pass online softmax
#7306 opened by zixuanweeei - 8
RPC issues and comments
#7293 opened by steampunque - 2
convert.py still fails on llama3 8B-Instruct downloaded directly from Meta (Huggingface works)
#7339 opened by aleloi - 0
AMD ROCm: 8x22B Model Causes 100% GPU Utilization Stall
#7344 opened by Trat8547 - 1
Flash attention implementations do not handle case where value vectors have different dimension from query vectors
#7343 opened by fairydreaming - 2
Pretokenizer not supported by conversion script
#7338 opened by eleius - 2
Unable to handle multi-user prompts
#7336 opened by OlivesHere - 1
Segmentation Fault on GPU
#7337 opened by djain-fujitsu - 0
Enable RPC for the server
#7292 opened by steampunque - 2
Support Falcon2-11B
#7318 opened by reneleonhardt - 0
Llama3-8b & Perplexity.exe Issue
#7291 opened by InferenceIllusionist - 0
GGML_ASSERT(n_embd_gqa == n_embd_k_gqa) fails in models where key vector dimension is different from value vector dimension
#7331 opened by fairydreaming - 1
Add support for multilingual Viking models, please.
#7309 opened by JohnClaw - 1
Support long-context Llama 3 models
#7312 opened by bachittle - 1
Support for IBM Granite models
#7307 opened by ichDaheim - 2
How to quantize a fine-tuned LLM into GGUF format
#7299 opened by dibyendubiswas1998 - 0
relocation R_X86_64_32 against hidden symbol `__TMC_END__' can not be used when making a shared object
#7301 opened by asarubbo - 0
Llama-3 Instruct tokenizer_config.json changes in relation to the currently fetched llama-bpe configs.
#7289 opened by Spacellary - 4
In my OS, the @ symbol and spaces don't play nicely in the llama.cpp directory.
#7247 opened by atljoseph - 2
Error when trying to convert an HF model which is a LoRA PEFT fine-tuned version of phi-128k
#7287 opened by swarnava112 - 3
Windows MSYS2 compilation error. [SOLVED]
#7275 opened by Zibri - 4
Infinite update_slots issue on latest build (1265c67)
#7283 opened by Leowolf93 - 3
MPI issue on raspberry pi cluster
#7260 opened by zhouwul - 1
How to build llama.cpp's .so file separately and then pass it to the llama_cpp_python / wrapper libraries directly.
#7250 opened by fastdaima - 2
Text Generation task
#7256 opened by rexionmars - 8
Performance regression with CUDA after commit 9c67c277
#7254 opened by rgerganov - 0