Issues
Make -DLLAMA_HIP_UMA a dynamic setting.
#7145 opened by sebastian-philipp - 10
Can't run the program
#7181 opened by mike2003 - 1
BF16 prompt processing has half the performance of F16 and F32 on AMD Ryzen Embedded V3000 (Zen 3)
#7182 opened by lemmi - 0
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 149, got 147
#7157 opened by YathenStianbase - 5
Support for Consistency Large Language Models?
#7168 opened by unoexperto - 0
LLaVA-NeXT-Video-34B
#7201 opened by mirek190 - 2
[SYCL] Implement Flash attention.
#7141 opened by qnixsynapse - 0
Add support for mistral Dutch and Armenian models: Tweeties/tweety-7b-dutch-v24a and Tweeties/tweety-7b-armenian-v24a
#7170 opened by JohnClaw - 2
Train For Language Translation
#7178 opened by nichellehouston - 2
Third-party applications are overwhelmingly slow for subsequent prompt evaluation compared to examples/main and examples/server
#7185 opened by khimaros - 2
Server 'penalize_nl' parameter defaults to False?
#7136 opened by AayushG159 - 0
ggml-cuda.cu:1278: to_fp32_cuda != nullptr
#7211 opened by a-downing - 3
error: implicit declaration of function ‘vld1q_s8_x4’; did you mean ‘vld1q_s8_x2’?
#7147 opened by CaptainOfHacks - 6
ggml-cuda.so is 90mb with -arch=all
#7156 opened by jart - 12
Native Intel IPEX-LLM Support
#7190 opened by iamhumanipromise - 15
convert-hf-to-gguf-update.py breaks
#7207 opened by CrispStrobe - 1
Server builds successfully but fails at runtime with `ggml_cuda_init: failed to initialize CUDA: unknown error`
#7218 opened by wzhgithub - 6
Compilation error using HIP SDK on Windows
#7242 opened by lastrosade - 5
convert-hf-to-gguf.py breaks on phi-2
#7219 opened by CrispStrobe - 3
llama.cpp --prompt-cache-all: more than a year has passed and it is still not fully implemented
#7179 opened by mirek190 - 2
Should we add an autolabeler for PR?
#7174 opened by mofosyne - 3
Support request - Google MADLAD400-10B
#7238 opened by nekiee13 - 2
Impact of bf16 on Llama 3 8B perplexity?
#7148 opened by jim-plus - 1
Is Infini-attention support possible?
#7213 opened by sdmorrey - 4
Gibberish response from server and main exits on M1 macstudio ultra with gpu (cpu ok)
#7159 opened by jrozentur - 1
NKVO argument leads to huge compute buffers in full Cublas offload on a heterogeneous dual GPU config.
#7217 opened by Nexesenex - 8
repeatability problem with CUDA backend
#7228 opened by steampunque - 8
Build error at server.cpp: undefined reference to `json_schema_to_grammar`
#7189 opened by jarviszeng-zjc - 0
Token generation speed reduces after GPU offloading
#7244 opened by alexmjames - 1
Add metadata override and also generate dynamic default filename when converting gguf
#7165 opened by mofosyne - 3
Server api not functioning with frontends
#7231 opened by wooooyeahhhh - 0
CMakeLists bug in BLAS
#7227 opened by hpcpony - 0
An error occurred while converting Sakura-14B-Qwen2beta-v0.10pre0 to gguf
#7236 opened by lingyezhixing - 3
bf16 GGUF fails with GGML_ASSERT on CUDA
#7223 opened by ddh0 - 9
Assertion failure on quantization of Meta-Llama-3-70B-Instruct from f16 to various quantization types.
#7215 opened by tigran123 - 1
How to make the examples?
#7220 opened by Zibri - 3
Abort in example server (/completions route) given string-type system_prompt
#7152 opened by justinsteven - 8
quantize: command not found
#7196 opened by userandpass - 1
Expanding Swift Package Functionality
#7186 opened by spprichard - 6
Messy CUDA graph error output on mixtral/MoE models
#7175 opened by CISC - 3
[Server] JSON outputs are not being enforced according to the JSON Schema.
#7149 opened by remixer-dec