kvcache-ai/ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Python · Apache-2.0
Issues
Error when reproducing InternLM2.5-7B-Chat-1M
#114 opened by Cherishyt - 1
RuntimeError CUDA error when running Infinite Bench
#113 opened by Flitieter - 1
Detailed specification of the computer hardware to run 236B DeepSeek-Coder-V2
#108 opened by atomlayer - 0
install error on windows, need help
#109 opened by gaowayne - 1
how to implement new algorithm in this repo?
#105 opened by lumiere-ml - 0
feature request: support internvl2
#107 opened by kolinfluence - 11
How to infer quantized models on CPU&GPU
#103 opened by shuzhang-pku - 2
Does ktransformers support deepseek V2.5?
#100 opened by huliangbing - 2
Specify MAX_NEW_TOKENS for ktransformers server
#92 opened by arthurv - 5
Busy loop in cpu_backend/task_queue.cpp keeps 1 thread at 100% CPU when queue is empty
#80 opened by sayap - 5
Are Marlin and Q4_K totally equivalent?
#87 opened by Eutenacity - 1
How can I use opencompass benchmark tools to test ktransformers in long context?
#91 opened by AsVoider - 7
DeepSeek-V2 inference is very slow; it appears to run on the CPU, with very low GPU utilization
#93 opened by Chain-Mao - 7
ImportError: DLL load failed while importing KTransformersOps: The specified module was not found.
#94 opened by SCP12rs - 4
Suggestion to add DeepSeek v2.5 support
#95 opened by arisau - 1
Installation Problem
#90 opened by Chain-Mao - 4
Installation requirements
#89 opened by arthurv - 2
Could you provide a detailed hardware configuration list?
#84 opened by qixing-ai - 2
Seg Fault on long replies
#82 opened by matthusby - 1
8-GPU configuration on L40 OOM
#76 opened by fengyang95 - 9
Is deepseek-ai/DeepSeek-V2.5 supported?
#79 opened by AshD - 1
Missing pip packages flash_attn and wheel
#69 opened by bitbottrap - 4
UnboundLocalError: cannot access local variable 'chunck_mask' where it is not associated with a value
#70 opened by fengyang95 - 1
Would you support glm4-chat-1m
#65 opened by choyakawa - 5
More Efficient Layer Distribution for DeepSeek Coder v2 on Multiple GPUs and CPUs
#49 opened by BGFGB - 2
Support for Mistral-Large-Instruct-2407-GGUF ?
#53 opened by LIUKAI0815 - 2
Add an instruction for configuring CUDA_HOME and CUDA_PATH to the install section of README.md
#54 opened by hyx1999 - 5
Mixtral-8x7B-v0.1 GGUF file error
#42 opened by RealLittleXian - 2
Ubuntu 24.04 GLIBCXX version fail
#37 opened by ELigoP - 1
Ollama chat not implemented
#32 opened by xldistance - 1
Unable to use the web interface
#33 opened by xldistance - 3
using docker got errors
#28 opened by goldenquant