Issues
I see Qwen-7B in the supported model list; is Qwen1.5-14B also supported?
#422 opened by koalaaaaaaaaa - 1
[Question] Support for Mistral?
#415 opened by BaiMoHan - 2
[BUG] Slow-tokenizer message is printed even when the fast tokenizer may be in use
#407 opened by david-vectorflow - 1
Qwen-14B-INT8 hits the error: 'QwenTransformerLayerWeight' object has no attribute 'q_weight_'
#333 opened by wangr0031 - 1
Support Qwen1.5?
#389 opened by xxm1668 - 3
How to compute the perplexity of a specific output from the model
#401 opened by harvinyou - 0
Is the Llama 3 architecture supported?
#402 opened by harvinyou - 1
Are there plans to support MiniCPM-V-2?
#404 opened by xiabo0816 - 0
Are there plans to support MiniCPM-V-2?
#403 opened by xiabo0816 - 1
I want to get the raw logits. Which API should I use?
#398 opened by harvinyou - 5
Is AWQ 4-bit deployment of Yi-34B supported yet?
#291 opened by xyfZzz - 1
Are there plans to support other accelerators (non-NVIDIA)?
#399 opened by huangfude - 1
How to launch an inference model across multiple GPUs
#393 opened by harvinyou - 1
flash_llm_fp6_llm
#383 opened by wm901115nwpu - 1
`grid` in `context_attention_fwd_no_prompt_cache`
#381 opened by liyucheng09 - 2
[BUG] The inference result includes extra prompt content (Human:xxxxx, \n Assistant:xxxxx)
#372 opened by SleepyHollowforesthills - 1
[Ask] Comparison of PageAttention and TokenAttention
#379 opened by zzb610 - 0
[BUG] There already is a lightllm in pypi
#380 opened by rlippmann - 1
[BUG]Error in llama/triton_kernel/silu_and_mul.py/test_silu_and_mul function due to in-place modification of parameters and Triton kernel error in version 2.0.0
#357 opened by mivenis - 1
Weight-only INT4 is slower than CUTLASS INT4
#362 opened by zhoutianzi666 - 2
[BUG] failed to serve a Qwen1.5-72B-chat model
#350 opened by pluiez - 3
[BUG] Some issues with the benchmark
#329 opened by Storm0921 - 1
[BUG] Support for DeepSeek?
#325 opened by suhjohn - 5
InternLM2-20B is not supported
#327 opened by Storm0921 - 0
[BUG] stop_words
#326 opened by baisechundu - 1
int4_kernel
#324 opened by Cydia2018 - 10
[Feature] Please provide a load_from_weight_dict(weight_dict) interface.
#277 opened by bingo787 - 7
[BUG] Baichuan13B model init failed
#323 opened by bingo787 - 1
Shape of the Llama RoPE sin/cos tensors
#319 opened by feifeibear - 2
Can the sqlcoder model family be supported?
#310 opened by 2496289471 - 1
Can lightllm do offline inference? Is there any reference code?
#308 opened by monkeyZhy - 5
No module named petrel_client
#298 opened by Lvjinhong - 2
What is the plan to support beam search?
#286 opened by feifeibear - 4
How to use LlamaTpPartModel
#287 opened by feifeibear - 5
Is multi-node inference supported?
#271 opened by zbtrs - 2
Is there any comparison of the effects of token attention, e.g. against page attention?
#268 opened by skykiseki - 1
Custom template
#245 opened by bino282 - 1
Is ChatGLM3-6b supported yet?
#246 opened by Jeru2023 - 3
[BUG] Baichuan2-13B returns a single stop token when the input is around 1024 tokens long
#244 opened by HJT9328 - 3
Aligning the temperature parameter with vLLM
#241 opened by ArachisTong