Issues
how to build Qwen-72B-Chat-Int4 with tp=2
#94 opened by liyunhan - 10
Question about Triton synchronous/asynchronous interfaces
#91 opened by dongteng - 8
Running run.py errors out: Segmentation fault (core dumped)
#93 opened by ArlanCooper - 2
Running the build script errors out: TypeError: RowLinear.__init__() got an unexpected keyword argument 'instance_id'
#86 opened by ArlanCooper - 1
Does the current Qwen-VL implementation support only a single input image, and must the image appear at the start of the input?
#90 opened by xikaluo - 21
OOM when testing HF throughput, plus Triton concurrency and streaming-output issues
#81 opened by dongteng - 2
Throughput actually dropped after enabling triton + inflight_batching
#62 opened by zhisunyy - 2
How can normal batch inference be supported?
#88 opened by zhangyu68 - 11
Why doesn't GPU memory usage decrease after smoothquant quantization?
#87 opened by tp-nan - 0
Has anyone compared inference results against vLLM?
#72 opened by x-transformers - 5
Building the official qwen_1_8B-Chat-int4 with auto-gptq errors out: KeyError: 'transformer.h.0.attn.c_attn.qweight'
#83 opened by fmozer - 2
Why is the 72B model experimental? The architecture should be the same, so what is the reason? Thanks
#84 opened by zhangjiekui - 1
Qwen-72B-Chat-Int4 killed
#82 opened by Hukongtao - 3
ERROR: Failed to create instance: unexpected error when creating modelInstanceState
#71 opened by lyc728 - 2
Qwen1.5 GPTQ doesn't work
#76 opened by Pevernow - 15
Qwen1.5 GPTQ-Int4 build fails
#77 opened by ljhssga - 1
Qwen1.5 GPTQ build error
#78 opened by compass-star - 5
Qwen2 build error
#80 opened by mogoxx - 2
Has anyone seen Qwen-72B return problematic output once the input exceeds 2048?
#45 opened by piaoxiaobo - 6
Has anyone tried serving HTTP with mpirun -n greater than 1?
#59 opened by xikaluo - 1
Is qwen-vl fine-tuned with swift supported?
#75 opened by xs818818 - 0
web demo error
#70 opened by HappyKerry - 9
inflight_batching
#66 opened by lyc728 - 4
Qwen-14B-Chat-Int4 produces incorrect predictions after running
#68 opened by takemars - 1
Qwen-VL build.py: error: unrecognized arguments: --use_rmsnorm_plugin --use_lookup_plugin float16 --max_prompt_embedding_table_size 2048
#67 opened by 77h2l - 2
qwen-14b int4-awq quantization fails
#64 opened by zhisunyy - 12
After a successful Triton deployment, several extra processes appear on each GPU
#63 opened by x-transformers - 9
Error deploying TensorRT-LLM with Triton
#60 opened by zhisunyy - 6
Problems building tensorrt-llm on autodl
#54 opened by oreo-lp - 6
Use official int4 weights, e.g. the Qwen-1_8B-Chat-Int4 model (recommended) - Build TRT-LLM engine
#53 opened by byjswr - 1
Question about running summarize.py
#56 opened by lyc728 - 1
How good is the inference speedup?
#61 opened by yanguowei316 - 1
Qwen-14B INT4-AWQ quantization fails with tp=2
#58 opened by comeby - 19
Triton's GPU memory usage is twice that of TensorRT-LLM
#51 opened by lyc728 - 3
To deploy an API with baichuan2, what should be modified to adapt it to the Baichuan model?
#52 opened by secain - 3
Qwen-14B-chat multi-batch error
#55 opened by zhisunyy - 15
Error running cli_chat.py after the build
#50 opened by felixstander - 30
cnn_dailymail
#49 opened by lyc728 - 5
qwen-14b-chat-int4 produces garbled inference output after conversion
#47 opened by xiamaozi11 - 3
Multi-node multi-GPU inference
#44 opened by zhudongwork - 4
Something went wrong when running build
#42 opened by Arcmoon-Hu