Qwen1.5-32B-Chat-GPTQ-Int4 build fails

Question

Closed this issue 7 months ago · 2 comments

Hello! I followed the int4-gptq build instructions in the README to build Qwen1.5-32B-Chat-GPTQ-Int4 and got this error:

Traceback (most recent call last):
  File "/app/tensorrt_llm/examples/qwen32b/build.py", line 763, in <module>
    args = parse_arguments()
  File "/app/tensorrt_llm/examples/qwen32b/build.py", line 450, in parse_arguments
    assert args.n_kv_head == args.world_size, (
AssertionError: The current implementation of GQA requires the number of K/V heads to match the number of GPUs.This limitation will be removed in a future version.

I searched the issues for similar problems and found your reply here (https://github.com/Tlntin/Qwen-TensorRT-LLM/issues/94), which suggests changing the value of n_kv_head. The config.json in Qwen1.5-32B-Chat-GPTQ-Int4 does set num_key_value_heads=8. I tried two things: changing num_key_value_heads to 1, and changing the build command to the one below. The build fails either way.

python build.py --use_weight_only --weight_only_precision int4_gptq --per_group --world_size 8 --tp_size 8 --hf_model_dir Qwen1.5-32B-Chat-GPTQ-Int4 --quant_ckpt_path Qwen1.5-32B-Chat-GPTQ-Int4
The config.json file is as follows:

After changing num_key_value_heads to 1, the error is as follows:

With the command

python build.py --use_weight_only --weight_only_precision int4_gptq --per_group --world_size 8 --tp_size 8 --hf_model_dir Qwen1.5-32B-Chat-GPTQ-Int4 --quant_ckpt_path Qwen1.5-32B-Chat-GPTQ-Int4

the error is as follows:

I would be very grateful if you could take a look when you have time. Thank you for your help.
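
For readers who hit the same assertion, the condition shown in the traceback (n_kv_head must equal world_size) can be checked before starting a build. The sketch below is not code from either repository. The helper name check_kv_heads is hypothetical, and it assumes that the build script's n_kv_head comes from the num_key_value_heads field of the model's config.json.

```python
import json
from pathlib import Path
from typing import Tuple


def check_kv_heads(hf_model_dir: str, world_size: int) -> Tuple[bool, int]:
    """Hypothetical pre-flight check mirroring the assertion in the traceback.

    Assumption: n_kv_head is taken from `num_key_value_heads` in config.json.
    Returns (condition_holds, n_kv_head).
    """
    config_path = Path(hf_model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    n_kv_head = config["num_key_value_heads"]
    return n_kv_head == world_size, n_kv_head


if __name__ == "__main__":
    # Directory name and world size taken from the command in this issue.
    ok, n_kv_head = check_kv_heads("Qwen1.5-32B-Chat-GPTQ-Int4", world_size=8)
    print(f"n_kv_head={n_kv_head}, matches world_size: {ok}")
```

Passing this check only means that this one assertion would not fire; it says nothing about later build steps. In this thread the actual fix was to use the latest code from this repository's main branch rather than to edit config.json.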

Answer 1 · 2024-05-20T03:40:40.000Z

It looks like you are using the official repository rather than this one. I recommend using the code from this repository.

Answer 2 · 2024-05-20T07:00:52.000Z

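Sorry about that, I had not pulled the latest code from the project's main branch. After pulling the latest code and rebuilding, everything works. Apologies for the trouble.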