Mixtral 8x7B fails to compile with TensorRT-LLM
gloritygithub11 opened this issue · 2 comments
gloritygithub11 commented
Config file:

```yaml
base:
    seed: &seed 42
model:
    type: Mixtral
    path: /models/Mixtral-8x7B-Instruct-v0.1
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: /app/llmc/tools/data/calib/wikitext2
    n_samples: 128
    bs: -1
    seq_len: 512
    preproc: pileval_awq
    seed: *seed
eval:
    eval_pos: []
    name: wikitext2
    download: False
    path: /app/llmc/tools/data/eval/wikitext2
    bs: 1
    seq_len: 2048
quant:
    method: Awq
    weight:
        bit: 4
        symmetric: True
        granularity: per_group
        group_size: 128
save:
    save_trans: False
    save_trtllm: True
    trtllm_cfg:
        tp_size: 1
        pp_size: 1
    save_path: ./save
```
I get the following error:
```
2024-08-05 09:36:22.985 | INFO | llmc.utils.export_trtllm:cvt_trtllm_engine:93 - Start to export trtllm engine...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████| 19/19 [00:09<00:00, 2.08it/s]
[08/05/2024-09:36:34] Some parameters are on the meta device device because they were offloaded to the cpu.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/app/llmc/llmc/__main__.py", line 147, in <module>
    main(config)
  File "/app/llmc/llmc/__main__.py", line 88, in main
    cvt_trtllm_engine(
  File "/app/llmc/llmc/utils/export_trtllm.py", line 95, in cvt_trtllm_engine
    convert_and_save_hf(hf_model, output_dir, cfg)
  File "/app/llmc/llmc/utils/export_trtllm.py", line 88, in convert_and_save_hf
    convert_and_save_rank(cfg, rank=0)
  File "/app/llmc/llmc/utils/export_trtllm.py", line 75, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 280, in from_hugging_face
    llama = convert.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1325, in from_hugging_face
    weights = load_weights_from_hf(config=config,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1434, in load_weights_from_hf
    weights = convert_hf_llama(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1087, in convert_hf_llama
    convert_layer(l)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 740, in convert_layer
    get_tllm_linear_weight(split_v, tllm_prex + 'attention.qkv.',
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 457, in get_tllm_linear_weight
    v.cpu(), plugin_weight_only_quant_type)
NotImplementedError: Cannot copy out of meta tensor; no data!
```
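
The warning above suggests that accelerate offloaded some weights to CPU/meta when the HF model was loaded, so the TensorRT-LLM converter later hits parameters with no real storage. As a rough sketch (not the llmc code path), loading the checkpoint fully into host memory before conversion should avoid meta tensors, assuming enough CPU RAM for Mixtral 8x7B:

```python
# Hypothetical workaround sketch: load the HF checkpoint fully onto CPU so no
# parameter stays on the meta device before handing the model to the converter.
import torch
from transformers import AutoModelForCausalLM

hf_model = AutoModelForCausalLM.from_pretrained(
    "/models/Mixtral-8x7B-Instruct-v0.1",
    torch_dtype=torch.float16,
    device_map=None,           # disable accelerate offloading (no meta tensors)
    low_cpu_mem_usage=False,   # materialize all weights in host memory
)

# Sanity check: every parameter should now have real storage.
assert all(p.device.type != "meta" for p in hf_model.parameters())
```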
Harahan commented
We will fix this later.
gushiqiao commented