Llama3-8B-Instruct export fails for TensorRT-LLM
gloritygithub11 opened this issue · 2 comments
gloritygithub11 commented
Hello,
I'm trying to build with TensorRT-LLM; the following is the config file:
base:
    seed: &seed 42
model:
    type: Llama
    path: /models/Meta-Llama-3-8B-Instruct
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: /app/llmc/tools/data/calib/wikitext2
    n_samples: 128
    bs: -1
    seq_len: 512
    preproc: pileval_awq
    seed: *seed
eval:
    eval_pos: []
    name: wikitext2
    download: False
    path: /app/llmc/tools/data/eval/wikitext2
    bs: 1
    seq_len: 2048
quant:
    method: Awq
    weight:
        bit: 4
        symmetric: True
        granularity: per_group
        group_size: 128
save:
    save_trans: False
    save_trtllm: True
    trtllm_cfg:
        tp_size: 1
        pp_size: 1
    save_path: ./save
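For context, with save_trtllm: True llmc hands the quantized HF checkpoint to TensorRT-LLM's LLaMAForCausalLM.from_hugging_face. Below is a minimal standalone sketch of roughly that call; the QuantConfig mapping of the Awq int4/per_group settings is my assumption, not llmc's exact code:

# Sketch only: assumed mapping of the Awq w4 / per_group / group_size=128
# settings above onto TensorRT-LLM's QuantConfig; not llmc's exact code.
from tensorrt_llm.models import LLaMAForCausalLM
from tensorrt_llm.models.modeling_utils import QuantConfig
from tensorrt_llm.quantization import QuantAlgo

quant_config = QuantConfig(
    quant_algo=QuantAlgo.W4A16_AWQ,  # assumed: 4-bit weights, fp16 activations
    group_size=128,
)

# from_hugging_face converts the HF weights and then calls load(), which is
# where the missing tensor below gets reported.
llama = LLaMAForCausalLM.from_hugging_face(
    "/models/Meta-Llama-3-8B-Instruct",
    dtype="auto",
    quant_config=quant_config,
)
llama.save_checkpoint("./save", save_config=True)

Assuming this mapping matches what llmc builds internally, running the sketch should hit the same load() check as the full export.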
I got the following error:
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024052800
2024-08-05 08:09:33.435 | INFO | llmc.utils.export_trtllm:cvt_trtllm_engine:93 - Start to export trtllm engine...
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00, 1.44it/s]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/app/llmc/llmc/__main__.py", line 147, in <module>
    main(config)
  File "/app/llmc/llmc/__main__.py", line 88, in main
    cvt_trtllm_engine(
  File "/app/llmc/llmc/utils/export_trtllm.py", line 95, in cvt_trtllm_engine
    convert_and_save_hf(hf_model, output_dir, cfg)
  File "/app/llmc/llmc/utils/export_trtllm.py", line 88, in convert_and_save_hf
    convert_and_save_rank(cfg, rank=0)
  File "/app/llmc/llmc/utils/export_trtllm.py", line 75, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 280, in from_hugging_face
    llama = convert.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1333, in from_hugging_face
    llama.load(weights)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 422, in load
    raise RuntimeError(
RuntimeError: Required but not provided tensors:{'transformer.vocab_embedding.per_token_scale'}
Weights loaded. Total time: 00:01:54
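For triage: the error comes from the completeness check at the end of PretrainedModel.load in modeling_utils.py, which compares the converted weights dict against the parameter names the built model expects; the int4-AWQ path apparently leaves transformer.vocab_embedding.per_token_scale expected but never produced. A hypothetical, untested monkey-patch that fills the missing scale with ones (the tensor shape is my guess) might get past the check, though whether the resulting engine is numerically correct is another question:

# Hypothetical workaround (untested; the scale's shape is a guess): fill the
# missing embedding scale with ones so load()'s completeness check passes.
import torch
from tensorrt_llm.models import modeling_utils

_orig_load = modeling_utils.PretrainedModel.load

def load_with_dummy_scale(self, weights, *args, **kwargs):
    key = 'transformer.vocab_embedding.per_token_scale'
    if key not in weights:
        # An all-ones scale should be a no-op if the embedding is not
        # actually meant to be quantized per token.
        weights[key] = torch.ones(self.config.vocab_size, dtype=torch.float32)
    return _orig_load(self, weights, *args, **kwargs)

modeling_utils.PretrainedModel.load = load_with_dummy_scale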
Harahan commented
We will fix this later.
gushiqiao commented