Error while converting Llama-3.1-8B to ONNX
charlesbvll commented
Question
Hey @xenova,
Thanks a lot for this library! I tried converting meta-llama/Llama-3.1-8B-Instruct to ONNX using the following command (on main):
python -m scripts.convert --quantize --model_id "meta-llama/Llama-3.1-8B-Instruct"
Using the following requirements.txt file (in a fresh env):
transformers[torch]==4.43.4
onnxruntime==1.19.2
optimum==1.21.3
onnx==1.16.2
onnxconverter-common==1.14.0
tqdm==4.66.5
onnxslim==0.1.31
--extra-index-url https://pypi.ngc.nvidia.com
onnx_graphsurgeon==0.3.27
But got the following error:
Framework not specified. Using pt to export the model.
Loading checkpoint shards: 100%|██████████| 4/4 [00:27<00:00, 6.99s/it]
Automatic task detection to text-generation-with-past (possible synonyms are: causal-lm-with-past).
Using the export variant default. Available variants are:
- default: The default ONNX variant.
***** Exporting submodel 1/1: LlamaForCausalLM *****
Using framework PyTorch: 2.5.0
Overriding 1 configuration item(s)
- use_cache -> True
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
/site-packages/transformers/models/llama/modeling_llama.py:1037: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if sequence_length != 1:
Traceback (most recent call last):
File "/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "scripts/convert.py", line 462, in <module>
main()
File "scripts/convert.py", line 349, in main
main_export(**export_kwargs)
File "/site-packages/optimum/exporters/onnx/__main__.py", line 365, in main_export
onnx_export_from_model(
File "/site-packages/optimum/exporters/onnx/convert.py", line 1170, in onnx_export_from_model
_, onnx_outputs = export_models(
File "/site-packages/optimum/exporters/onnx/convert.py", line 776, in export_models
export(
File "/site-packages/optimum/exporters/onnx/convert.py", line 881, in export
export_output = export_pytorch(
File "/site-packages/optimum/exporters/onnx/convert.py", line 577, in export_pytorch
onnx_export(
File "/site-packages/torch/onnx/__init__.py", line 375, in export
export(
File "/site-packages/torch/onnx/utils.py", line 502, in export
_export(
File "/site-packages/torch/onnx/utils.py", line 1564, in _export
graph, params_dict, torch_out = _model_to_graph(
File "/site-packages/torch/onnx/utils.py", line 1117, in _model_to_graph
graph = _optimize_graph(
File "/site-packages/torch/onnx/utils.py", line 663, in _optimize_graph
_C._jit_pass_onnx_graph_shape_type_inference(
RuntimeError: The serialized model is larger than the 2GiB limit imposed by the protobuf library. Therefore the output file must be a file path, so that the ONNX external data can be written to the same directory. Please specify the output file name.
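In case it helps narrow things down, this is roughly how I would try calling optimum's exporter directly with an explicit output directory, so the external data for the >2GiB weights can be written next to the .onnx file (an untested sketch on my side; the llama31_onnx output path is just a placeholder):

# Untested sketch: invoke optimum's exporter directly with an explicit
# output directory so the ONNX external data can be saved alongside the
# model file. "llama31_onnx" is only a placeholder output path.
from optimum.exporters.onnx import main_export

main_export(
    model_name_or_path="meta-llama/Llama-3.1-8B-Instruct",
    output="llama31_onnx",
    task="text-generation-with-past",
)

This still goes through the same onnx_export_from_model path shown in the traceback, so I'm not sure it avoids the 2GiB check, but it at least separates the export step from the quantization step that scripts.convert runs afterwards.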
I saw the somewhat related issue #967, but the error there didn't happen in the ONNX library (and I think v3 has been merged now).
Do you have a fix for larger models such as this one? I also tried with meta-llama/Llama-3.2-3B-Instruct, but I got the same error, even though I see here that you managed to convert it successfully.
Thanks!