Error while converting Llama-3.1-8B to ONNX
charlesbvll commented
Question
Hey @xenova,
Thanks a lot for this library! I tried converting meta-llama/Llama-3.1-8B-Instruct to ONNX using the following command (on main):
python -m scripts.convert --quantize --model_id "meta-llama/Llama-3.1-8B-Instruct"
Using the following requirements.txt file (in a fresh env):
transformers[torch]==4.43.4
onnxruntime==1.19.2
optimum==1.21.3
onnx==1.16.2
onnxconverter-common==1.14.0
tqdm==4.66.5
onnxslim==0.1.31
--extra-index-url https://pypi.ngc.nvidia.com
onnx_graphsurgeon==0.3.27
But got the following error:
Framework not specified. Using pt to export the model.
Loading checkpoint shards: 100%|██████████| 4/4 [00:27<00:00, 6.99s/it]
Automatic task detection to text-generation-with-past (possible synonyms are: causal-lm-with-past).
Using the export variant default. Available variants are:
- default: The default ONNX variant.
***** Exporting submodel 1/1: LlamaForCausalLM *****
Using framework PyTorch: 2.5.0
Overriding 1 configuration item(s)
- use_cache -> True
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
/site-packages/transformers/models/llama/modeling_llama.py:1037: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if sequence_length != 1:
Traceback (most recent call last):
File "/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "scripts/convert.py", line 462, in <module>
main()
File "scripts/convert.py", line 349, in main
main_export(**export_kwargs)
File "/site-packages/optimum/exporters/onnx/__main__.py", line 365, in main_export
onnx_export_from_model(
File "/site-packages/optimum/exporters/onnx/convert.py", line 1170, in onnx_export_from_model
_, onnx_outputs = export_models(
File "/site-packages/optimum/exporters/onnx/convert.py", line 776, in export_models
export(
File "/site-packages/optimum/exporters/onnx/convert.py", line 881, in export
export_output = export_pytorch(
File "/site-packages/optimum/exporters/onnx/convert.py", line 577, in export_pytorch
onnx_export(
File "/site-packages/torch/onnx/__init__.py", line 375, in export
export(
File "/site-packages/torch/onnx/utils.py", line 502, in export
_export(
File "/site-packages/torch/onnx/utils.py", line 1564, in _export
graph, params_dict, torch_out = _model_to_graph(
File "/site-packages/torch/onnx/utils.py", line 1117, in _model_to_graph
graph = _optimize_graph(
File "/site-packages/torch/onnx/utils.py", line 663, in _optimize_graph
_C._jit_pass_onnx_graph_shape_type_inference(
RuntimeError: The serialized model is larger than the 2GiB limit imposed by the protobuf library. Therefore the output file must be a file path, so that the ONNX external data can be written to the same directory. Please specify the output file name.
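In case it helps narrow things down, this is roughly how I would try calling optimum's exporter directly with an explicit output directory, so the external data for the >2GiB weights can be written next to the .onnx file (an untested sketch on my side; the llama31_onnx output path is just a placeholder):

# Untested sketch: invoke optimum's exporter directly with an explicit
# output directory so the ONNX external data can be saved alongside the
# model file. "llama31_onnx" is only a placeholder output path.
from optimum.exporters.onnx import main_export

main_export(
    model_name_or_path="meta-llama/Llama-3.1-8B-Instruct",
    output="llama31_onnx",
    task="text-generation-with-past",
)

This still goes through the same onnx_export_from_model path shown in the traceback, so I'm not sure it avoids the 2GiB check, but it at least separates the export step from the quantization step that scripts.convert runs afterwards.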
I saw the somewhat related issue #967, but the error there didn't happen in the ONNX library (and I think v3 has been merged now).
Do you have a fix for larger models such as this one? I also tried with meta-llama/Llama-3.2-3B-Instruct, but I got the same error, even though I see here that you managed to convert it successfully.
Thanks!