luchangli03/export_llama_to_onnx

Error when reading the converted ONNX model with the onnx library

L1-M1ng opened this issue · 7 comments

I used the qwen script to convert the Qwen-7B pretrained model (https://huggingface.co/Qwen/Qwen-7B) to ONNX, and the conversion ran successfully (screenshot omitted).
But when loading the model, the following error occurred:
Traceback (most recent call last):
  File "/home/mingli/projects/Qwen/run_ort_optimize.py", line 10, in <module>
    model = ORTModelForQuestionAnswering.from_pretrained(
  File "/home/mingli/anaconda3/envs/py38/lib/python3.8/site-packages/optimum/onnxruntime/modeling_ort.py", line 662, in from_pretrained
    return super().from_pretrained(
  File "/home/mingli/anaconda3/envs/py38/lib/python3.8/site-packages/optimum/modeling_base.py", line 399, in from_pretrained
    return from_pretrained_method(
  File "/home/mingli/anaconda3/envs/py38/lib/python3.8/site-packages/optimum/onnxruntime/modeling_ort.py", line 510, in _from_pretrained
    model = ORTModel.load_model(
  File "/home/mingli/anaconda3/envs/py38/lib/python3.8/site-packages/optimum/onnxruntime/modeling_ort.py", line 373, in load_model
    return ort.InferenceSession(
  File "/home/mingli/anaconda3/envs/py38/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 360, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/mingli/anaconda3/envs/py38/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 408, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: /onnxruntime_src/onnxruntime/core/optimizer/initializer.cc:43 onnxruntime::Initializer::Initializer(const onnx::TensorProto&, const onnxruntime::Path&) [ONNXRuntimeError] : 1 : FAIL : GetFileLength for /home/mingli/.cache/huggingface/hub/models--L1-m1ng--qwen7b-inf/snapshots/bece1b085ce67f25804cc5ba8e99ec2e80c865e5/model.transformer.h.0.ln_1.weight failed:Invalid fd was supplied: -1
How can I resolve this?
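A minimal reconstruction of the failing call, based on the traceback above (the repo id "L1-m1ng/qwen7b-inf" is inferred from the cached snapshot path and may not match the actual script):

```python
from optimum.onnxruntime import ORTModelForQuestionAnswering

# run_ort_optimize.py, roughly: load the exported ONNX model from the Hub.
# Repo id inferred from the cache path in the traceback; hypothetical.
model = ORTModelForQuestionAnswering.from_pretrained("L1-m1ng/qwen7b-inf")
```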
The Python package versions in use are as follows:
Package Version Editable project location


absl-py 2.1.0
accelerate 0.26.1
aiohttp 3.9.3
aiosignal 1.3.1
antlr4-python3-runtime 4.9.3
async-timeout 4.0.3
attrs 23.2.0
auto_gptq 0.7.0
certifi 2024.2.2
chardet 5.2.0
charset-normalizer 3.3.2
click 8.1.7
cmake 3.25.0
colorama 0.4.6
coloredlogs 15.0.1
colorlog 6.8.2
contextlib2 21.6.0
contourpy 1.1.1
cycler 0.12.1
DataProperty 1.0.1
datasets 2.16.1
Deprecated 1.2.14
dill 0.3.7
einops 0.7.0
evaluate 0.4.1
filelock 3.13.1
flatbuffers 23.5.26
flatten-dict 0.4.2
fonttools 4.47.2
frozenlist 1.4.1
fsspec 2023.10.0
gekko 1.0.6
huggingface-hub 0.20.3
humanfriendly 10.0
hydra-colorlog 1.2.0
hydra-core 1.3.2
idna 3.6
importlib-resources 6.1.1
intel-extension-for-pytorch 2.2.0
Jinja2 3.1.3
joblib 1.3.2
jsonlines 4.0.0
kiwisolver 1.4.5
lit 15.0.7
lm_eval 0.4.0 /home/mingli/projects/Qwen/lm-evaluation-harness
lxml 5.1.0
markdown-it-py 3.0.0
MarkupSafe 2.1.4
matplotlib 3.7.4
mbstrdecoder 1.1.3
mdurl 0.1.2
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.15
networkx 3.1
neural-compressor 2.4.1
nltk 3.8.1
numexpr 2.8.6
numpy 1.24.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.19.3
nvidia-nvjitlink-cu12 12.3.101
nvidia-nvtx-cu12 12.1.105
omegaconf 2.3.0
oneccl-bind-pt 2.2.0+cpu
onnx 1.15.0
onnxruntime 1.14.1
opencv-python-headless 4.9.0.80
optimum 1.17.1
optimum-benchmark 0.0.2 /home/mingli/projects/Qwen/optimum-benchmark
optimum-intel 1.15.0.dev0
packaging 23.2
pandas 2.0.3
pathvalidate 3.2.0
peft 0.8.2
pillow 10.2.0
pip 24.0
portalocker 2.8.2
prettytable 3.9.0
protobuf 4.25.2
psutil 5.9.8
py-cpuinfo 9.0.0
py3nvml 0.2.7
pyarrow 15.0.0
pyarrow-hotfix 0.6
pybind11 2.11.1
pycocotools 2.0.7
Pygments 2.17.2
pyparsing 3.1.1
pyrsmi 1.0.2
pytablewriter 1.2.0
python-dateutil 2.8.2
pytz 2024.1
PyYAML 6.0.1
regex 2023.12.25
requests 2.31.0
responses 0.18.0
rich 13.7.0
rouge 1.0.1
rouge-score 0.1.2
sacrebleu 2.4.0
safetensors 0.4.2
schema 0.7.5
scikit-learn 1.3.2
scipy 1.10.1
sentencepiece 0.1.99
setuptools 68.2.2
six 1.16.0
sqlitedict 2.1.0
sympy 1.12
tabledata 1.3.3
tabulate 0.9.0
tcolorpy 0.1.4
threadpoolctl 3.2.0
tiktoken 0.5.2
tokenizers 0.15.1
torch 2.2.0+cu121
torchaudio 2.2.0+cu121
torchvision 0.17.0+cu121
tqdm 4.66.1
tqdm-multiprocess 0.0.11
transformers 4.35.2
transformers-stream-generator 0.0.4
triton 2.2.0
typepy 1.3.2
typing_extensions 4.9.0
tzdata 2023.4
urllib3 2.2.0
wcwidth 0.2.13
wheel 0.41.2
wrapt 1.16.0
xmltodict 0.13.0
xxhash 3.4.1
yarl 1.9.4
zipp 3.17.0
zstandard 0.22.0

Could you try upgrading onnxruntime to the latest version?

Sorry, I took another look: loading the model directly with the onnx library works fine, and I have updated onnxruntime to the latest 1.17.
The error actually occurs when I use optimum's onnxruntime backend to load the model I uploaded to Hugging Face (screenshot of the code omitted).
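For completeness, a sketch of the direct load that does work (the local path is a placeholder for the actual export directory):

```python
import onnx

# Load the graph without pulling the external weights into memory,
# then validate the model from its path (the path form handles >2GB models).
model = onnx.load("qwen_onnx/model.onnx", load_external_data=False)
onnx.checker.check_model("qwen_onnx/model.onnx")
```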

After comparing the two, the copy that optimum downloads from Hugging Face contains only two files, the .onnx model and tokenizer_config.json, so ort.InferenceSession cannot find the external weight files (e.g. model.transformer.h.0.ln_1.weight) in the cache at load time.
Loading the local export directly with ort.InferenceSession succeeds, since there the weight files sit next to the .onnx model (screenshots of the cached and local file listings omitted). A sketch of the working local load follows.
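A minimal sketch of that local load, with hypothetical paths:

```python
import onnxruntime as ort

# Works because the external weight files live in the same directory
# as the .onnx model; "qwen_onnx" stands in for the actual export dir.
sess = ort.InferenceSession(
    "qwen_onnx/model.onnx",
    providers=["CPUExecutionProvider"],
)
print([inp.name for inp in sess.get_inputs()])
```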

Then optimize it with this tool: https://github.com/luchangli03/onnxsim_large_model. Afterwards you end up with just one .onnx file and one weight file instead of this whole pile.
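For the file-count part alone, the stock onnx API can also merge all external tensors into a single weight file (a sketch, not the linked tool, and it does not simplify the graph the way onnxsim does; paths are hypothetical):

```python
import onnx

# Re-save the model with all external tensors consolidated into one file,
# yielding model_merged.onnx plus a single model_merged.onnx.data.
model = onnx.load("qwen_onnx/model.onnx")
onnx.save(
    model,
    "qwen_onnx/model_merged.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="model_merged.onnx.data",
)
```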

Great, thank you very much for your help! The simplification succeeded (screenshot omitted).
One last question: how do you determine the kv_cache shape for different LLMs? For example, the Qwen model's kv_cache shape (shown in a screenshot, omitted here) differs from ChatGLM2's.
Any guidance would be appreciated, thanks!

You can run the model on a prompt first and add print statements inside the model code.
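A minimal sketch of that approach, assuming the Hugging Face Qwen-7B checkpoint (model id and prompt are placeholders, and the layout of past_key_values depends on each model's own modeling code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)

inputs = tok("hello", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, use_cache=True)

# Each layer contributes one (key, value) pair; print the first layer's
# shapes to see how this particular model lays out its kv_cache.
k, v = out.past_key_values[0]
print("key:", tuple(k.shape), "value:", tuple(v.shape))
```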

Got it, thanks for your help!