[Bug]: The semantic search system in pipelines cannot parse scanned PDF files uploaded after startup
morego123 commented
Software environment
paddle-pipelines 0.6.2
paddle2onnx 1.2.1
paddlefsl 1.1.0
paddlenlp 2.8.0
paddleocr 2.7.3
paddlepaddle-gpu 2.6.0.post117
Duplicate issues
- I have searched the existing issues
Error description
INFO: 127.0.0.1:43132 - "POST /file-upload HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/paddle_pipelines-0.6.2-py3.10.egg/pipelines/pipelines/base.py", line 446, in run
    node_output, stream_id = self.graph.nodes[node_id]["component"]._dispatch_run(**node_input)
  File "/usr/local/lib/python3.10/dist-packages/paddle_pipelines-0.6.2-py3.10.egg/pipelines/nodes/base.py", line 120, in _dispatch_run
    return self._dispatch_run_general(self.run, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddle_pipelines-0.6.2-py3.10.egg/pipelines/nodes/base.py", line 164, in _dispatch_run_general
    output, stream = run_method(**run_inputs, **run_params)
  File "/usr/local/lib/python3.10/dist-packages/paddle_pipelines-0.6.2-py3.10.egg/pipelines/nodes/retriever/base.py", line 144, in run
    output, stream = run_indexing(documents=documents, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddle_pipelines-0.6.2-py3.10.egg/pipelines/nodes/retriever/base.py", line 110, in wrapper
    ret = fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddle_pipelines-0.6.2-py3.10.egg/pipelines/nodes/retriever/base.py", line 229, in run_indexing
    embeddings = self.embed_documents(document_objects, **kwargs) # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/paddle_pipelines-0.6.2-py3.10.egg/pipelines/nodes/retriever/dense.py", line 367, in embed_documents
    embeddings = self._get_predictions(passages, **kwargs)["passages"]
  File "/usr/local/lib/python3.10/dist-packages/paddle_pipelines-0.6.2-py3.10.egg/pipelines/nodes/retriever/dense.py", line 292, in _get_predictions
    if "passages" in dicts[0]:
IndexError: list index out of range
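Steps to reproduce & code
In the web UI, uploading a scanned PDF through the file upload module on the left fails to parse. Uploading a non-scanned PDF works normally.
For scanned PDFs, is it that this repo cannot parse them at all, or am I missing a component?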
w5688414 commented
Hello, scanned PDFs are not currently supported. Contributions from developers are welcome.
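For anyone hitting the same traceback: the `IndexError` at `dicts[0]` is consistent with an empty list reaching the retriever, which is what one would expect if no text was extracted from a scanned PDF (this is a reading of the traceback, not something confirmed in the thread). Until scanned PDFs are supported, a guard before indexing could turn the 500 error into a clear message. The sketch below is only an illustration: `EmptyDocumentError`, `drop_empty_documents`, `check_before_indexing`, and the assumption that each document is a dict with a `"content"` key are all hypothetical and not part of the pipelines API.

```python
from typing import Dict, List, Optional


class EmptyDocumentError(ValueError):
    """Hypothetical error: the uploaded file produced no text to index."""


def drop_empty_documents(
    documents: List[Dict[str, Optional[str]]]
) -> List[Dict[str, Optional[str]]]:
    """Keep only documents whose "content" holds non-whitespace text.

    Assumption: each document is a dict with a "content" key.
    """
    return [doc for doc in documents if (doc.get("content") or "").strip()]


def check_before_indexing(
    documents: List[Dict[str, Optional[str]]], filename: str
) -> List[Dict[str, Optional[str]]]:
    """Return the usable documents, or raise instead of passing an empty list on."""
    usable = drop_empty_documents(documents)
    if not usable:
        # An empty list here would later fail at dicts[0] with IndexError.
        raise EmptyDocumentError(
            f"No text could be extracted from {filename!r}; "
            "it may be a scanned PDF without a text layer."
        )
    return usable
```

Calling `check_before_indexing` on the converter output before it reaches the retriever would let the upload endpoint report an unsupported file rather than an internal server error. Actual OCR support for scanned PDFs would still need to be contributed separately.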