opendatalab/PDF-Extract-Kit

Cannot run on Mac M-chip

bookandlover opened this issue · 11 comments

The errors are as follows:

(.venv) (base) pengxiong@PENGMacPro PDF-Extract-Kit % python pdf_extract.py --pdf demo/demo1.pdf
[2024-07-19 20:17:51,713] [ ERROR] check_version.py:39 - Error fetching version info
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 1348, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1283, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1329, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1278, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1038, in _send_output
self.send(msg)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 976, in send
self.connect()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1455, in connect
self.sock = self._context.wrap_socket(self.sock,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 517, in wrap_socket
return self.sslsocket_class._create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 1075, in _create
self.do_handshake()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 1346, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1002)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/albumentations/check_version.py", line 29, in fetch_version_info
with opener.open(url, timeout=2) as response:
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 519, in open
response = self._open(req, data)
^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 496, in _call_chain
result = func(*args)
^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 1391, in https_open
return self.do_open(http.client.HTTPSConnection, req,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1002)>
Traceback (most recent call last):
File "/Users/pengxiong/LLM/PDF-Extract-Kit/pdf_extract.py", line 18, in
from unimernet.common.config import Config
File "/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/unimernet/init.py", line 18, in
from unimernet.tasks import *
File "/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/unimernet/tasks/init.py", line 10, in
from unimernet.tasks.unimernet_train import UniMERNet_Train
File "/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/unimernet/tasks/unimernet_train.py", line 11, in
from torchtext.data import metrics
File "/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/torchtext/init.py", line 18, in
from torchtext import _extension # noqa: F401
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/torchtext/_extension.py", line 64, in
_init_extension()
File "/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/torchtext/_extension.py", line 58, in _init_extension
_load_lib("libtorchtext")
File "/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/torchtext/_extension.py", line 50, in _load_lib
torch.ops.load_library(path)
File "/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/torch/_ops.py", line 1354, in load_library
ctypes.CDLL(path)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ctypes/init.py", line 376, in init
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: dlopen(/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/torchtext/lib/libtorchtext.so, 0x0006): Symbol not found: __ZN3c105ErrorC1ENSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEES7_PKv
Referenced from: <5436ECC1-6F45-386E-B542-D5F76A22B52C> /Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/torchtext/lib/libtorchtext.so
Expected in: /Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/torch/lib/libc10.dylib

I encountered the same problem after performing the following steps:
image
image

Don't mind this error; you can still try to run it.

When I run it I'm getting the following error:
image

Can you help me? Thank you!

OSError: dlopen(/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/torchtext/lib/libtorchtext.so, 0x0006): Symbol not found: __ZN3c105ErrorC1ENSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEES7_PKv

TorchText development is stopped and the 0.18 release (April 2024) will be the last stable release of the library.

The versions of torch, torchvision, and torchtext need to be compatible, but torchtext 0.18.0 is not compatible with torch 2.5.0.
Maybe you can run

pip uninstall torch torchvision torchtext
pip install --pre torch torchvision torchtext --index-url https://download.pytorch.org/whl/nightly/cpu

to try the nightly build with MPS enabled, or

pip uninstall torch torchvision torchtext
pip install torch torchvision torchtext 

You can try using MPS for this; if that's not possible, the CPU will work as well.
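The "try MPS, otherwise CPU" advice can be sketched in Python as below. This is only an illustration: `choose_device` is a hypothetical helper name that does not come from PDF-Extract-Kit, and the sketch assumes a PyTorch build that exposes `torch.backends.mps` (it falls back to the CPU when that attribute or PyTorch itself is missing).

```python
def choose_device(prefer_mps: bool = True) -> str:
    """Return "mps" when PyTorch reports a usable MPS backend, else "cpu".

    Hypothetical helper for illustration; not part of PDF-Extract-Kit.
    """
    try:
        import torch  # imported lazily so the helper still works without torch
    except ImportError:
        return "cpu"
    mps = getattr(torch.backends, "mps", None)  # absent on older torch builds
    if prefer_mps and mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(f"selected device: {choose_device()}")
```

The returned string could then be passed wherever the pipeline expects a device name, assuming the configuration accepts one.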

Thank you! However, this gets me here. I'm just gonna try the CPU version, thank you!
image

As a temporary fix, you can set the environment variable 'PYTORCH_ENABLE_MPS_FALLBACK=1' to use the CPU as a fallback for this op.
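One way to apply that hint from inside a script is to set the variable before PyTorch is imported, since the setting is generally reported to be read when the library loads. The sketch below assumes exactly that; `enable_mps_fallback` is a hypothetical helper name, not something provided by PyTorch or PDF-Extract-Kit.

```python
import os
import sys


def enable_mps_fallback() -> bool:
    """Set PYTORCH_ENABLE_MPS_FALLBACK=1 for this process.

    Returns False when torch was already imported, in which case the
    setting is assumed to be too late to take effect.
    """
    set_in_time = "torch" not in sys.modules
    os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
    return set_in_time


if __name__ == "__main__":
    if not enable_mps_fallback():
        print("warning: torch was imported before the variable was set")
    print("PYTORCH_ENABLE_MPS_FALLBACK =", os.environ["PYTORCH_ENABLE_MPS_FALLBACK"])
```

Exporting the variable in the shell before launching `python pdf_extract.py` should have the same effect and avoids any import-order concerns.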

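image Solved after many tries. The problem comes from the certifi update. For reference: this error occurs because insufficient permissions prevent the `certifi` package from being updated. You need to run the command with administrator privileges. Try the following steps to resolve the problem:

1. Use the sudo command

In the terminal, run the following command, using sudo to elevate privileges, to install certifi:

sudo /Library/Frameworks/Python.framework/Versions/3.11/bin/python3.11 -m pip install --upgrade certifi

The system will prompt you for the administrator password; enter it to continue the installation.

2. Update pip

Updating pip may help resolve the problem. Use the following command to update pip:

sudo /Library/Frameworks/Python.framework/Versions/3.11/bin/python3.11 -m pip install --upgrade pip

Then try upgrading certifi again:

sudo /Library/Frameworks/Python.framework/Versions/3.11/bin/python3.11 -m pip install --upgrade certifi

3. Verify the installation

After certifi has been installed successfully, run the following command to verify the installation:

/Library/Frameworks/Python.framework/Versions/3.11/bin/python3.11 -m pip show certifi

4. Re-run the certificate installation script

Run the certificate installation script again:

/Applications/Python\ 3.11/Install\ Certificates.command

If the steps above complete without errors, re-run your pdf_extract.py script:

python pdf_extract.py --pdf demo/demo1.pdf

This should resolve the certificate and permission problems. If other problems remain, please tell me the details.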
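To check whether the certificate steps took effect, it can help to look at where this Python interpreter searches for CA certificates. The following is a minimal diagnostic sketch using the standard `ssl` module; `describe_ca_setup` is a hypothetical helper name, and the `certifi` lookup is optional and skipped when the package is not installed.

```python
import os
import ssl


def describe_ca_setup() -> dict:
    """Collect the CA locations this interpreter uses for HTTPS verification."""
    paths = ssl.get_default_verify_paths()
    info = {
        "cafile": paths.cafile,
        "capath": paths.capath,
        # A missing default CA file is a plausible cause of
        # CERTIFICATE_VERIFY_FAILED on python.org installs for macOS.
        "cafile_exists": bool(paths.cafile and os.path.exists(paths.cafile)),
    }
    try:
        import certifi  # optional; only reported when installed
        info["certifi_bundle"] = certifi.where()
    except ImportError:
        info["certifi_bundle"] = None
    return info


if __name__ == "__main__":
    for key, value in describe_ca_setup().items():
        print(f"{key}: {value}")
```

If `cafile_exists` is false and no `capath` is listed, the certificate installation script has probably not run successfully for this interpreter.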

@bookandlover Thank you for your feedback. We will update the document so that other users can solve similar problems.

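When extracting PDF files, I ran into a problem where MPS cannot be started and execution has to be forced back to the CPU. The likely cause is a compatibility problem between specific versions of PyTorch and detectron2, which downgrading or upgrading them might resolve. Could you give the correct version numbers? My chip is an M2 Ultra. Below is an illustrative snippet. Thank you very much.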

pip install torch==1.12.1 torchvision==0.13.1
pip install detectron2==0.6 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.12/index.html

pip install torch==2.3.1 torchvision==0.18.1 torchtext==0.18.0

pip install detectron2 --extra-index-url https://myhloli.github.io/wheels/

This works well with Python 3.10.
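Before reinstalling, it may be useful to compare what is currently installed against the pins suggested in this thread (torch 2.3.1, torchvision 0.18.1, torchtext 0.18.0). The sketch below does that with `importlib.metadata`; the helper names `installed_version` and `compare_with_pins` are hypothetical, and the pin table simply mirrors the command above rather than an official compatibility matrix.

```python
from importlib import metadata

# Pins copied from the pip command suggested in this thread.
PINS = {"torch": "2.3.1", "torchvision": "0.18.1", "torchtext": "0.18.0"}


def installed_version(name: str):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def compare_with_pins(installed: dict, pins: dict = PINS) -> dict:
    """Report "ok", "missing", or a mismatch message for each pinned package."""
    report = {}
    for name, wanted in pins.items():
        have = installed.get(name)
        if have is None:
            report[name] = "missing"
        elif have.split("+")[0] == wanted:  # ignore local tags such as "+cpu"
            report[name] = "ok"
        else:
            report[name] = f"mismatch (have {have}, want {wanted})"
    return report


if __name__ == "__main__":
    current = {name: installed_version(name) for name in PINS}
    for name, status in compare_with_pins(current).items():
        print(f"{name}: {status}")
```

A "mismatch" line for torch alongside an "ok" for torchtext would match the situation described earlier, where torchtext 0.18.0 was paired with torch 2.5.0.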

Error: command buffer exited with error status.
The Metal Performance Shaders operations encoded on it may not have completed.
Error:
(null)
Internal Error (0000000e:Internal Error)
<AGXG14XFamilyCommandBuffer: 0x37ba4f2c0>
label =
device = <AGXG14SDevice: 0x30888a600>
name = Apple M2 Pro
commandQueue = <AGXG14XFamilyCommandQueue: 0x32128f200>
label =
device = <AGXG14SDevice: 0x30888a600>
name = Apple M2 Pro
retainedReferences = 1
Error: command buffer exited with error status.
The Metal Performance Shaders operations encoded on it may not have completed.
Error:
(null)
Internal Error (0000000e:Internal Error)
<AGXG14XFamilyCommandBuffer: 0x37ba4ed50>
label =
device = <AGXG14SDevice: 0x30888a600>
name = Apple M2 Pro
commandQueue = <AGXG14XFamilyCommandQueue: 0x32128f200>
label =
device = <AGXG14SDevice: 0x30888a600>
name = Apple M2 Pro
retainedReferences = 1
Error: command buffer exited with error status.
The Metal Performance Shaders operations encoded on it may not have completed.
Error:
(null)
Internal Error (0000000e:Internal Error)
<AGXG14XFamilyCommandBuffer: 0x37ba50af0>
label =
device = <AGXG14SDevice: 0x30888a600>
name = Apple M2 Pro
commandQueue = <AGXG14XFamilyCommandQueue: 0x32128f200>
label =
device = <AGXG14SDevice: 0x30888a600>
name = Apple M2 Pro
retainedReferences = 1
Error: command buffer exited with error status.
The Metal Performance Shaders operations encoded on it may not have completed.
Error:
(null)
Internal Error (0000000e:Internal Error)
<AGXG14XFamilyCommandBuffer: 0x37ba4c5d0>
label =
device = <AGXG14SDevice: 0x30888a600>
name = Apple M2 Pro
commandQueue = <AGXG14XFamilyCommandQueue: 0x32128f200>
label =
device = <AGXG14SDevice: 0x30888a600>
name = Apple M2 Pro
retainedReferences = 1
Traceback (most recent call last):
File "/Users/liuchuang/Desktop/engram/PDF-Extract-Kit/pdf_extract.py", line 123, in
layout_res = layout_model(image, ignore_catids=[])
File "/Users/liuchuang/Desktop/engram/PDF-Extract-Kit/modules/layoutlmv3/model_init.py", line 124, in call
outputs = self.predictor(image)
File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/detectron2/engine/defaults.py", line 319, in call
predictions = self.model([inputs])[0]
File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/liuchuang/Desktop/engram/PDF-Extract-Kit/modules/layoutlmv3/rcnn_vl.py", line 55, in forward
return self.inference(batched_inputs)
File "/Users/liuchuang/Desktop/engram/PDF-Extract-Kit/modules/layoutlmv3/rcnn_vl.py", line 122, in inference
results, _ = self.roi_heads(images, features, proposals, None)
File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/detectron2/modeling/roi_heads/cascade_rcnn.py", line 150, in forward
pred_instances = self.forward_with_given_boxes(features, pred_instances)
File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/detectron2/modeling/roi_heads/roi_heads.py", line 776, in forward_with_given_boxes
instances = self._forward_mask(features, instances)
File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/detectron2/modeling/roi_heads/roi_heads.py", line 843, in _forward_mask
features = self.mask_pooler(features, boxes)
File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/detectron2/modeling/poolers.py", line 243, in forward
pooler_fmt_boxes = convert_boxes_to_pooler_format(box_lists)
File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/detectron2/modeling/poolers.py", line 98, in convert_boxes_to_pooler_format
return _convert_boxes_to_pooler_format(boxes, sizes)
File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/torch/jit/_trace.py", line 1254, in wrapper
return fn(*args, **kwargs)
File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/detectron2/modeling/poolers.py", line 66, in _convert_boxes_to_pooler_format
indices = torch.repeat_interleave(
RuntimeError: Expected repeatBuffer && cumsumBuffer && resultBuffer to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

PyTorch version 2.3.1
detectron2 version 0.6
It runs normally on the CPU but fails on MPS.
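Since the CPU path works while MPS raises a `RuntimeError`, one possible workaround is to retry a failed step on the CPU. The sketch below is generic and makes no assumptions about PDF-Extract-Kit's internals: `run_with_cpu_fallback` is a hypothetical helper, and `run` stands for any callable that accepts a device name and performs the inference. Whether the MPS backend is left in a usable state after such an error is not guaranteed, so simply running everything on the CPU may be the safer choice.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_cpu_fallback(run: Callable[[str], T], preferred: str = "mps") -> T:
    """Call run(preferred); if it raises RuntimeError, retry once on "cpu"."""
    try:
        return run(preferred)
    except RuntimeError as err:
        if preferred == "cpu":
            raise  # nothing left to fall back to
        print(f"{preferred} failed ({err}); retrying on cpu")
        return run("cpu")


if __name__ == "__main__":
    def fake_inference(device: str) -> str:
        # Stand-in for a real model call; fails on "mps" to show the retry.
        if device == "mps":
            raise RuntimeError("simulated MPS failure")
        return f"ran on {device}"

    print(run_with_cpu_fallback(fake_inference))
```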