woolen-sheep/md2report

Cannot run: /app/filters/general.py: createProcess: posix_spawnp: invalid argument (Exec format error)

Closed this issue · 5 comments

After building the Docker image and entering the container, I ran:

python md2report.py -i test/test_case/5.2数据结构实验报告.md

It fails with:

Error running filter /app/filters/general.py:
/app/filters/general.py: createProcess: posix_spawnp: invalid argument (Exec format error)
Traceback (most recent call last):
  File "/app/md2report.py", line 126, in <module>
    convert_md_to_docx(conf)
  File "/app/md2report.py", line 86, in convert_md_to_docx
    handler_map[h](str(output_path.absolute()))
  File "/app/docx_handler/hust.py", line 102, in process_hust_docx
    doc: TDocument = Document(filename)
                     ^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/docx/api.py", line 25, in Document
    document_part = Package.open(docx).main_document_part
                    ^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/docx/opc/package.py", line 128, in open
    pkg_reader = PackageReader.from_file(pkg_file)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/docx/opc/pkgreader.py", line 32, in from_file
    phys_reader = PhysPkgReader(pkg_file)
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/docx/opc/phys_pkg.py", line 30, in __new__
    raise PackageNotFoundError(
docx.opc.exceptions.PackageNotFoundError: Package not found at '/app/output.docx'
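The PackageNotFoundError at the end of this traceback appears to be a follow-on failure: the filter error comes first, so output.docx was most likely never written. A minimal sketch of checking the converter's result before opening the output, assuming the conversion is launched as a subprocess; `run_converter` is a hypothetical helper and not part of md2report:

```python
import subprocess
from pathlib import Path


def run_converter(cmd: list[str], output_path: Path) -> None:
    """Run a conversion command and fail early if it produced no output file."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the converter's own message instead of a later, unrelated error.
        raise RuntimeError(
            f"{cmd[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    if not output_path.is_file():
        raise RuntimeError(f"{cmd[0]} succeeded but {output_path} was not created")


# Example call (paths are placeholders):
# run_converter(["pandoc", "-s", "input.md", "-o", "output.docx"], Path("output.docx"))
```

With a check like this, the first reported error would be the pandoc filter failure rather than the missing docx package.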

I have tried every version published under Releases. Besides Docker, I also deployed the CLI on WSL and on Arch Linux, and running it there gives the same error:

general.py: createProcess: posix_spawnp: invalid argument (Exec format error)

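On a first look, I found that HUST.docx in reference-docs and several other files distributed through Git LFS were corrupted. In my original deployment I forgot to clone with Git LFS. Later, for Docker, I downloaded the source archive directly from the Releases page, but those files were corrupted there as well.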
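A file that Git LFS has not resolved is a small text pointer whose first line is `version https://git-lfs.github.com/spec/v1`, which is why it cannot be opened as a docx. A minimal sketch for spotting such files, assuming it is run from the backend directory; the function names are illustrative and not part of md2report:

```python
from pathlib import Path

# First line of a Git LFS pointer file, as defined by the Git LFS pointer format.
LFS_SIGNATURE = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file starts with the Git LFS pointer signature."""
    with path.open("rb") as fh:
        head = fh.read(len(LFS_SIGNATURE))
    return head == LFS_SIGNATURE


def find_lfs_pointers(root: Path) -> list[Path]:
    """List regular files under root that are still unresolved LFS pointers."""
    return sorted(p for p in root.rglob("*") if p.is_file() and is_lfs_pointer(p))


if __name__ == "__main__":
    # "reference-docs" is the directory mentioned in this issue; adjust as needed.
    for pointer in find_lfs_pointers(Path("reference-docs")):
        print(f"still an LFS pointer: {pointer}")
```

If the script prints any paths, running `git lfs pull` inside the clone should replace the pointers with the real files.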

After re-cloning the project with git lfs and redeploying, I still get the same error. The environment was created with poetry, and the pandoc version is:

(md2report-py3.10) alex@DESKTOP-UCC402E:/mnt/d/WorkSpace/GitProjects/md2report/backend$ pandoc -v
pandoc 2.19.2
Compiled with pandoc-types 1.22.2.1, texmath 0.12.5.2, skylighting 0.13,
citeproc 0.8.0.1, ipynb 0.2, hslua 2.2.1
Scripting engine: Lua 5.4
User data directory: /home/alex/.local/share/pandoc
Copyright (C) 2006-2022 John MacFarlane. Web:  https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.

The problem is definitely in the pandoc invocation. Running the command that the script builds, on its own:

pandoc -s --toc /mnt/d/WorkSpace/GitProjects/md2report/backend/test/test_case/5.2数据结构实验报告.md.validated.md --reference-doc /mnt/d/WorkSpace/GitProjects/md2report/backend/reference-docs/HUST.docx --filter /mnt/d/WorkSpace/GitProjects/md2report/backend/filters/general.py -o /mnt/d/WorkSpace/GitProjects/md2report/backend/output.docx

gives the error:

Error running filter /mnt/d/WorkSpace/GitProjects/md2report/backend/filters/general.py:
/mnt/d/WorkSpace/GitProjects/md2report/backend/filters/general.py: createProcess: posix_spawnp: invalid argument (Exec format error)

With the --filter option removed, pandoc runs fine, but of course the filter's processing is then skipped.
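The thread does not establish a root cause. One hypothesis consistent with the symptoms, offered as an assumption only: pandoc executes a filter directly when the file has the executable bit set and otherwise picks an interpreter from the extension, and files under /mnt/... in WSL or copied from a Windows build context usually appear executable. A script without a shebang line that is executed directly can fail with "Exec format error". The sketch below inspects a filter file along those lines; `diagnose_filter` is a hypothetical helper:

```python
import os
from pathlib import Path


def diagnose_filter(path: Path) -> str:
    """Describe how a launcher that honors the executable bit would treat this file."""
    executable = os.access(path, os.X_OK)
    with path.open("rb") as fh:
        first_line = fh.readline()
    has_shebang = first_line.startswith(b"#!")
    has_crlf = first_line.endswith(b"\r\n")

    if not executable:
        return "not executable: an interpreter is chosen from the file extension"
    if not has_shebang:
        return "executable without a shebang: direct execution can fail with 'Exec format error'"
    if has_crlf:
        return "executable with a CRLF shebang: the interpreter name ends in a stray carriage return"
    return "executable with a valid-looking shebang"


# Example call (path taken from the error message above):
# print(diagnose_filter(Path("/app/filters/general.py")))
```

If the result is "executable without a shebang", clearing the executable bit with `chmod -x` on the filter, or adding a shebang line to it, would be the things to try.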

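After commenting out the --filter part on lines 76-77 of md2report.py, the script runs and the resulting output.docx looks mostly fine. So far I have noticed problems with table captions and with the numbering of tables and figures.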

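For this deployment I cloned the current repository with git lfs, pandoc is newer than 2.11, and the Python environment was built with poetry, so the setup should meet the deployment requirements in README.md. I therefore think the project's filter file itself still has a problem that prevents the script from running.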

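Also, I suggest that README.md emphasize installing and using Git LFS: if you need to deploy, you must clone the repository with git lfs, otherwise some large files are corrupted. (I overlooked this in my first few deployments :< )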

"if you need to deploy, you must clone the repository with git lfs"

After git lfs install, git clone is equivalent to git lfs clone; no other command is needed:

➜  project git clone https://github.com/woolen-sheep/md2report.git
Cloning into 'md2report'...
remote: Enumerating objects: 383, done.
remote: Counting objects: 100% (32/32), done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 383 (delta 13), reused 15 (delta 9), pack-reused 351
Receiving objects: 100% (383/383), 2.41 MiB | 2.03 MiB/s, done.
Resolving deltas: 100% (136/136), done.
Filtering content: 100% (25/25), 1.25 MiB | 144.00 KiB/s, done.
➜  project cd md2report/backend/reference-docs
➜  reference-docs git:(master) stat HUST.docx
  File: HUST.docx
  Size: 21647           Blocks: 48         IO Block: 4096   regular file
Device: 259,2   Inode: 19679330    Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-11-28 23:46:27.572405622 +0800
Modify: 2022-11-28 23:46:27.572405622 +0800
Change: 2022-11-28 23:46:27.572405622 +0800
 Birth: 2022-11-28 23:46:27.572405622 +0800

Using LFS was just me playing around without knowing better; I am considering removing it later.

Could you try:

docker pull woolensheep/md2report:v0.1.2
docker run --name md2report -d woolensheep/md2report:v0.1.2
docker exec -it md2report bash
cd /app
python md2report.py -i test/test_case/5.2数据结构实验报告.md

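My development environment is also Arch, with the same pandoc version as yours, and I could not reproduce the problem.
The release GitHub Action also clones and then builds, and the image it produces works correctly, so the files in the repository and the Dockerfile are fine.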

Running the container with these commands works without problems:

docker pull woolensheep/md2report:v0.1.2
docker run --name md2report -d woolensheep/md2report:v0.1.2
docker exec -it md2report bash
cd /app
python md2report.py -i test/test_case/5.2数据结构实验报告.md

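That is a bit odd. The container I previously built directly with docker build fails with the error I reported earlier, and I just rebuilt from the Dockerfile in the same repository and ran it again: the error reproduces reliably. I am running Docker on Windows 10 Pro for Workstations 22H2. The build and test commands are: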

docker build -t md2report .
docker run --name md2report_test -d md2report
docker exec -it md2report_test bash

# inside the container's bash shell
root@5f5386bd5b62:/app# python md2report.py -i test/test_case/5.2数据结构实验报告.md
Please give me star if this application helps you!
如果这个应用有帮助到你,请给我点一个 star!
https://github.com/woolen-sheep/md2report
Error running filter /app/filters/general.py:
/app/filters/general.py: createProcess: posix_spawnp: invalid argument (Exec format error)
Traceback (most recent call last):
  File "/app/md2report.py", line 141, in <module>
    convert_md_to_docx(conf)
  File "/app/md2report.py", line 91, in convert_md_to_docx
    handler_map[handler_name](str(output_path.absolute()), **params)
  File "/app/docx_handler/hust.py", line 102, in process_hust_docx
    doc: TDocument = Document(filename)
                     ^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/docx/api.py", line 25, in Document
    document_part = Package.open(docx).main_document_part
                    ^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/docx/opc/package.py", line 128, in open
    pkg_reader = PackageReader.from_file(pkg_file)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/docx/opc/pkgreader.py", line 32, in from_file
    phys_reader = PhysPkgReader(pkg_file)
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/docx/opc/phys_pkg.py", line 30, in __new__
    raise PackageNotFoundError(
docx.opc.exceptions.PackageNotFoundError: Package not found at '/app/output.docx'

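Also, my later deployment after cloning with git lfs was not run on Arch; I tested it on WSL 1 (after all, the md and Word files end up being viewed on Windows, haha). When I have time tomorrow I will try deploying directly on Arch to see whether the same error occurs.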

No reply for a long time, so I am closing this for now; reopen it if the problem persists.