testing fails on unix where python is not available (but only python3)

Question

testing fails on unix where python is not available (but only python3)

kloczek opened this issue 2 years ago · 27 comments

I'm trying to package your module as an rpm package, so I'm using the typical PEP 517 based build, install and test cycle used when building packages from a non-root account.

python3 -sBm build -w --no-isolation
because I'm calling build with --no-isolation, only locally installed modules are used during the whole process
install the .whl file in </install/prefix>
run pytest with PYTHONPATH pointing to sitearch and sitelib inside </install/prefix>

Here is pytest output:

+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.8-2.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.8-2.fc35.x86_64/usr/lib/python3.8/site-packages
+ /usr/bin/pytest -ra
=========================================================================== test session starts ============================================================================
platform linux -- Python 3.8.13, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/tkloczko/rpmbuild/BUILD/pypandoc-1.8
collected 0 items

========================================================================== no tests ran in 0.01s ===========================================================================

Answer 1 · 2022-05-11T14:11:10.000Z

To be totally honest, I don't know much about rpm packages.

How do you get the new release? Through pip right?
Do you clone the repo, or use the official release through pypi?

Answer 2 · 2022-05-11T16:44:48.000Z

That issue has nothing to do with rpm.
You can reproduce it using the procedure which I've described.
Just please run pytest.

Answer 3 · 2022-05-11T19:18:38.000Z

Ohh. The reason is that examples, documentation files and tests have been removed from the PyPI release as of version 1.8, because of some conflicting names when installing.

Answer 4 · 2022-05-11T20:02:43.000Z

I'm not using the PyPI sdist but the tarball autogenerated from the git tag: https://github.com/NicklasTegner/pypandoc/archive/refs/tags/v1.8.tar.gz

Answer 5 · 2022-05-11T20:59:42.000Z

It happens when building from the pyproject.toml file.
In 1.8 we removed the tests and other files.

I would suggest downloading and extracting the tar.gz, then running the tests, and lastly creating the wheel.

Answer 6 · 2022-05-11T21:03:39.000Z

I see tests.py inside the tarball.

Answer 7 · 2022-05-12T00:06:52.000Z

Yes, but they aren't included when you build: they are left out of the .whl that gets produced.

Answer 8 · 2022-05-12T08:45:12.000Z

You can check what is inside the tarball autogenerated from the git tag: https://github.com/NicklasTegner/pypandoc/tree/v1.8

Answer 9 · 2022-05-12T11:48:10.000Z

I know, and in the tarball they are, but my guess is, that when you run the build command, they aren't included, just like when I run python setup.py sdist, because of a change for version 1.8.

My suggestion would be to run pytest before building.

Answer 10 · 2022-09-28T17:11:12.000Z

Just tested 1.9 and it looks like two new units are failing.

+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.9-2.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.9-2.fc35.x86_64/usr/lib/python3.8/site-packages
+ /usr/bin/pytest -ra tests.py --deselect tests.py::TestPypandoc::test_pdf_conversion
=========================================================================== test session starts ============================================================================
platform linux -- Python 3.8.14, pytest-7.1.3, pluggy-1.0.0
rootdir: /home/tkloczko/rpmbuild/BUILD/pypandoc-1.9
collected 40 items / 1 deselected / 39 selected

tests.py ..........................FF...........                                                                                                                     [100%]

================================================================================= FAILURES =================================================================================
_____________________________________________________________ TestPypandoc.test_conversion_with_mixed_filters ______________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_mixed_filters>

    def test_conversion_with_mixed_filters(self):
        markdown_source = "-0-"

        lua = """\
        function Para(elem)
            return pandoc.Para(elem.content .. {{"{0}-"}})
        end
        """
        lua = textwrap.dedent(lua)

        python = """\
        #!/usr/bin/env python

        from pandocfilters import toJSONFilter, Para, Str

        def func(key, value, format, meta):
            if key == "Para":
                return Para(value + [Str("{0}-")])

        if __name__ == "__main__":
            toJSONFilter(func)

        """
        python = textwrap.dedent(python)

        with closed_tempfile(".lua", lua.format(1)) as temp1, closed_tempfile(".py", python.format(2)) as temp2:
            with closed_tempfile(".lua", lua.format(3)) as temp3, closed_tempfile(".py", python.format(4)) as temp4:
>               output = pypandoc.convert_text(
                    markdown_source, to="html", format="md", outputfile=None, filters=[temp1, temp2, temp3, temp4]
                ).strip()

tests.py:381:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypandoc/__init__.py:93: in convert_text
    return _convert_input(source, format, 'string', to, extra_args=extra_args,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = b'-0-', format = 'markdown', input_type = 'string', to = 'html', extra_args = (), outputfile = None
filters = ['/tmp/tmpsu9lufkd.lua', '/tmp/tmpgux03sxh.py', '/tmp/tmp96de3ep5.lua', '/tmp/tmphc5bl3mo.py'], verify_format = True, sandbox = True, cworkdir = None

    def _convert_input(source, format, input_type, to, extra_args=(),
                       outputfile=None, filters=None, verify_format=True,
                       sandbox=True, cworkdir=None):

        _check_log_handler()
        _ensure_pandoc_path()

        if verify_format:
            format, to = _validate_formats(format, to, outputfile)
        else:
            format = normalize_format(format)
            to = normalize_format(to)

        string_input = input_type == 'string'
        if not string_input:
            if isinstance(source, str):
                input_file = [source]
            else:
                input_file = source
        else:
            input_file = []
        args = [__pandoc_path, '--from=' + format]

        args.append('--to=' + to)

        args += input_file

        if outputfile:
            args.append("--output=" + str(outputfile))

        if sandbox:
            if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
                args.append("--sandbox")

        args.extend(extra_args)

        # adds the proper filter syntax for each item in the filters list
        if filters is not None:
            if isinstance(filters, string_types):
                filters = filters.split()
            f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
            args.extend(f)

        # To get access to pandoc-citeproc when we use a included copy of pandoc,
        # we need to add the pypandoc/files dir to the PATH
        new_env = os.environ.copy()
        files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
        new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
        creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

        old_wd = os.getcwd()
        if cworkdir and old_wd != cworkdir:
            os.chdir(cworkdir)

        p = subprocess.Popen(
            args,
            stdin=subprocess.PIPE if string_input else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=new_env,
            creationflags=creation_flag)

        if cworkdir is not None:
            os.chdir(old_wd)

        # something else than 'None' indicates that the process already terminated
        if not (p.returncode is None):
            raise RuntimeError(
                'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                               p.stderr.read())
            )

        if string_input:
            try:
                source = cast_bytes(source, encoding='utf-8')
            except (UnicodeDecodeError, UnicodeEncodeError):
                # assume that it is already a utf-8 encoded string
                pass
        try:
            stdout, stderr = p.communicate(source if string_input else None)
        except OSError:
            # this is happening only on Py2.6 when pandoc dies before reading all
            # the input. We treat that the same as when we exit with an error...
            raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

        try:
            stdout = stdout.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        try:
            stderr = stderr.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        # check that pandoc returned successfully
        if p.returncode != 0:
>           raise RuntimeError(
                'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
            )
E           RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpgux03sxh.py:
E           Could not find executable python

pypandoc/__init__.py:418: RuntimeError
--------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------
/home/tkloczko
_____________________________________________________________ TestPypandoc.test_conversion_with_python_filter ______________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_python_filter>

    def test_conversion_with_python_filter(self):
        markdown_source = "**Here comes the content.**"
        python_source = '''\
        #!/usr/bin/env python

        """
        Pandoc filter to convert all regular text to uppercase.
        Code, link URLs, etc. are not affected.
        """

        from pandocfilters import toJSONFilter, Str

        def caps(key, value, format, meta):
            if key == 'Str':
                return Str(value.upper())

        if __name__ == "__main__":
            toJSONFilter(caps)
        '''
        python_source = textwrap.dedent(python_source)
        with closed_tempfile(".py", python_source) as tempfile:
>           output = pypandoc.convert_text(
                markdown_source, to='html', format='md', outputfile=None, filters=tempfile
            ).strip()

tests.py:332:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypandoc/__init__.py:93: in convert_text
    return _convert_input(source, format, 'string', to, extra_args=extra_args,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = b'**Here comes the content.**', format = 'markdown', input_type = 'string', to = 'html', extra_args = (), outputfile = None, filters = ['/tmp/tmpk2dzvkz1.py']
verify_format = True, sandbox = True, cworkdir = None

    def _convert_input(source, format, input_type, to, extra_args=(),
                       outputfile=None, filters=None, verify_format=True,
                       sandbox=True, cworkdir=None):

        _check_log_handler()
        _ensure_pandoc_path()

        if verify_format:
            format, to = _validate_formats(format, to, outputfile)
        else:
            format = normalize_format(format)
            to = normalize_format(to)

        string_input = input_type == 'string'
        if not string_input:
            if isinstance(source, str):
                input_file = [source]
            else:
                input_file = source
        else:
            input_file = []
        args = [__pandoc_path, '--from=' + format]

        args.append('--to=' + to)

        args += input_file

        if outputfile:
            args.append("--output=" + str(outputfile))

        if sandbox:
            if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
                args.append("--sandbox")

        args.extend(extra_args)

        # adds the proper filter syntax for each item in the filters list
        if filters is not None:
            if isinstance(filters, string_types):
                filters = filters.split()
            f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
            args.extend(f)

        # To get access to pandoc-citeproc when we use a included copy of pandoc,
        # we need to add the pypandoc/files dir to the PATH
        new_env = os.environ.copy()
        files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
        new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
        creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

        old_wd = os.getcwd()
        if cworkdir and old_wd != cworkdir:
            os.chdir(cworkdir)

        p = subprocess.Popen(
            args,
            stdin=subprocess.PIPE if string_input else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=new_env,
            creationflags=creation_flag)

        if cworkdir is not None:
            os.chdir(old_wd)

        # something else than 'None' indicates that the process already terminated
        if not (p.returncode is None):
            raise RuntimeError(
                'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                               p.stderr.read())
            )

        if string_input:
            try:
                source = cast_bytes(source, encoding='utf-8')
            except (UnicodeDecodeError, UnicodeEncodeError):
                # assume that it is already a utf-8 encoded string
                pass
        try:
            stdout, stderr = p.communicate(source if string_input else None)
        except OSError:
            # this is happening only on Py2.6 when pandoc dies before reading all
            # the input. We treat that the same as when we exit with an error...
            raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

        try:
            stdout = stdout.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        try:
            stderr = stderr.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        # check that pandoc returned successfully
        if p.returncode != 0:
>           raise RuntimeError(
                'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
            )
E           RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpk2dzvkz1.py:
E           Could not find executable python

pypandoc/__init__.py:418: RuntimeError
--------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------
/home/tkloczko
============================================================================= warnings summary =============================================================================
pypandoc/pandoc_download.py:62
  /home/tkloczko/rpmbuild/BUILD/pypandoc-1.9/pypandoc/pandoc_download.py:62: DeprecationWarning: invalid escape sequence \.
    regex = re.compile(r"/jgm/pandoc/releases/download/.*(?:"+processor_architecture+"|x86|mac).*\.(?:msi|deb|pkg)")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================= short test summary info ==========================================================================
FAILED tests.py::TestPypandoc::test_conversion_with_mixed_filters - RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpgux03sxh...
FAILED tests.py::TestPypandoc::test_conversion_with_python_filter - RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpk2dzvkz1...
========================================================== 2 failed, 37 passed, 1 deselected, 1 warning in 4.90s ===========================================================

Answer 11 · 2022-09-28T17:12:09.000Z

I know, and in the tarball they are, but my guess is, that when you run the build command, they aren't included, just like when I run python setup.py sdist, because of a change for version 1.8.

My suggestion would be to run pytest before building.

In a typical rpm package build, the test suite is always executed after the build and install steps.

Answer 12 · 2022-09-28T17:24:15.000Z

So for your errors, it seems that in both instances, it can't find the "python" executable when trying to use a python filter.
What would you say is the best solution? Trying "python3" before "python", since we actually want py3, or the other way around, where we try "python3" only if the regular "python" executable couldn't be found?
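The two lookup orders discussed above can be sketched with the standard library. This is only an illustration: `find_python` and its `candidates` parameter are hypothetical names, not part of pypandoc, and falling back to `sys.executable` is an assumption about what a sensible default would be.

```python
import shutil
import sys


def find_python(candidates=("python3", "python")):
    """Return the first candidate found on PATH, else the running interpreter.

    `candidates` is a hypothetical parameter; reorder it to try "python"
    before "python3" instead.
    """
    for name in candidates:
        path = shutil.which(name)  # None when the name is not on PATH
        if path:
            return path
    # Assumed fallback: the interpreter that is running this code.
    return sys.executable
```

Calling `find_python(("python", "python3"))` gives the other order; on a system with only python3 installed, both orders end up at a python3 binary.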

Answer 13 · 2022-09-29T04:42:28.000Z

So for your errors, it seems that in both instances, it can't find the "python" executable when trying to use a python filter.
What would you say is the best solution?

Instead of hardcoding the "python" executable name, use sys.executable.

Answer 14 · 2022-09-29T06:39:54.000Z

Instead of hardcoding the "python" executable name, use sys.executable.

We are not hardcoding the name per se. The error comes from the shebang lines when we test with a python filter.

Answer 15 · 2022-09-29T12:57:12.000Z

We are not hardcoding the name per se. The error comes from the shebang lines when we test with a python filter.

Then, instead of hardcoding the python executable in the shebang line, execute the script as a parameter to sys.executable.
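As a rough sketch of that suggestion, a script can be run by passing its path to the current interpreter, so no shebang line and no `python` name on PATH are involved. The helper `run_with_current_interpreter` and the toy `SCRIPT` are hypothetical; this does not show how pandoc itself invokes a filter.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Toy stand-in for a filter: echoes stdin in upper case (hypothetical).
SCRIPT = textwrap.dedent("""\
    import sys
    sys.stdout.write(sys.stdin.read().upper())
""")


def run_with_current_interpreter(script_source, stdin_text):
    """Write the script to a temp file and run it with sys.executable."""
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(script_source)
        result = subprocess.run(
            [sys.executable, path],  # interpreter is explicit, not looked up by name
            input=stdin_text,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
    finally:
        os.remove(path)
```

Because the interpreter path comes from `sys.executable`, the same code works whether the system calls it `python`, `python3` or something else.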

Answer 16 · 2022-10-01T06:55:58.000Z

I've added three commits to my build procedure:

Patch:          %{VCS}/commit/b5565358.patch#/%{name}-Updated-readme-with-correct-batches.patch
Patch:          %{VCS}/commit/3e7676dd.patch#/%{name}-Fixed-sort-files-before-processing-292-301.patch
Patch:          %{VCS}/commit/b2738b45.patch#/%{name}-Fixes-test-cases-that-uses-python-while-only-python3.patch

and it looks like the issue is still present

tests.py ..........................FF........... [100%]

================================================================================= FAILURES =================================================================================
_____________________________________________________________ TestPypandoc.test_conversion_with_mixed_filters ______________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_mixed_filters>

def test_conversion_with_mixed_filters(self):
    markdown_source = "-0-"

    lua = """\
    function Para(elem)
        return pandoc.Para(elem.content .. {{"{0}-"}})
    end
    """
    lua = textwrap.dedent(lua)

    python = """\
    #!{0}

    from pandocfilters import toJSONFilter, Para, Str

    def func(key, value, format, meta):
        if key == "Para":
            return Para(value + [Str("{0}-")])

    if __name__ == "__main__":
        toJSONFilter(func)

    """
    python = textwrap.dedent(python)
    python.format(sys.executable)

    with closed_tempfile(".lua", lua.format(1)) as temp1, closed_tempfile(".py", python.format(2)) as temp2:
        with closed_tempfile(".lua", lua.format(3)) as temp3, closed_tempfile(".py", python.format(4)) as temp4:
>           output = pypandoc.convert_text(
                markdown_source, to="html", format="md", outputfile=None, filters=[temp1, temp2, temp3, temp4]
            ).strip()

tests.py:384:

pypandoc/__init__.py:93: in convert_text
    return _convert_input(source, format, 'string', to, extra_args=extra_args,

source = b'-0-', format = 'markdown', input_type = 'string', to = 'html', extra_args = (), outputfile = None
filters = ['/tmp/tmpbv7qc_dw.lua', '/tmp/tmpi8dvhe90.py', '/tmp/tmpf8udfv4c.lua', '/tmp/tmpb_8hsrr8.py'], verify_format = True, sandbox = True, cworkdir = None

def _convert_input(source, format, input_type, to, extra_args=(),
                   outputfile=None, filters=None, verify_format=True,
                   sandbox=True, cworkdir=None):

    _check_log_handler()
    _ensure_pandoc_path()

    if verify_format:
        format, to = _validate_formats(format, to, outputfile)
    else:
        format = normalize_format(format)
        to = normalize_format(to)

    string_input = input_type == 'string'
    if not string_input:
        if isinstance(source, str):
            input_file = [source]
        else:
            input_file = source
    else:
        input_file = []

    input_file = sorted(input_file)
    args = [__pandoc_path, '--from=' + format]

    args.append('--to=' + to)

    args += input_file

    if outputfile:
        args.append("--output=" + str(outputfile))

    if sandbox:
        if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
            args.append("--sandbox")

    args.extend(extra_args)

    # adds the proper filter syntax for each item in the filters list
    if filters is not None:
        if isinstance(filters, string_types):
            filters = filters.split()
        f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
        args.extend(f)

    # To get access to pandoc-citeproc when we use a included copy of pandoc,
    # we need to add the pypandoc/files dir to the PATH
    new_env = os.environ.copy()
    files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
    new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
    creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

    old_wd = os.getcwd()
    if cworkdir and old_wd != cworkdir:
        os.chdir(cworkdir)

    p = subprocess.Popen(
        args,
        stdin=subprocess.PIPE if string_input else None,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=new_env,
        creationflags=creation_flag)

    if cworkdir is not None:
        os.chdir(old_wd)

    # something else than 'None' indicates that the process already terminated
    if not (p.returncode is None):
        raise RuntimeError(
            'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                           p.stderr.read())
        )

    if string_input:
        try:
            source = cast_bytes(source, encoding='utf-8')
        except (UnicodeDecodeError, UnicodeEncodeError):
            # assume that it is already a utf-8 encoded string
            pass
    try:
        stdout, stderr = p.communicate(source if string_input else None)
    except OSError:
        # this is happening only on Py2.6 when pandoc dies before reading all
        # the input. We treat that the same as when we exit with an error...
        raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

    try:
        stdout = stdout.decode('utf-8')
    except UnicodeDecodeError:
        # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
        raise RuntimeError('Pandoc output was not utf-8.')

    try:
        stderr = stderr.decode('utf-8')
    except UnicodeDecodeError:
        # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
        raise RuntimeError('Pandoc output was not utf-8.')

    # check that pandoc returned successfully
    if p.returncode != 0:
>       raise RuntimeError(
            'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
        )
E       RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpi8dvhe90.py:
E       Could not find executable python

pypandoc/__init__.py:420: RuntimeError
--------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------
/home/tkloczko
_____________________________________________________________ TestPypandoc.test_conversion_with_python_filter ______________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_python_filter>

def test_conversion_with_python_filter(self):
    markdown_source = "**Here comes the content.**"
    python_source = '''\
    #!{0}

    """
    Pandoc filter to convert all regular text to uppercase.
    Code, link URLs, etc. are not affected.
    """

    from pandocfilters import toJSONFilter, Str

    def caps(key, value, format, meta):
        if key == 'Str':
            return Str(value.upper())

    if __name__ == "__main__":
        toJSONFilter(caps)
    '''
    python_source = textwrap.dedent(python_source)
    python_source.format(sys.executable)

    with closed_tempfile(".py", python_source) as tempfile:
>       output = pypandoc.convert_text(
            markdown_source, to='html', format='md', outputfile=None, filters=tempfile
        ).strip()

tests.py:334:

pypandoc/__init__.py:93: in convert_text
    return _convert_input(source, format, 'string', to, extra_args=extra_args,

source = b'**Here comes the content.**', format = 'markdown', input_type = 'string', to = 'html', extra_args = (), outputfile = None, filters = ['/tmp/tmp2o_r5jt_.py']
verify_format = True, sandbox = True, cworkdir = None

def _convert_input(source, format, input_type, to, extra_args=(),
                   outputfile=None, filters=None, verify_format=True,
                   sandbox=True, cworkdir=None):

    _check_log_handler()
    _ensure_pandoc_path()

    if verify_format:
        format, to = _validate_formats(format, to, outputfile)
    else:
        format = normalize_format(format)
        to = normalize_format(to)

    string_input = input_type == 'string'
    if not string_input:
        if isinstance(source, str):
            input_file = [source]
        else:
            input_file = source
    else:
        input_file = []

    input_file = sorted(input_file)
    args = [__pandoc_path, '--from=' + format]

    args.append('--to=' + to)

    args += input_file

    if outputfile:
        args.append("--output=" + str(outputfile))

    if sandbox:
        if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
            args.append("--sandbox")

    args.extend(extra_args)

    # adds the proper filter syntax for each item in the filters list
    if filters is not None:
        if isinstance(filters, string_types):
            filters = filters.split()
        f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
        args.extend(f)

    # To get access to pandoc-citeproc when we use a included copy of pandoc,
    # we need to add the pypandoc/files dir to the PATH
    new_env = os.environ.copy()
    files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
    new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
    creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

    old_wd = os.getcwd()
    if cworkdir and old_wd != cworkdir:
        os.chdir(cworkdir)

    p = subprocess.Popen(
        args,
        stdin=subprocess.PIPE if string_input else None,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=new_env,
        creationflags=creation_flag)

    if cworkdir is not None:
        os.chdir(old_wd)

    # something else than 'None' indicates that the process already terminated
    if not (p.returncode is None):
        raise RuntimeError(
            'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                           p.stderr.read())
        )

    if string_input:
        try:
            source = cast_bytes(source, encoding='utf-8')
        except (UnicodeDecodeError, UnicodeEncodeError):
            # assume that it is already a utf-8 encoded string
            pass
    try:
        stdout, stderr = p.communicate(source if string_input else None)
    except OSError:
        # this is happening only on Py2.6 when pandoc dies before reading all
        # the input. We treat that the same as when we exit with an error...
        raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

    try:
        stdout = stdout.decode('utf-8')
    except UnicodeDecodeError:
        # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
        raise RuntimeError('Pandoc output was not utf-8.')

    try:
        stderr = stderr.decode('utf-8')
    except UnicodeDecodeError:
        # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
        raise RuntimeError('Pandoc output was not utf-8.')

    # check that pandoc returned successfully
    if p.returncode != 0:
>       raise RuntimeError(
            'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
        )
E       RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmp2o_r5jt_.py:
E       Could not find executable python

pypandoc/__init__.py:420: RuntimeError
--------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------
/home/tkloczko
============================================================================= warnings summary =============================================================================
pypandoc/pandoc_download.py:62
  /home/tkloczko/rpmbuild/BUILD/pypandoc-1.9/pypandoc/pandoc_download.py:62: DeprecationWarning: invalid escape sequence \.
    regex = re.compile(r"/jgm/pandoc/releases/download/.*(?:"+processor_architecture+"|x86|mac).*\.(?:msi|deb|pkg)")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================= short test summary info ==========================================================================
FAILED tests.py::TestPypandoc::test_conversion_with_mixed_filters - RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpi8dvhe90...
FAILED tests.py::TestPypandoc::test_conversion_with_python_filter - RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmp2o_r5jt_...
========================================================== 2 failed, 37 passed, 1 deselected, 1 warning in 4.93s ===========================================================

</details>

Answer 17 · 2022-10-01T06:57:42.000Z

Additionally, after deselecting the failing units with --deselect, I still see a warning:

+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.9-2.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.9-2.fc35.x86_64/usr/lib/python3.8/site-packages
+ /usr/bin/pytest -ra tests.py --deselect tests.py::TestPypandoc::test_pdf_conversion --deselect tests.py::TestPypandoc::test_conversion_with_mixed_filters --deselect tests.py::TestPypandoc::test_conversion_with_python_filter
=========================================================================== test session starts ============================================================================
platform linux -- Python 3.8.14, pytest-7.1.3, pluggy-1.0.0
rootdir: /home/tkloczko/rpmbuild/BUILD/pypandoc-1.9
collected 40 items / 3 deselected / 37 selected

tests.py .....................................                                                                                                                       [100%]

============================================================================= warnings summary =============================================================================
pypandoc/pandoc_download.py:62
  /home/tkloczko/rpmbuild/BUILD/pypandoc-1.9/pypandoc/pandoc_download.py:62: DeprecationWarning: invalid escape sequence \.
    regex = re.compile(r"/jgm/pandoc/releases/download/.*(?:"+processor_architecture+"|x86|mac).*\.(?:msi|deb|pkg)")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================================================== 37 passed, 3 deselected, 1 warning in 4.33s ================================================================
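
A note on the warning above: the `r` prefix applies only to the first string literal in the concatenation. The last fragment, `"|x86|mac).*\.(?:msi|deb|pkg)"`, is an ordinary string, so `\.` is treated as an invalid escape sequence. A minimal sketch of one way to silence it, assuming the intent of the pattern is unchanged (the value of `processor_architecture` and the sample path are made up for illustration):

```python
import re

# Hypothetical value, for illustration only; pypandoc computes this itself.
processor_architecture = "amd64"

# Each fragment that contains a backslash gets its own r prefix,
# so "\." reaches the regex engine as an escaped literal dot.
regex = re.compile(
    r"/jgm/pandoc/releases/download/.*(?:"
    + processor_architecture
    + r"|x86|mac).*\.(?:msi|deb|pkg)"
)

# Illustrative path shaped like a release asset link.
sample = "/jgm/pandoc/releases/download/2.19.2/pandoc-2.19.2-1-amd64.deb"
print(bool(regex.search(sample)))
```

The compiled pattern is the same as before; only the way the source literal is written changes.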

Answer 18 · 2022-10-01T09:11:45.000Z

@kloczek can you try the latest development snapshot? That should fix the failing test cases.

Answer 19 · 2022-10-01T11:28:33.000Z

After replacing the last patch with:

Patch:          %{VCS}/commit/b5565358.patch#/%{name}-Updated-readme-with-correct-batches.patch
Patch:          %{VCS}/commit/3e7676dd.patch#/%{name}-Fixed-sort-files-before-processing-292-301.patch
Patch:          https://github.com/JessicaTegner/pypandoc/commit/b2738b45.patch#/%{name}-Fixes-test-cases-that-uses-python-while-only-python3-is-available.patch

pytest still fails:

+ cd pypandoc-1.9
+ /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ /usr/bin/cat /home/tkloczko/rpmbuild/SOURCES/python-pypandoc-Updated-readme-with-correct-batches.patch
+ /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f
+ /usr/bin/cat /home/tkloczko/rpmbuild/SOURCES/python-pypandoc-Fixed-sort-files-before-processing-292-301.patch
+ /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f
+ /usr/bin/cat /home/tkloczko/rpmbuild/SOURCES/python-pypandoc-Fixes-test-cases-that-uses-python-while-only-python3-is-available.patch
+ /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f

[..]

+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.9-2.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.9-2.fc35.x86_64/usr/lib/python3.8/site-packages
+ /usr/bin/pytest -ra tests.py --deselect tests.py::TestPypandoc::test_pdf_conversion
=========================================================================== test session starts ============================================================================
platform linux -- Python 3.8.14, pytest-7.1.3, pluggy-1.0.0
rootdir: /home/tkloczko/rpmbuild/BUILD/pypandoc-1.9
collected 40 items / 1 deselected / 39 selected

tests.py ..........................FF...........                                                                                                                     [100%]

================================================================================= FAILURES =================================================================================
_____________________________________________________________ TestPypandoc.test_conversion_with_mixed_filters ______________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_mixed_filters>

    def test_conversion_with_mixed_filters(self):
        markdown_source = "-0-"

        lua = """\
        function Para(elem)
            return pandoc.Para(elem.content .. {{"{0}-"}})
        end
        """
        lua = textwrap.dedent(lua)

        python = """\
        #!{0}

        from pandocfilters import toJSONFilter, Para, Str

        def func(key, value, format, meta):
            if key == "Para":
                return Para(value + [Str("{0}-")])

        if __name__ == "__main__":
            toJSONFilter(func)

        """
        python = textwrap.dedent(python)
        python.format(sys.executable)

        with closed_tempfile(".lua", lua.format(1)) as temp1, closed_tempfile(".py", python.format(2)) as temp2:
            with closed_tempfile(".lua", lua.format(3)) as temp3, closed_tempfile(".py", python.format(4)) as temp4:
>               output = pypandoc.convert_text(
                    markdown_source, to="html", format="md", outputfile=None, filters=[temp1, temp2, temp3, temp4]
                ).strip()

tests.py:384:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypandoc/__init__.py:93: in convert_text
    return _convert_input(source, format, 'string', to, extra_args=extra_args,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = b'-0-', format = 'markdown', input_type = 'string', to = 'html', extra_args = (), outputfile = None
filters = ['/tmp/tmpwr8zvwco.lua', '/tmp/tmpiljf6dd6.py', '/tmp/tmpdu_fqeof.lua', '/tmp/tmpwj3gm4v6.py'], verify_format = True, sandbox = True, cworkdir = None

    def _convert_input(source, format, input_type, to, extra_args=(),
                       outputfile=None, filters=None, verify_format=True,
                       sandbox=True, cworkdir=None):

        _check_log_handler()
        _ensure_pandoc_path()

        if verify_format:
            format, to = _validate_formats(format, to, outputfile)
        else:
            format = normalize_format(format)
            to = normalize_format(to)

        string_input = input_type == 'string'
        if not string_input:
            if isinstance(source, str):
                input_file = [source]
            else:
                input_file = source
        else:
            input_file = []

        input_file = sorted(input_file)
        args = [__pandoc_path, '--from=' + format]

        args.append('--to=' + to)

        args += input_file

        if outputfile:
            args.append("--output=" + str(outputfile))

        if sandbox:
            if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
                args.append("--sandbox")

        args.extend(extra_args)

        # adds the proper filter syntax for each item in the filters list
        if filters is not None:
            if isinstance(filters, string_types):
                filters = filters.split()
            f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
            args.extend(f)

        # To get access to pandoc-citeproc when we use a included copy of pandoc,
        # we need to add the pypandoc/files dir to the PATH
        new_env = os.environ.copy()
        files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
        new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
        creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

        old_wd = os.getcwd()
        if cworkdir and old_wd != cworkdir:
            os.chdir(cworkdir)

        p = subprocess.Popen(
            args,
            stdin=subprocess.PIPE if string_input else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=new_env,
            creationflags=creation_flag)

        if cworkdir is not None:
            os.chdir(old_wd)

        # something else than 'None' indicates that the process already terminated
        if not (p.returncode is None):
            raise RuntimeError(
                'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                               p.stderr.read())
            )

        if string_input:
            try:
                source = cast_bytes(source, encoding='utf-8')
            except (UnicodeDecodeError, UnicodeEncodeError):
                # assume that it is already a utf-8 encoded string
                pass
        try:
            stdout, stderr = p.communicate(source if string_input else None)
        except OSError:
            # this is happening only on Py2.6 when pandoc dies before reading all
            # the input. We treat that the same as when we exit with an error...
            raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

        try:
            stdout = stdout.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        try:
            stderr = stderr.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        # check that pandoc returned successfully
        if p.returncode != 0:
>           raise RuntimeError(
                'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
            )
E           RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpiljf6dd6.py:
E           Could not find executable python

pypandoc/__init__.py:420: RuntimeError
--------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------
/home/tkloczko
_____________________________________________________________ TestPypandoc.test_conversion_with_python_filter ______________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_python_filter>

    def test_conversion_with_python_filter(self):
        markdown_source = "**Here comes the content.**"
        python_source = '''\
        #!{0}

        """
        Pandoc filter to convert all regular text to uppercase.
        Code, link URLs, etc. are not affected.
        """

        from pandocfilters import toJSONFilter, Str

        def caps(key, value, format, meta):
            if key == 'Str':
                return Str(value.upper())

        if __name__ == "__main__":
            toJSONFilter(caps)
        '''
        python_source = textwrap.dedent(python_source)
        python_source.format(sys.executable)

        with closed_tempfile(".py", python_source) as tempfile:
>           output = pypandoc.convert_text(
                markdown_source, to='html', format='md', outputfile=None, filters=tempfile
            ).strip()

tests.py:334:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypandoc/__init__.py:93: in convert_text
    return _convert_input(source, format, 'string', to, extra_args=extra_args,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = b'**Here comes the content.**', format = 'markdown', input_type = 'string', to = 'html', extra_args = (), outputfile = None, filters = ['/tmp/tmpd450lw4k.py']
verify_format = True, sandbox = True, cworkdir = None

    def _convert_input(source, format, input_type, to, extra_args=(),
                       outputfile=None, filters=None, verify_format=True,
                       sandbox=True, cworkdir=None):

        _check_log_handler()
        _ensure_pandoc_path()

        if verify_format:
            format, to = _validate_formats(format, to, outputfile)
        else:
            format = normalize_format(format)
            to = normalize_format(to)

        string_input = input_type == 'string'
        if not string_input:
            if isinstance(source, str):
                input_file = [source]
            else:
                input_file = source
        else:
            input_file = []

        input_file = sorted(input_file)
        args = [__pandoc_path, '--from=' + format]

        args.append('--to=' + to)

        args += input_file

        if outputfile:
            args.append("--output=" + str(outputfile))

        if sandbox:
            if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
                args.append("--sandbox")

        args.extend(extra_args)

        # adds the proper filter syntax for each item in the filters list
        if filters is not None:
            if isinstance(filters, string_types):
                filters = filters.split()
            f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
            args.extend(f)

        # To get access to pandoc-citeproc when we use a included copy of pandoc,
        # we need to add the pypandoc/files dir to the PATH
        new_env = os.environ.copy()
        files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
        new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
        creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

        old_wd = os.getcwd()
        if cworkdir and old_wd != cworkdir:
            os.chdir(cworkdir)

        p = subprocess.Popen(
            args,
            stdin=subprocess.PIPE if string_input else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=new_env,
            creationflags=creation_flag)

        if cworkdir is not None:
            os.chdir(old_wd)

        # something else than 'None' indicates that the process already terminated
        if not (p.returncode is None):
            raise RuntimeError(
                'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                               p.stderr.read())
            )

        if string_input:
            try:
                source = cast_bytes(source, encoding='utf-8')
            except (UnicodeDecodeError, UnicodeEncodeError):
                # assume that it is already a utf-8 encoded string
                pass
        try:
            stdout, stderr = p.communicate(source if string_input else None)
        except OSError:
            # this is happening only on Py2.6 when pandoc dies before reading all
            # the input. We treat that the same as when we exit with an error...
            raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

        try:
            stdout = stdout.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        try:
            stderr = stderr.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        # check that pandoc returned successfully
        if p.returncode != 0:
>           raise RuntimeError(
                'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
            )
E           RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpd450lw4k.py:
E           Could not find executable python

pypandoc/__init__.py:420: RuntimeError
--------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------
/home/tkloczko
============================================================================= warnings summary =============================================================================
pypandoc/pandoc_download.py:62
  /home/tkloczko/rpmbuild/BUILD/pypandoc-1.9/pypandoc/pandoc_download.py:62: DeprecationWarning: invalid escape sequence \.
    regex = re.compile(r"/jgm/pandoc/releases/download/.*(?:"+processor_architecture+"|x86|mac).*\.(?:msi|deb|pkg)")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================= short test summary info ==========================================================================
FAILED tests.py::TestPypandoc::test_conversion_with_mixed_filters - RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpiljf6dd6...
FAILED tests.py::TestPypandoc::test_conversion_with_python_filter - RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpd450lw4k...
========================================================== 2 failed, 37 passed, 1 deselected, 1 warning in 4.89s ===========================================================

Answer 20 · 2022-10-02T02:04:25.000Z

Well, we are using sys.executable to run the Python tests now, so I don't see how it could fail...
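
A possible explanation, judging only from the test source quoted in the tracebacks above: `str.format` returns a new string, and both tests call `python.format(sys.executable)` (or `python_source.format(sys.executable)`) without keeping the result. The template therefore still contains `{0}` afterwards, and in the mixed-filters test the later `python.format(2)` fills the shebang with `2` as well. The sketch below reproduces the effect with a made-up template; `template`, `fixed_template`, and the field names `interpreter` and `marker` are illustrative and not taken from the pypandoc test suite.

```python
import sys
import textwrap

template = textwrap.dedent("""\
    #!{0}

    print("{0}-")
    """)

# str.format returns a new string; the result is discarded here,
# so `template` still starts with the literal text "#!{0}".
template.format(sys.executable)

# A later call substitutes its own argument everywhere, shebang included.
broken = template.format(2)
print(broken.splitlines()[0])  # the shebang line is now "#!2"

# One possible fix: separate named fields, filled in a single call
# whose return value is actually kept.
fixed_template = textwrap.dedent("""\
    #!{interpreter}

    print("{marker}-")
    """)
fixed = fixed_template.format(interpreter=sys.executable, marker=2)
print(fixed.splitlines()[0])
```

Even with a correct shebang, the filter file probably also needs to be executable. As far as I understand the pandoc manual, a non-executable `.py` filter is run through whatever `python` is found on PATH, which would match the "Could not find executable python" message. Whether `closed_tempfile` sets the executable bit is not shown in this thread.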

Answer 21 · 2023-03-05T09:36:06.000Z

I just retested 1.11 and still see three units failing:

+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.11-2.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.11-2.fc35.x86_64/usr/lib/python3.8/site-packages
+ /usr/bin/pytest -ra -m 'not network' tests.py --deselect tests.py::TestPypandoc::test_pdf_conversion
==================================================================================== test session starts ====================================================================================
platform linux -- Python 3.8.16, pytest-7.2.2, pluggy-1.0.0
rootdir: /home/tkloczko/rpmbuild/BUILD/pypandoc-1.11
collected 41 items / 1 deselected / 40 selected

tests.py .......................F...FF...........                                                                                                                                     [100%]

========================================================================================= FAILURES ==========================================================================================
_______________________________________________________________________ TestPypandoc.test_conversion_with_data_files ________________________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_data_files>

        def test_conversion_with_data_files(self):
            # remove our test.docx file from our test_data dir if it already exosts
            test_data_dir = os.path.join(os.path.dirname(__file__), 'test_data')
            test_docx_file = os.path.join(test_data_dir, 'test.docx')
            if os.path.exists(test_docx_file):
                os.remove(test_docx_file)
>           result = pypandoc.convert_file(
        os.path.join(test_data_dir, 'index.html'),
        to='docx',
        format='html',
        outputfile=test_docx_file,
        sandbox=True,
    )

tests.py:240:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypandoc/__init__.py:168: in convert_file
    return _convert_input(discovered_source_files, format, 'path', to, extra_args=extra_args,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = '/home/tkloczko/rpmbuild/BUILD/pypandoc-1.11/test_data/index.html', format = 'html', input_type = 'path', to = 'docx', extra_args = ()
outputfile = '/home/tkloczko/rpmbuild/BUILD/pypandoc-1.11/test_data/test.docx', filters = None, verify_format = True, sandbox = True, cworkdir = None

    def _convert_input(source, format, input_type, to, extra_args=(),
                       outputfile=None, filters=None, verify_format=True,
                       sandbox=False, cworkdir=None):

        _check_log_handler()

        logger.debug("Ensuring pandoc path...")
        _ensure_pandoc_path()

        if verify_format:
            logger.debug("Verifying format...")
            format, to = _validate_formats(format, to, outputfile)
        else:
            format = normalize_format(format)
            to = normalize_format(to)

        logger.debug("Identifying input type...")
        string_input = input_type == 'string'
        if not string_input:
            if isinstance(source, str):
                input_file = [source]
            else:
                input_file = source
        else:
            input_file = []

        input_file = sorted(input_file)
        args = [__pandoc_path, '--from=' + format]

        args.append('--to=' + to)

        args += input_file

        if outputfile:
            args.append("--output=" + str(outputfile))

        if sandbox:
            if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
                logger.debug("Adding sandbox argument...")
                args.append("--sandbox")
            else:
                logger.warning("Sandbox argument was used, but pandoc version is too low. Ignoring argument.")

        args.extend(extra_args)

        # adds the proper filter syntax for each item in the filters list
        if filters is not None:
            if isinstance(filters, string_types):
                filters = filters.split()
            f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
            args.extend(f)

        # To get access to pandoc-citeproc when we use a included copy of pandoc,
        # we need to add the pypandoc/files dir to the PATH
        new_env = os.environ.copy()
        files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
        new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
        creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

        old_wd = os.getcwd()
        if cworkdir and old_wd != cworkdir:
            os.chdir(cworkdir)

        logger.debug("Running pandoc...")
        p = subprocess.Popen(
            args,
            stdin=subprocess.PIPE if string_input else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=new_env,
            creationflags=creation_flag)

        if cworkdir is not None:
            os.chdir(old_wd)

        # something else than 'None' indicates that the process already terminated
        if not (p.returncode is None):
            raise RuntimeError(
                'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                               p.stderr.read())
            )

        if string_input:
            try:
                source = cast_bytes(source, encoding='utf-8')
            except (UnicodeDecodeError, UnicodeEncodeError):
                # assume that it is already a utf-8 encoded string
                pass
        try:
            stdout, stderr = p.communicate(source if string_input else None)
        except OSError:
            # this is happening only on Py2.6 when pandoc dies before reading all
            # the input. We treat that the same as when we exit with an error...
            raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

        try:
            stdout = stdout.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        try:
            stderr = stderr.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        # check that pandoc returned successfully
        if p.returncode != 0:
>           raise RuntimeError(
                'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
            )
E           RuntimeError: Pandoc died with exitcode "97" during conversion: Could not find data file data/data/docx/[Content_Types].xml

pypandoc/__init__.py:426: RuntimeError
----------------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------------
/home/tkloczko
______________________________________________________________________ TestPypandoc.test_conversion_with_mixed_filters ______________________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_mixed_filters>

    def test_conversion_with_mixed_filters(self):
        markdown_source = "-0-"

        lua = """\
        function Para(elem)
            return pandoc.Para(elem.content .. {{"{0}-"}})
        end
        """
        lua = textwrap.dedent(lua)

        python = """\
        #!{0}

        from pandocfilters import toJSONFilter, Para, Str

        def func(key, value, format, meta):
            if key == "Para":
                return Para(value + [Str("{0}-")])

        if __name__ == "__main__":
            toJSONFilter(func)

        """
        python = textwrap.dedent(python)
        python.format(sys.executable)

        with closed_tempfile(".lua", lua.format(1)) as temp1, closed_tempfile(".py", python.format(2)) as temp2:
            with closed_tempfile(".lua", lua.format(3)) as temp3, closed_tempfile(".py", python.format(4)) as temp4:
>               output = pypandoc.convert_text(
                    markdown_source, to="html", format="md", outputfile=None, filters=[temp1, temp2, temp3, temp4]
                ).strip()

tests.py:403:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypandoc/__init__.py:91: in convert_text
    return _convert_input(source, format, 'string', to, extra_args=extra_args,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = b'-0-', format = 'markdown', input_type = 'string', to = 'html', extra_args = (), outputfile = None
filters = ['/tmp/tmpdgw_df6w.lua', '/tmp/tmpbl813ywg.py', '/tmp/tmp85dsiv3y.lua', '/tmp/tmp7j1t2jod.py'], verify_format = True, sandbox = False, cworkdir = None

    def _convert_input(source, format, input_type, to, extra_args=(),
                       outputfile=None, filters=None, verify_format=True,
                       sandbox=False, cworkdir=None):

        _check_log_handler()

        logger.debug("Ensuring pandoc path...")
        _ensure_pandoc_path()

        if verify_format:
            logger.debug("Verifying format...")
            format, to = _validate_formats(format, to, outputfile)
        else:
            format = normalize_format(format)
            to = normalize_format(to)

        logger.debug("Identifying input type...")
        string_input = input_type == 'string'
        if not string_input:
            if isinstance(source, str):
                input_file = [source]
            else:
                input_file = source
        else:
            input_file = []

        input_file = sorted(input_file)
        args = [__pandoc_path, '--from=' + format]

        args.append('--to=' + to)

        args += input_file

        if outputfile:
            args.append("--output=" + str(outputfile))

        if sandbox:
            if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
                logger.debug("Adding sandbox argument...")
                args.append("--sandbox")
            else:
                logger.warning("Sandbox argument was used, but pandoc version is too low. Ignoring argument.")

        args.extend(extra_args)

        # adds the proper filter syntax for each item in the filters list
        if filters is not None:
            if isinstance(filters, string_types):
                filters = filters.split()
            f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
            args.extend(f)

        # To get access to pandoc-citeproc when we use a included copy of pandoc,
        # we need to add the pypandoc/files dir to the PATH
        new_env = os.environ.copy()
        files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
        new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
        creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

        old_wd = os.getcwd()
        if cworkdir and old_wd != cworkdir:
            os.chdir(cworkdir)

        logger.debug("Running pandoc...")
        p = subprocess.Popen(
            args,
            stdin=subprocess.PIPE if string_input else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=new_env,
            creationflags=creation_flag)

        if cworkdir is not None:
            os.chdir(old_wd)

        # something else than 'None' indicates that the process already terminated
        if not (p.returncode is None):
            raise RuntimeError(
                'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                               p.stderr.read())
            )

        if string_input:
            try:
                source = cast_bytes(source, encoding='utf-8')
            except (UnicodeDecodeError, UnicodeEncodeError):
                # assume that it is already a utf-8 encoded string
                pass
        try:
            stdout, stderr = p.communicate(source if string_input else None)
        except OSError:
            # this is happening only on Py2.6 when pandoc dies before reading all
            # the input. We treat that the same as when we exit with an error...
            raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

        try:
            stdout = stdout.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        try:
            stderr = stderr.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        # check that pandoc returned successfully
        if p.returncode != 0:
>           raise RuntimeError(
                'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
            )
E           RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpbl813ywg.py:
E           Could not find executable python

pypandoc/__init__.py:426: RuntimeError
----------------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------------
/home/tkloczko
______________________________________________________________________ TestPypandoc.test_conversion_with_python_filter ______________________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_python_filter>

    def test_conversion_with_python_filter(self):
        markdown_source = "**Here comes the content.**"
        python_source = '''\
        #!{0}

        """
        Pandoc filter to convert all regular text to uppercase.
        Code, link URLs, etc. are not affected.
        """

        from pandocfilters import toJSONFilter, Str

        def caps(key, value, format, meta):
            if key == 'Str':
                return Str(value.upper())

        if __name__ == "__main__":
            toJSONFilter(caps)
        '''
        python_source = textwrap.dedent(python_source)
        python_source.format(sys.executable)

        with closed_tempfile(".py", python_source) as tempfile:
>           output = pypandoc.convert_text(
                markdown_source, to='html', format='md', outputfile=None, filters=tempfile
            ).strip()

tests.py:353:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypandoc/__init__.py:91: in convert_text
    return _convert_input(source, format, 'string', to, extra_args=extra_args,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = b'**Here comes the content.**', format = 'markdown', input_type = 'string', to = 'html', extra_args = (), outputfile = None, filters = ['/tmp/tmpcwer2aku.py'], verify_format = True
sandbox = False, cworkdir = None

    def _convert_input(source, format, input_type, to, extra_args=(),
                       outputfile=None, filters=None, verify_format=True,
                       sandbox=False, cworkdir=None):

        _check_log_handler()

        logger.debug("Ensuring pandoc path...")
        _ensure_pandoc_path()

        if verify_format:
            logger.debug("Verifying format...")
            format, to = _validate_formats(format, to, outputfile)
        else:
            format = normalize_format(format)
            to = normalize_format(to)

        logger.debug("Identifying input type...")
        string_input = input_type == 'string'
        if not string_input:
            if isinstance(source, str):
                input_file = [source]
            else:
                input_file = source
        else:
            input_file = []

        input_file = sorted(input_file)
        args = [__pandoc_path, '--from=' + format]

        args.append('--to=' + to)

        args += input_file

        if outputfile:
            args.append("--output=" + str(outputfile))

        if sandbox:
            if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
                logger.debug("Adding sandbox argument...")
                args.append("--sandbox")
            else:
                logger.warning("Sandbox argument was used, but pandoc version is too low. Ignoring argument.")

        args.extend(extra_args)

        # adds the proper filter syntax for each item in the filters list
        if filters is not None:
            if isinstance(filters, string_types):
                filters = filters.split()
            f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
            args.extend(f)

        # To get access to pandoc-citeproc when we use a included copy of pandoc,
        # we need to add the pypandoc/files dir to the PATH
        new_env = os.environ.copy()
        files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
        new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
        creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

        old_wd = os.getcwd()
        if cworkdir and old_wd != cworkdir:
            os.chdir(cworkdir)

        logger.debug("Running pandoc...")
        p = subprocess.Popen(
            args,
            stdin=subprocess.PIPE if string_input else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=new_env,
            creationflags=creation_flag)

        if cworkdir is not None:
            os.chdir(old_wd)

        # something else than 'None' indicates that the process already terminated
        if not (p.returncode is None):
            raise RuntimeError(
                'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                               p.stderr.read())
            )

        if string_input:
            try:
                source = cast_bytes(source, encoding='utf-8')
            except (UnicodeDecodeError, UnicodeEncodeError):
                # assume that it is already a utf-8 encoded string
                pass
        try:
            stdout, stderr = p.communicate(source if string_input else None)
        except OSError:
            # this is happening only on Py2.6 when pandoc dies before reading all
            # the input. We treat that the same as when we exit with an error...
            raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

        try:
            stdout = stdout.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        try:
            stderr = stderr.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        # check that pandoc returned successfully
        if p.returncode != 0:
>           raise RuntimeError(
                'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
            )
E           RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpcwer2aku.py:
E           Could not find executable python

pypandoc/__init__.py:426: RuntimeError
----------------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------------
/home/tkloczko
===================================================================================== warnings summary ======================================================================================
pypandoc/pandoc_download.py:61
  /home/tkloczko/rpmbuild/BUILD/pypandoc-1.11/pypandoc/pandoc_download.py:61: DeprecationWarning: invalid escape sequence \.
    regex = re.compile(r"/jgm/pandoc/releases/download/.*(?:"+processor_architecture+"|x86|mac).*\.(?:msi|deb|pkg)")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================== short test summary info ==================================================================================
FAILED tests.py::TestPypandoc::test_conversion_with_data_files - RuntimeError: Pandoc died with exitcode "97" during conversion: Could not find data file data/data/docx/[Content_Types].xml
FAILED tests.py::TestPypandoc::test_conversion_with_mixed_filters - RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpbl813ywg.py:
FAILED tests.py::TestPypandoc::test_conversion_with_python_filter - RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpcwer2aku.py:
=================================================================== 3 failed, 37 passed, 1 deselected, 1 warning in 6.46s ===================================================================

Answer 22 · 2023-03-05T15:23:39.000Z

@kloczek Regarding the last two failures:

        python_source = '''\
        #!{0}

        """
        Pandoc filter to convert all regular text to uppercase.
        Code, link URLs, etc. are not affected.
        """

        from pandocfilters import toJSONFilter, Str

        def caps(key, value, format, meta):
            if key == 'Str':
                return Str(value.upper())

        if __name__ == "__main__":
            toJSONFilter(caps)
        '''
        python_source = textwrap.dedent(python_source)
        python_source.format(sys.executable)

We set the shebang line from "sys.executable", so the only reason the python filters could fail to run would be that "sys.executable" is either incorrect or (somehow) still pointing at plain python.
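One thing to double-check in the snippet above: Python strings are immutable, so `str.format` returns a new string and leaves the original untouched. If the result is not assigned back, the file still starts with a literal `#!{0}`. A minimal sketch (the `template` name and the shortened filter body are only for illustration):

```python
import sys
import textwrap

template = textwrap.dedent('''\
    #!{0}
    print("filter body goes here")
    ''')

# Calling format() without using the result changes nothing.
template.format(sys.executable)
unformatted_first_line = template.splitlines()[0]

# Assigning the result is what actually fills in the shebang.
filled = template.format(sys.executable)
formatted_first_line = filled.splitlines()[0]

print(unformatted_first_line)  # the placeholder is still there
print(formatted_first_line)    # shebang now points at the running interpreter
```

If the first printed line still shows the placeholder in the real test, the shebang was never written.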

Can you try running something like the following to check what sys.executable is set to when running our tests?

import sys

print(sys.executable)
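Since the interpreter pytest runs under and the one found via PATH can differ, it may help to print both. A small sketch, assuming it is run in the same environment as the tests (`interpreter_report` is a made-up helper name, not part of pypandoc):

```python
import shutil
import sys


def interpreter_report():
    """Collect the running interpreter and what PATH resolves for common names."""
    return {
        "sys.executable": sys.executable,
        # shutil.which returns None when the name is not on PATH
        "python": shutil.which("python"),
        "python3": shutil.which("python3"),
    }


if __name__ == "__main__":
    for name, path in interpreter_report().items():
        print(f"{name}: {path}")
```

If `python` comes back as `None` while `python3` resolves, anything that invokes a bare `python` will fail on that machine.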

As for the first one, about the data files: that's an error in the test case, where "sandbox" is explicitly set to True even though the default is now False. That should be easy enough to fix by omitting the sandbox parameter altogether.

Answer 23 · 2023-03-07T02:41:56.000Z

[tkloczko@pers-jacek SPECS]$ python3
Python 3.8.16 (default, Jan 30 2023, 13:00:00)
[GCC 13.0.1 20230127 (Red Hat 13.0.1-0)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.executable)
/usr/bin/python3
>>>

Answer 24 · 2023-03-09T14:06:49.000Z

@kloczek can you test master now? After the work in PR #328, the python ones should be fixed.
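For anyone hitting the same "Could not find executable python" error with their own filters: as far as I know, pandoc only honors the shebang line when the filter file is executable, and otherwise guesses the interpreter from the file extension (plain `python` for `.py`). A rough sketch of writing a filter so that the shebang gets used, assuming a POSIX system (`write_executable_filter` is a hypothetical helper, not part of pypandoc):

```python
import os
import stat
import sys
import tempfile


def write_executable_filter(body):
    """Write a Python filter with a shebang for the current interpreter and mark it executable."""
    source = "#!{0}\n{1}".format(sys.executable, body)
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "w") as handle:
        handle.write(source)
    # Add the executable bits on top of the mode the file was created with.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


if __name__ == "__main__":
    filter_path = write_executable_filter("print('placeholder filter body')\n")
    print(filter_path)
    os.remove(filter_path)
```

The caller is responsible for removing the file afterwards, as the `__main__` block does.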

Answer 25 · 2023-05-08T01:09:43.000Z

This issue can probably be closed now. Ping @kloczek

Answer 26 · 2024-04-11T15:40:28.000Z

Hmm... I just retested 1.13 and pytest still fails in 3 units 🤔

Here is pytest output:

+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.13-4.fc37.x86_64/usr/lib64/python3.10/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.13-4.fc37.x86_64/usr/lib/python3.10/site-packages
+ /usr/bin/pytest -ra -m 'not network' tests.py --deselect tests.py::TestPypandoc::test_pdf_conversion
==================================================================================== test session starts ====================================================================================
platform linux -- Python 3.10.14, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/tkloczko/rpmbuild/BUILD/pypandoc-1.13
configfile: pyproject.toml
collected 41 items / 1 deselected / 40 selected

tests.py .......................F...FF...........                                                                                                                                     [100%]

========================================================================================= FAILURES ==========================================================================================
_______________________________________________________________________ TestPypandoc.test_conversion_with_data_files ________________________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_data_files>

    def test_conversion_with_data_files(self):
        # remove our test.docx file from our test_data dir if it already exosts
        test_data_dir = os.path.join(os.path.dirname(__file__), 'test_data')
        test_docx_file = os.path.join(test_data_dir, 'test.docx')
        if os.path.exists(test_docx_file):
            os.remove(test_docx_file)
>       result = pypandoc.convert_file(
          os.path.join(test_data_dir, 'index.html'),
          to='docx',
          format='html',
          outputfile=test_docx_file,
          sandbox=True,
        )

tests.py:240:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypandoc/__init__.py:200: in convert_file
    return _convert_input(discovered_source_files, format, 'path', to, extra_args=extra_args,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = '/home/tkloczko/rpmbuild/BUILD/pypandoc-1.13/test_data/index.html', format = 'html', input_type = 'path', to = 'docx', extra_args = ()
outputfile = '/home/tkloczko/rpmbuild/BUILD/pypandoc-1.13/test_data/test.docx', filters = None, verify_format = True, sandbox = True
cworkdir = '/home/tkloczko/rpmbuild/BUILD/pypandoc-1.13'

    def _convert_input(source, format, input_type, to, extra_args=(),
                       outputfile=None, filters=None, verify_format=True,
                       sandbox=False, cworkdir=None):

        _check_log_handler()

        logger.debug("Ensuring pandoc path...")
        _ensure_pandoc_path()

        if verify_format:
            logger.debug("Verifying format...")
            format, to = _validate_formats(format, to, outputfile)
        else:
            format = normalize_format(format)
            to = normalize_format(to)

        logger.debug("Identifying input type...")
        string_input = input_type == 'string'
        if not string_input:
            if isinstance(source, str):
                input_file = [source]
            else:
                input_file = source
        else:
            input_file = []

        input_file = sorted(input_file)
        args = [__pandoc_path, '--from=' + format]

        args.append('--to=' + to)

        args += input_file

        if outputfile:
            args.append("--output=" + str(outputfile))

        if sandbox:
            if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
                logger.debug("Adding sandbox argument...")
                args.append("--sandbox")
            else:
                logger.warning("Sandbox argument was used, but pandoc version is too low. Ignoring argument.")

        args.extend(extra_args)

        # adds the proper filter syntax for each item in the filters list
        if filters is not None:
            if isinstance(filters, string_types):
                filters = filters.split()
            f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
            args.extend(f)

        # To get access to pandoc-citeproc when we use a included copy of pandoc,
        # we need to add the pypandoc/files dir to the PATH
        new_env = os.environ.copy()
        files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
        new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
        creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

        old_wd = os.getcwd()
        if cworkdir and old_wd != cworkdir:
            os.chdir(cworkdir)

        logger.debug("Running pandoc...")
        p = subprocess.Popen(
            args,
            stdin=subprocess.PIPE if string_input else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=new_env,
            creationflags=creation_flag)

        if cworkdir is not None:
            os.chdir(old_wd)

        # something else than 'None' indicates that the process already terminated
        if not (p.returncode is None):
            raise RuntimeError(
                'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                               p.stderr.read())
            )

        if string_input:
            try:
                source = cast_bytes(source, encoding='utf-8')
            except (UnicodeDecodeError, UnicodeEncodeError):
                # assume that it is already a utf-8 encoded string
                pass
        try:
            stdout, stderr = p.communicate(source if string_input else None)
        except OSError:
            # this is happening only on Py2.6 when pandoc dies before reading all
            # the input. We treat that the same as when we exit with an error...
            raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

        try:
            if not (to in ["odt", "docx", "epub", "epub3", "pdf"] and outputfile == "-"):
                stdout = stdout.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        try:
            stderr = stderr.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        # check that pandoc returned successfully
        if p.returncode != 0:
>           raise RuntimeError(
                'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
            )
E           RuntimeError: Pandoc died with exitcode "97" during conversion: Could not find data file data/data/docx/[Content_Types].xml

pypandoc/__init__.py:467: RuntimeError
----------------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------------
/home/tkloczko
______________________________________________________________________ TestPypandoc.test_conversion_with_mixed_filters ______________________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_mixed_filters>

    def test_conversion_with_mixed_filters(self):
        markdown_source = "-0-"

        lua = """\
        function Para(elem)
            return pandoc.Para(elem.content .. {{"{0}-"}})
        end
        """
        lua = textwrap.dedent(lua)

        python = """\
        #!{0}

        from pandocfilters import toJSONFilter, Para, Str

        def func(key, value, format, meta):
            if key == "Para":
                return Para(value + [Str("{{0}}-")])

        if __name__ == "__main__":
            toJSONFilter(func)

        """
        python = textwrap.dedent(python)
        python = python.format(sys.executable)

        with closed_tempfile(".lua", lua.format(1)) as temp1, closed_tempfile(".py", python.format(2)) as temp2:
            os.chmod(temp2, 0o755)

            with closed_tempfile(".lua", lua.format(3)) as temp3, closed_tempfile(".py", python.format(4)) as temp4:
                os.chmod(temp4, 0o755)

>               output = pypandoc.convert_text(
                    markdown_source, to="html", format="md", outputfile=None, filters=[temp1, temp2, temp3, temp4]
                ).strip()

tests.py:408:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypandoc/__init__.py:92: in convert_text
    return _convert_input(source, format, 'string', to, extra_args=extra_args,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = b'-0-', format = 'markdown', input_type = 'string', to = 'html', extra_args = (), outputfile = None
filters = ['/tmp/tmpjpqbkha4.lua', '/tmp/tmpslzo_5p7.py', '/tmp/tmpv3iwkln3.lua', '/tmp/tmppf4ooinv.py'], verify_format = True, sandbox = False, cworkdir = None

    def _convert_input(source, format, input_type, to, extra_args=(),
                       outputfile=None, filters=None, verify_format=True,
                       sandbox=False, cworkdir=None):

        _check_log_handler()

        logger.debug("Ensuring pandoc path...")
        _ensure_pandoc_path()

        if verify_format:
            logger.debug("Verifying format...")
            format, to = _validate_formats(format, to, outputfile)
        else:
            format = normalize_format(format)
            to = normalize_format(to)

        logger.debug("Identifying input type...")
        string_input = input_type == 'string'
        if not string_input:
            if isinstance(source, str):
                input_file = [source]
            else:
                input_file = source
        else:
            input_file = []

        input_file = sorted(input_file)
        args = [__pandoc_path, '--from=' + format]

        args.append('--to=' + to)

        args += input_file

        if outputfile:
            args.append("--output=" + str(outputfile))

        if sandbox:
            if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
                logger.debug("Adding sandbox argument...")
                args.append("--sandbox")
            else:
                logger.warning("Sandbox argument was used, but pandoc version is too low. Ignoring argument.")

        args.extend(extra_args)

        # adds the proper filter syntax for each item in the filters list
        if filters is not None:
            if isinstance(filters, string_types):
                filters = filters.split()
            f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
            args.extend(f)

        # To get access to pandoc-citeproc when we use a included copy of pandoc,
        # we need to add the pypandoc/files dir to the PATH
        new_env = os.environ.copy()
        files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
        new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
        creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

        old_wd = os.getcwd()
        if cworkdir and old_wd != cworkdir:
            os.chdir(cworkdir)

        logger.debug("Running pandoc...")
        p = subprocess.Popen(
            args,
            stdin=subprocess.PIPE if string_input else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=new_env,
            creationflags=creation_flag)

        if cworkdir is not None:
            os.chdir(old_wd)

        # something else than 'None' indicates that the process already terminated
        if not (p.returncode is None):
            raise RuntimeError(
                'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                               p.stderr.read())
            )

        if string_input:
            try:
                source = cast_bytes(source, encoding='utf-8')
            except (UnicodeDecodeError, UnicodeEncodeError):
                # assume that it is already a utf-8 encoded string
                pass
        try:
            stdout, stderr = p.communicate(source if string_input else None)
        except OSError:
            # this is happening only on Py2.6 when pandoc dies before reading all
            # the input. We treat that the same as when we exit with an error...
            raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

        try:
            if not (to in ["odt", "docx", "epub", "epub3", "pdf"] and outputfile == "-"):
                stdout = stdout.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        try:
            stderr = stderr.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        # check that pandoc returned successfully
        if p.returncode != 0:
>           raise RuntimeError(
                'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
            )
E           RuntimeError: Pandoc died with exitcode "83" during conversion: Traceback (most recent call last):
E             File "/tmp/tmpslzo_5p7.py", line 3, in <module>
E               from pandocfilters import toJSONFilter, Para, Str
E           ModuleNotFoundError: No module named 'pandocfilters'
E           Error running filter /tmp/tmpslzo_5p7.py:
E           Filter returned error status 1

pypandoc/__init__.py:467: RuntimeError
----------------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------------
/home/tkloczko
______________________________________________________________________ TestPypandoc.test_conversion_with_python_filter ______________________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_python_filter>

    def test_conversion_with_python_filter(self):
        markdown_source = "**Here comes the content.**"
        python_source = '''\
        #!{0}

        """
        Pandoc filter to convert all regular text to uppercase.
        Code, link URLs, etc. are not affected.
        """

        from pandocfilters import toJSONFilter, Str

        def caps(key, value, format, meta):
            if key == 'Str':
                return Str(value.upper())

        if __name__ == "__main__":
            toJSONFilter(caps)
        '''
        python_source = textwrap.dedent(python_source)
        python_source = python_source.format(sys.executable)

        with closed_tempfile(".py", python_source) as tempfile:
            os.chmod(tempfile, 0o755)
>           output = pypandoc.convert_text(
                markdown_source, to='html', format='md', outputfile=None, filters=tempfile
            ).strip()

tests.py:354:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypandoc/__init__.py:92: in convert_text
    return _convert_input(source, format, 'string', to, extra_args=extra_args,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = b'**Here comes the content.**', format = 'markdown', input_type = 'string', to = 'html', extra_args = (), outputfile = None, filters = ['/tmp/tmp3f9k2vwi.py'], verify_format = True
sandbox = False, cworkdir = None

    def _convert_input(source, format, input_type, to, extra_args=(),
                       outputfile=None, filters=None, verify_format=True,
                       sandbox=False, cworkdir=None):

        _check_log_handler()

        logger.debug("Ensuring pandoc path...")
        _ensure_pandoc_path()

        if verify_format:
            logger.debug("Verifying format...")
            format, to = _validate_formats(format, to, outputfile)
        else:
            format = normalize_format(format)
            to = normalize_format(to)

        logger.debug("Identifying input type...")
        string_input = input_type == 'string'
        if not string_input:
            if isinstance(source, str):
                input_file = [source]
            else:
                input_file = source
        else:
            input_file = []

        input_file = sorted(input_file)
        args = [__pandoc_path, '--from=' + format]

        args.append('--to=' + to)

        args += input_file

        if outputfile:
            args.append("--output=" + str(outputfile))

        if sandbox:
            if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
                logger.debug("Adding sandbox argument...")
                args.append("--sandbox")
            else:
                logger.warning("Sandbox argument was used, but pandoc version is too low. Ignoring argument.")

        args.extend(extra_args)

        # adds the proper filter syntax for each item in the filters list
        if filters is not None:
            if isinstance(filters, string_types):
                filters = filters.split()
            f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
            args.extend(f)

        # To get access to pandoc-citeproc when we use a included copy of pandoc,
        # we need to add the pypandoc/files dir to the PATH
        new_env = os.environ.copy()
        files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
        new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
        creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

        old_wd = os.getcwd()
        if cworkdir and old_wd != cworkdir:
            os.chdir(cworkdir)

        logger.debug("Running pandoc...")
        p = subprocess.Popen(
            args,
            stdin=subprocess.PIPE if string_input else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=new_env,
            creationflags=creation_flag)

        if cworkdir is not None:
            os.chdir(old_wd)

        # something else than 'None' indicates that the process already terminated
        if not (p.returncode is None):
            raise RuntimeError(
                'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                               p.stderr.read())
            )

        if string_input:
            try:
                source = cast_bytes(source, encoding='utf-8')
            except (UnicodeDecodeError, UnicodeEncodeError):
                # assume that it is already a utf-8 encoded string
                pass
        try:
            stdout, stderr = p.communicate(source if string_input else None)
        except OSError:
            # this is happening only on Py2.6 when pandoc dies before reading all
            # the input. We treat that the same as when we exit with an error...
            raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

        try:
            if not (to in ["odt", "docx", "epub", "epub3", "pdf"] and outputfile == "-"):
                stdout = stdout.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        try:
            stderr = stderr.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        # check that pandoc returned successfully
        if p.returncode != 0:
>           raise RuntimeError(
                'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
            )
E           RuntimeError: Pandoc died with exitcode "83" during conversion: Traceback (most recent call last):
E             File "/tmp/tmp3f9k2vwi.py", line 8, in <module>
E               from pandocfilters import toJSONFilter, Str
E           ModuleNotFoundError: No module named 'pandocfilters'
E           Error running filter /tmp/tmp3f9k2vwi.py:
E           Filter returned error status 1

pypandoc/__init__.py:467: RuntimeError
----------------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------------
/home/tkloczko
===================================================================================== warnings summary ======================================================================================
pypandoc/pandoc_download.py:61
  /home/tkloczko/rpmbuild/BUILD/pypandoc-1.13/pypandoc/pandoc_download.py:61: DeprecationWarning: invalid escape sequence '\.'
    regex = re.compile(r"/jgm/pandoc/releases/download/.*(?:"+processor_architecture+"|x86|mac).*\.(?:msi|deb|pkg)")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================== short test summary info ==================================================================================
FAILED tests.py::TestPypandoc::test_conversion_with_data_files - RuntimeError: Pandoc died with exitcode "97" during conversion: Could not find data file data/data/docx/[Content_Types].xml
FAILED tests.py::TestPypandoc::test_conversion_with_mixed_filters - RuntimeError: Pandoc died with exitcode "83" during conversion: Traceback (most recent call last):
FAILED tests.py::TestPypandoc::test_conversion_with_python_filter - RuntimeError: Pandoc died with exitcode "83" during conversion: Traceback (most recent call last):
=================================================================== 3 failed, 37 passed, 1 deselected, 1 warning in 4.19s ===================================================================
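
The DeprecationWarning in the summary above is separate from the three failures. In the quoted line, the first string literal is raw but the one containing `\.` is not. Below is a minimal sketch of the pattern with both literals marked raw. The function name `build_release_regex`, the `"amd64"` value and the sample URL are assumptions for illustration only, not taken from pypandoc.

```python
import re


def build_release_regex(processor_architecture):
    # Both literals are raw strings, so "\." reaches the regex engine
    # unchanged and Python no longer warns about an invalid escape.
    return re.compile(
        r"/jgm/pandoc/releases/download/.*(?:" + processor_architecture
        + r"|x86|mac).*\.(?:msi|deb|pkg)"
    )


# Hypothetical URL, used only to exercise the pattern.
sample = "/jgm/pandoc/releases/download/3.1/pandoc-3.1-1-amd64.deb"
print(build_release_regex("amd64").search(sample) is not None)
```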

Answer 27 · 2024-04-12T00:07:02.000Z

@kloczek Can you try the following:

Run from source, from the git repo.
Run poetry install and make sure it installs pandocfilters.
Then run the tests with poetry run python tests.py.
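
To confirm that pandocfilters is visible before running the tests, something like the following sketch may help. The helper name `missing_modules` is hypothetical. It only checks the interpreter that runs the snippet, and pandoc may launch the filter script with a different interpreter, so treat the result as a hint rather than proof.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Show which interpreter is being checked, then list what is missing.
    print("interpreter:", sys.executable)
    print("missing:", missing_modules(["pandocfilters"]) or "none")
```

Saved under a name of your choice (for example a hypothetical `check_deps.py`), it can be run as `poetry run python check_deps.py` to check the poetry environment, or with plain `python3` to check the system interpreter used in the rpm build.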