JessicaTegner/pypandoc

Sorting the order of multiple globbed files

PR #248 added support for passing multiple files, but I notice that if I pass a glob pattern (e.g. *.html) to combine multiple files into a single PDF:

  • in the command-line version of pandoc, the files matched by the glob are processed in alphabetical order;
  • in pypandoc, the files matched by the glob are processed in an arbitrary order.

If a glob pattern is passed, e.g. to capture files 01.md, 02.md etc., it would be really useful if the sort order were respected.
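
A minimal sketch of the requested behaviour, using only the standard library: expand the pattern with glob.glob and sort the matches before handing the list on. The helper name expand_sorted is hypothetical and is not part of pypandoc.

```python
import glob


def expand_sorted(pattern):
    """Expand a glob pattern and return the matches in alphabetical order.

    glob.glob() makes no promise about ordering, so the result is
    sorted explicitly before it is used.
    """
    return sorted(glob.glob(pattern))
```

Note that sorted() compares strings character by character, so 10.md would sort before 2.md; zero-padded names such as 01.md, 02.md avoid that surprise.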

@JessicaTegner I have added a fix for this in my PR. Can I be assigned to this? Let me know if I have to edit my code, thanks.

Hello @JessicaTegner ,

I saw that this request was merged here: #292

Upgrading from 1.9 to 1.10 caused a regression in my application.
As we can pass a list of files, with this change, it's now impossible to pass a list of files with my preferred order.
If I have files named book.md, addenda01.md, addenda02.md and I want to pass them keeping the order, it will not work.

My suggestion is that the sorting should be done on the user's side, so the change should be reverted.
If it's not possible to revert the change, I suggest adding another argument, sort_input_files=False (I don't think it's a good idea to add an argument just for that, but I see no other possibility).
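
One way to reconcile both requests, sketched under assumptions: keep the caller's order for an explicit list and sort only within the expansion of each glob pattern. The helper resolve_input_files below is hypothetical; it is not pypandoc's actual code or API.

```python
import glob


def resolve_input_files(sources):
    """Hypothetical helper: preserve the caller's order between entries,
    and sort only the matches produced by each individual pattern."""
    if isinstance(sources, str):
        sources = [sources]
    resolved = []
    for item in sources:
        # A plain existing path expands to itself; a pattern expands to
        # its matches, which are sorted because glob gives no ordering.
        matches = sorted(glob.glob(item))
        # Entries that match nothing are passed through unchanged.
        resolved.extend(matches if matches else [item])
    return resolved
```

With this approach a list such as book.md, addenda01.md, addenda02.md keeps its given order, while a pattern such as addenda*.md still expands alphabetically.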

cc @psychemedia

IIRC, my original issue was the inconsistency between the default pandoc and pypandoc behaviours.

I think the multiple

It must be left to pandoc instead of being done in this library.

Pandoc accepts this command: pandoc _posts/*.md --from markdown -o books/cookbook.epub

When we use pypandoc like this: pypandoc.convert_file('_posts/*.md', 'epub', outputfile="books/test.epub")

the glob (_posts/*.md) should be passed to pandoc as-is, instead of the files being listed individually before they are passed to pandoc, as is done here: https://github.com/JessicaTegner/pypandoc/blob/master/pypandoc/__init__.py#L159-L168

This would avoid inconsistency between pypandoc and pandoc and reduce the processing needed in this package. WDYT @JessicaTegner?

@fsoedjede that seems okay on the surface, but I'm not sure whether it would mean we run into errors with the automatic detection of the file type (since I'm pretty sure we do that at some point).

Also @fsoedjede, I just tried your command and I get the following output:

pandoc.exe: tests/*.md: withBinaryFile: invalid argument (Invalid argument)
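
A possible explanation for this output, offered as an assumption rather than a confirmed diagnosis: on Unix-like systems a pattern such as _posts/*.md is normally expanded by the shell before pandoc starts, so a program launched without a globbing shell receives the literal pattern and tries to open it as a file name. The sketch below illustrates only that general point with a Python child process standing in for pandoc; the helper name argv_seen_by_child is hypothetical.

```python
import json
import subprocess
import sys


def argv_seen_by_child(*args):
    """Start a child process without a shell and return the arguments it received."""
    # The child simply echoes its own argument list back as JSON.
    child_code = "import sys, json; print(json.dumps(sys.argv[1:]))"
    completed = subprocess.run(
        [sys.executable, "-c", child_code, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # No shell is involved, so the pattern arrives unexpanded.
    print(argv_seen_by_child("tests/*.md"))
```

If that is what happens here, passing the glob through unchanged would only work where something else expands it first, which may be why expanding the pattern inside the library behaves more consistently across platforms.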

Hi @JessicaTegner! I'm trying to track down the same withBinaryFile: invalid argument (Invalid argument) error I'm seeing with pandoc on Windows... any idea what could cause it? (There aren't many hits after searching for a while; in fact I think this error is the first I've seen.) Appreciate any insight you might have. Thanks!

Admittedly, I'm kinda doing something weird, but maybe these details help (?)... I'm running something like pandoc file.md -o file.docx on Windows with the latest pandoc version 3.1.9 and it works flawlessly in either PowerShell or Git Bash... but if I run from within a Rust utility in Git Bash via std::process::Command::new("sh").args(["-c", "pandoc file.md -o file.docx"]).spawn().unwrap().wait().unwrap() suddenly it doesn't work... but other commands via this method work fine. I have also tried several other Pandoc versions and so far they're all consistently working / not working in the same way.

It'd be nice if searching pandoc source code repo and/or issues for this error produced anything... ;-)