JessicaTegner/pypandoc

Arabic encoding support and image in header

i-salameh95 opened this issue · 5 comments

Hello,
I'm using this package to convert a .docx file to a .pdf file.
My file contains variables that are filled in by a Django view.
All of this works correctly,
but the .docx file contains an Arabic sentence, and the program raises an error during conversion. When I remove that sentence, the code works like a charm.
So how can I convert a document that contains Arabic words?
Also, the .docx file has a header (with a logo), but the output PDF has no header at all!

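import pypandoc
from django.http import FileResponse
from django.shortcuts import get_object_or_404
from docxtpl import DocxTemplate

# Project and author_required come from the poster's own app (imports not shown).


@author_required
def download_approval(request, project_id):
    project = get_object_or_404(Project, pk=project_id)
    # Fill the template variables in letter.docx.
    doc = DocxTemplate('letter.docx')
    context = {
        'ref_num': project.ref_num,
        'author_name': project.author.get_full_name,
        'approval_date': project.approved_date.date(),
        'project_title': project.title_en
    }
    doc.render(context)
    doc.save('approval_letter.docx')
    # Convert the rendered .docx to PDF through LaTeX.
    pypandoc.convert_file('approval_letter.docx', 'latex', outputfile="research_permission_request.pdf",
                          extra_args=['--pdf-engine=C:\\Users\\HP\\Desktop\\mktex\\miktex\\bin\\x64\\pdflatex.exe'])
    # FileResponse closes the file once the response has been sent.
    pdf = open('research_permission_request.pdf', 'rb')
    response = FileResponse(pdf)
    return response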

Hi @i-salameh95,

Thanks for the issue, and sorry it's taken a while to get back to you.

First of all, can you tell me which Python, pypandoc and pandoc versions you are using, as that can affect whether the image/logo shows up?
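
As a rough sketch, the three versions could be gathered from inside the same virtual environment with a small helper. `collect_versions` is a hypothetical name used only for illustration; `pypandoc.get_pandoc_version()` reports the pandoc binary that pypandoc actually picks up, which may differ from the one on the shell's PATH.

```python
import platform


def collect_versions():
    """Return the Python, pypandoc and pandoc versions as a dict of strings."""
    versions = {"python": platform.python_version()}
    try:
        import pypandoc  # imported lazily so the helper still runs without it
    except ImportError:
        versions["pypandoc"] = "not installed"
        versions["pandoc"] = "unknown"
        return versions

    versions["pypandoc"] = getattr(pypandoc, "__version__", "unknown")
    try:
        # Asks the pandoc binary that pypandoc resolves for its version.
        versions["pandoc"] = pypandoc.get_pandoc_version()
    except (OSError, RuntimeError):
        # Raised when no pandoc binary is found or it cannot be called.
        versions["pandoc"] = "not found"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```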

For Arabic support, you might want to look at jgm/pandoc#5643
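
One commonly suggested direction for Arabic text, sketched here under assumptions rather than taken from that pandoc issue: pdflatex does not handle Unicode Arabic well, so a Unicode-aware engine such as xelatex with a font that has Arabic glyphs may be needed. The helper names and the `Amiri` font below are hypothetical choices; `--pdf-engine`, `-V mainfont=...` and `-V dir=rtl` are standard pandoc options, but the right combination depends on the document and the installed fonts.

```python
def arabic_pdf_args(engine="xelatex", main_font="Amiri", rtl=False):
    """Build pandoc extra_args for a PDF that contains Arabic text.

    engine and main_font are assumptions: any Unicode-aware engine and
    any installed font with Arabic glyphs could be substituted.
    """
    args = [f"--pdf-engine={engine}", "-V", f"mainfont={main_font}"]
    if rtl:
        # Sets the base text direction of the whole document to right-to-left.
        args += ["-V", "dir=rtl"]
    return args


def convert_docx_to_pdf(source, target, **options):
    """Convert a .docx to PDF, mirroring the convert_file call in the post."""
    import pypandoc  # imported lazily; requires pypandoc and a LaTeX install

    return pypandoc.convert_file(
        source, "latex", outputfile=target, extra_args=arabic_pdf_args(**options)
    )
```

For a mostly English letter with a single Arabic sentence, `rtl` would probably stay `False` so that only the font and engine change.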

Hi @JessicaTegner,
These are the versions from the requirements.txt file (Django packages):

pypandoc          v1.11
pypandoc-binary   v1.11
python            v3.9.2
django            v4.2

(venv) > pandoc -v
pandoc 3.1.2
Features: +server +lua
Scripting engine: Lua 5.4
User data directory: C:....\AppData\Roaming\pandoc
Copyright (C) 2006-2023 John MacFarlane. Web:  https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.

I have also attached the .docx that I want to convert to PDF:
letter.docx
And this is the PDF that resulted from the conversion:
research_permission_request.pdf

Note 1: I no longer need Arabic support; I have translated the text to English.
The only thing I want to solve is the image.
Note 2: I'm running this code on a Windows machine.

Also, will the code above work on a Linux (Ubuntu) server behind Nginx?
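
On the portability question, one thing that would not carry over is the hard-coded Windows path to pdflatex.exe. A minimal sketch, assuming pandoc and a LaTeX distribution are installed on the server and on PATH, is to look the engine up at runtime. `find_pdf_engine` and `pdf_engine_args` are hypothetical helper names; the candidate list is only an example.

```python
import shutil


def find_pdf_engine(candidates=("xelatex", "lualatex", "pdflatex")):
    """Return the full path of the first engine found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def pdf_engine_args(candidates=("xelatex", "lualatex", "pdflatex")):
    """Build the --pdf-engine argument, or an empty list if none is found."""
    path = find_pdf_engine(candidates)
    # With an empty list, pandoc falls back to its own default engine lookup.
    return [f"--pdf-engine={path}"] if path else []
```

The result could then be passed as `extra_args=pdf_engine_args()` to `pypandoc.convert_file`, so the same view runs on Windows and Ubuntu without editing the path.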

So correct me if I'm wrong, but it seems that the images in the Word document are in the page header, which, from my understanding of pandoc, does not get converted.
I did some testing with plain pandoc and the files you provided, and the header images indeed do not get converted, which is really strange.
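
To check whether a given document keeps its images in the header, the .docx can be inspected directly, since it is a zip archive. The sketch below uses only the standard library and relies on the usual Office Open XML layout (header relationships under `word/_rels/header*.xml.rels`); `header_images` is a hypothetical helper, and it only reports the images, it does not make pandoc convert them.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the <Relationship> elements inside .rels parts.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def header_images(docx_path):
    """Map each header part in a .docx to the media files it references."""
    found = {}
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Header relationship parts look like word/_rels/header1.xml.rels
            if not (name.startswith("word/_rels/header") and name.endswith(".rels")):
                continue
            root = ET.fromstring(archive.read(name))
            targets = [
                posixpath.normpath(posixpath.join("word", rel.get("Target")))
                for rel in root.iter(f"{REL_NS}Relationship")
                if rel.get("Type", "").endswith("/image")
            ]
            if targets:
                header = "word/" + posixpath.basename(name)[: -len(".rels")]
                found[header] = targets
    return found
```

Calling `header_images("letter.docx")` on the attached file would list any logo that lives in the header rather than in the document body.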

So how can I convert the .docx together with its images? Is there any workaround?