Truelite/python-a38

manage bytes string in pdf conversion

Closed this issue · 8 comments

Affecting a38 version 0.1.2: when a fattura contains a base64-encoded attachment under the tag

<ns3:FatturaElettronica xmlns:ns3="http://ivaservizi.agenziaentrate.gov.it/docs/xsd/fatture/v1.2" xmlns:ns2="http://www.w3.org/2000/09/xmldsig#" versione="FPR12">
<FatturaElettronicaBody><Allegati><Attachment>

and we try to convert it to PDF with wkhtmltopdf, the following error is raised:

`ERROR uncaught exception
Traceback (most recent call last):
  File "/usr/local/bin/a38tool", line 350, in <module>
    main()
  File "/usr/local/bin/a38tool", line 343, in main
    res = app.run()
  File "/usr/local/bin/a38tool", line 247, in run
    self.render(f, output)
  File "/usr/local/bin/a38tool", line 288, in render
    self.transform.to_pdf(self.wkhtmltopdf, f, output)
  File "/usr/local/lib/python3.8/dist-packages/a38/render.py", line 32, in to_pdf
    html = self(f)
  File "/usr/local/lib/python3.8/dist-packages/a38/render.py", line 21, in __call__
    tree = f.build_etree(lxml=True)
  File "/usr/local/lib/python3.8/dist-packages/a38/fattura.py", line 685, in build_etree
    self.to_xml(builder)
  File "/usr/local/lib/python3.8/dist-packages/a38/fattura.py", line 671, in to_xml
    field.to_xml(b1, getattr(self, name))
  File "/usr/local/lib/python3.8/dist-packages/a38/fields.py", line 669, in to_xml
    val.to_xml(builder)
  File "/usr/local/lib/python3.8/dist-packages/a38/models.py", line 154, in to_xml
    field.to_xml(b, getattr(self, name))
  File "/usr/local/lib/python3.8/dist-packages/a38/fields.py", line 669, in to_xml
    val.to_xml(builder)
  File "/usr/local/lib/python3.8/dist-packages/a38/models.py", line 154, in to_xml
    field.to_xml(b, getattr(self, name))
  File "/usr/local/lib/python3.8/dist-packages/a38/fields.py", line 115, in to_xml
    builder.add(self.get_xmltag(), self.to_str(value))
  File "/usr/local/lib/python3.8/dist-packages/a38/builder.py", line 27, in add
    self.etreebuilder.end(tag)
  File "src/lxml/saxparser.pxi", line 836, in lxml.etree.TreeBuilder.end
  File "src/lxml/saxparser.pxi", line 771, in lxml.etree.TreeBuilder._handleSaxEnd
  File "src/lxml/saxparser.pxi", line 740, in lxml.etree.TreeBuilder._flush
TypeError: sequence item 0: expected str instance, bytes found`

After some investigation, I found that the value holding the attachment content is of type `bytes` rather than `str`.
I am using the Python library wkhtmltopdf 0.2.
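The last line of the traceback is the message Python produces when a `bytes` object is joined into a `str`. Below is a minimal sketch of that failure, assuming the attachment was base64-decoded into `bytes` before serialization and that the tree builder joins its buffered text chunks when flushing. The payload and the `join_text` helper are made up for illustration and are not a38's code.

```python
import base64

# Hypothetical attachment payload, base64-encoded as it would appear in the XML.
encoded = "SGVsbG8sIGZhdHR1cmEh"

# Decoding base64 yields bytes, not str.
decoded = base64.b64decode(encoded)


def join_text(chunks):
    """Join buffered text chunks into one string (illustrative helper)."""
    return "".join(chunks)


try:
    join_text([decoded])
    error_message = None
except TypeError as exc:
    # Same wording as the last line of the traceback above.
    error_message = str(exc)

# Re-encoding to base64 and decoding as ASCII gives a str that joins cleanly.
as_text = base64.b64encode(decoded).decode("ascii")
joined = join_text([as_text])
```

If this is what happens inside the conversion, turning the value back into base64 text before it reaches the builder would avoid the error without dropping the attachment.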

I think it makes no sense to try to handle attachments or binary content during the conversion.
What do you think about replacing the conditional here:

if value is not None:

with:

if value is not None and not isinstance(value, (bytes, bytearray)):
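To illustrate the effect of this guard, here is a minimal sketch. `RecordingBuilder` and `Field` are hypothetical stand-ins loosely modelled on the `to_xml` / `builder.add` calls visible in the traceback; they are not a38's actual classes.

```python
class RecordingBuilder:
    """Hypothetical stand-in for an XML builder: records (tag, text) pairs."""

    def __init__(self):
        self.added = []

    def add(self, tag, text):
        # Mimic a builder that only accepts text content.
        if not isinstance(text, str):
            raise TypeError(
                "expected str instance, %s found" % type(text).__name__
            )
        self.added.append((tag, text))


class Field:
    """Hypothetical field with the proposed guard in to_xml."""

    def __init__(self, xmltag):
        self.xmltag = xmltag

    def to_xml(self, builder, value):
        # Proposed guard: skip missing values and raw binary payloads.
        if value is not None and not isinstance(value, (bytes, bytearray)):
            builder.add(self.xmltag, str(value))


builder = RecordingBuilder()
Field("NomeAttachment").to_xml(builder, "doc.pdf")
Field("Attachment").to_xml(builder, b"%PDF-1.4")       # skipped by the guard
Field("DescrizioneAttachment").to_xml(builder, None)   # skipped: no value
```

With this guard the binary value is silently left out of the generated tree. That is fine for rendering, but it would also drop the attachment if the same code path were used to write the fattura back to XML.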

I'm a bit afraid to introduce a workaround so deep in the code, and would like to try to address the situation more cleanly.

I tried with some fatture which have attached PDF files, and managed to convert them to HTML using the usual fatturaordinaria stylesheet. Could you help me reproduce the issue, like sending me a copy of the fattura and the stylesheet?

If you don't feel like sharing the fattura and the stylesheet publicly here, but can share them privately, you can send them to me at enrico@enricozini.org, and I promise to delete them after investigating the issue.

It's an example from the Agenzia delle Entrate that includes the attachment tag:
IT01234567890_FPR01.zip
Foglio_di_stile_fatturaordinaria_v1.2.1.xsl.zip

a38tool pdf -f FoglioStileAssoSoftware.xsl -o fattura.pdf IT01234567890_FPR01.xml
ERROR uncaught exception
Traceback (most recent call last):
File "/usr/local/bin/a38tool", line 350, in <module>
main()
File "/usr/local/bin/a38tool", line 343, in main
res = app.run()
File "/usr/local/bin/a38tool", line 247, in run
self.render(f, output)
File "/usr/local/bin/a38tool", line 288, in render
self.transform.to_pdf(self.wkhtmltopdf, f, output)
File "/usr/local/lib/python3.8/dist-packages/a38/render.py", line 32, in to_pdf
html = self(f)
File "/usr/local/lib/python3.8/dist-packages/a38/render.py", line 21, in __call__
tree = f.build_etree(lxml=True)
File "/usr/local/lib/python3.8/dist-packages/a38/fattura.py", line 685, in build_etree
self.to_xml(builder)
File "/usr/local/lib/python3.8/dist-packages/a38/fattura.py", line 671, in to_xml
field.to_xml(b1, getattr(self, name))
File "/usr/local/lib/python3.8/dist-packages/a38/fields.py", line 669, in to_xml
val.to_xml(builder)
File "/usr/local/lib/python3.8/dist-packages/a38/models.py", line 154, in to_xml
field.to_xml(b, getattr(self, name))
File "/usr/local/lib/python3.8/dist-packages/a38/fields.py", line 669, in to_xml
val.to_xml(builder)
File "/usr/local/lib/python3.8/dist-packages/a38/models.py", line 154, in to_xml
field.to_xml(b, getattr(self, name))
File "/usr/local/lib/python3.8/dist-packages/a38/fields.py", line 115, in to_xml
builder.add(self.get_xmltag(), self.to_str(value))
File "/usr/local/lib/python3.8/dist-packages/a38/builder.py", line 27, in add
self.etreebuilder.end(tag)
File "src/lxml/saxparser.pxi", line 836, in lxml.etree.TreeBuilder.end
File "src/lxml/saxparser.pxi", line 771, in lxml.etree.TreeBuilder._handleSaxEnd
File "src/lxml/saxparser.pxi", line 740, in lxml.etree.TreeBuilder._flush
TypeError: sequence item 0: expected str instance, bytes found

Thanks! I tried the same with the code currently in master, and I didn't get the crash. Are you running the code from commit 904faf4?

If that is the case, let's compare versions of the other things in the stack. My Python is 3.7.3, and my wkhtmltopdf is 0.12.5. Was it a typo when you mentioned yours is 0.2? Note that a38 doesn't need a Python library for wkhtmltopdf: it just calls the tool's command line.
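For reference, calling the command-line tool from Python needs nothing beyond the standard library. This is only a sketch of the general pattern, assuming the basic `wkhtmltopdf <input> <output>` invocation; the function names are made up here and this is not the code a38 uses.

```python
import shutil
import subprocess


def build_wkhtmltopdf_command(html_path, pdf_path, executable="wkhtmltopdf"):
    """Return the argument list for a plain HTML-to-PDF conversion."""
    return [executable, html_path, pdf_path]


def html_to_pdf(html_path, pdf_path, executable="wkhtmltopdf"):
    """Run wkhtmltopdf on an HTML file, raising if the tool is missing or fails."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} not found on PATH")
    subprocess.run(
        build_wkhtmltopdf_command(html_path, pdf_path, executable),
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
```

Because only the executable is involved, the version that matters is the one reported by `wkhtmltopdf --version`, not that of any Python package with a similar name.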

I'm using v0.1.2 at commit 0c80c4e

Python 3.8.5
ii wkhtmltopdf 0.12.5-1build1 amd64 Command line utilities to convert html to pdf or image using WebKit

Ok, I still can't reproduce it (slightly different environment: Debian Sid with Python 3.9.1, wkhtmltopdf 0.12.6-1).

I could, however, reproduce it with a38 at commit 0c80c4e. Master also has the more recent commit 904faf4, which indeed fixes this issue.

Could you try with the version in master?

Hi @spanezz, this is fixed for me with release 0.1.3.
Thank you!