ZUGFeRD/ZUV

unusable release 0.8.2?

mgoppold opened this issue · 4 comments

Release 0.8.2 fails with the error 'Invalid byte 1 of 1-byte UTF-8 sequence.'

java -jar /usr/share/java/ZUV-0.8.2.jar --action validate -f example.pdf 
Oct 16, 2019 8:15:25 PM org.mustangproject.ZUGFeRD.ZUGFeRDImporter <init>
SEVERE: null
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
	at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
	at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
	at org.mustangproject.ZUGFeRD.ZUGFeRDImporter.setDocument(ZUGFeRDImporter.java:170)
	at org.mustangproject.ZUGFeRD.ZUGFeRDImporter.setRawXML(ZUGFeRDImporter.java:176)
	at org.mustangproject.ZUGFeRD.ZUGFeRDImporter.extractFiles(ZUGFeRDImporter.java:150)
	at org.mustangproject.ZUGFeRD.ZUGFeRDImporter.extractLowLevel(ZUGFeRDImporter.java:124)
	at org.mustangproject.ZUGFeRD.ZUGFeRDImporter.<init>(ZUGFeRDImporter.java:77)
	at ZUV.PDFValidator.validate(PDFValidator.java:143)
	at ZUV.ZUGFeRDValidator.validate(ZUGFeRDValidator.java:84)
	at ZUV.Main.run(Main.java:81)
	at ZUV.Main.main(Main.java:99)

Exception in thread "main" org.mustangproject.ZUGFeRD.ZUGFeRDExportException: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
	at org.mustangproject.ZUGFeRD.ZUGFeRDImporter.<init>(ZUGFeRDImporter.java:80)
	at ZUV.PDFValidator.validate(PDFValidator.java:143)
	at ZUV.ZUGFeRDValidator.validate(ZUGFeRDValidator.java:84)
	at ZUV.Main.run(Main.java:81)
	at ZUV.Main.main(Main.java:99)
Caused by: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
	at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
	at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
	at org.mustangproject.ZUGFeRD.ZUGFeRDImporter.setDocument(ZUGFeRDImporter.java:170)
	at org.mustangproject.ZUGFeRD.ZUGFeRDImporter.setRawXML(ZUGFeRDImporter.java:176)
	at org.mustangproject.ZUGFeRD.ZUGFeRDImporter.extractFiles(ZUGFeRDImporter.java:150)
	at org.mustangproject.ZUGFeRD.ZUGFeRDImporter.extractLowLevel(ZUGFeRDImporter.java:124)
	at org.mustangproject.ZUGFeRD.ZUGFeRDImporter.<init>(ZUGFeRDImporter.java:77)
	... 4 more

This happens with the downloaded JAR as well as with self-compiled JARs (OpenJDK and Oracle JDK).

java -version
java version "1.8.0_231"
Java(TM) SE Runtime Environment (build 1.8.0_231-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.231-b11, mixed mode)

No change with 0.8.3.

swsch commented

What's the origin of your example.pdf? Can you attach it to this issue?

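I tracked down the error. It is caused by a missing encoding declaration in an XML file that is not UTF-8 encoded and contains German umlauts. Without a declaration, the XML parser assumes UTF-8, so the Latin-1 umlaut bytes are rejected as malformed.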
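The cause can be reproduced outside ZUV with the JDK's default DOM parser. The following is a minimal sketch, not code from ZUV or Mustang: the class name `MissingEncodingDemo`, the helper `parses`, and the one-element document are made up for illustration. It feeds the same text to the parser once as UTF-8 bytes and once as Latin-1 bytes, both without an encoding declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of ZUV or Mustang.
class MissingEncodingDemo {

    /** Returns true if the default JAXP DOM parser accepts the bytes. */
    public static boolean parses(byte[] xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return true;
        } catch (IOException | SAXException e) {
            // A malformed byte sequence surfaces here as an I/O or parse error.
            System.err.println(e.getClass().getName() + ": " + e.getMessage());
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable XML parser", e);
        }
    }

    public static void main(String[] args) {
        // The escape below is U+00FC (u with diaeresis); there is no XML declaration.
        String xml = "<Name>M\u00fcller</Name>";
        System.out.println("UTF-8 bytes parse:   " + parses(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println("Latin-1 bytes parse: " + parses(xml.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

The Latin-1 variant is rejected because the single byte that encodes the umlaut in Latin-1 is not a valid UTF-8 sequence on its own.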

ZUV-0.8.0.jar does not mind this:

java -jar ~/Downloads/ZUV-0.8.0.jar --action validate -x ZUGFeRD-invoice_lat1_without_encoding_declaration.xml
<validation>
<xml>
<info><version>1</version><profile>urn:ferd:CrossIndustryDocument:invoice:1p0:basic</profile><validator version="0.8.0"></validator><validation datetime="2019-10-20 21:03:24"><rules><fired>29</fired><failed>0</failed></rules><duration unit='ms'>5560</duration></validation></info><summary status='valid'/>
</xml>
</validation>
java -jar ~/Downloads/ZUV-0.8.2.jar --action validate -f ZUGFeRD-invoice_lat1_without_encoding_declaration.xml
<validation><messages><exception type="8">File does not look like PDF nor XML</exception>
</messages><summary status='invalid'/></validation>

When ZUGFeRD-invoice_lat1_without_encoding_declaration.xml is embedded in a PDF, the original error message occurs.

With latin1 encoding declaration:

<?xml version="1.0" encoding="latin1"?>
...

the error no longer occurs.
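If the file itself cannot be edited but its encoding is known, the caller can decode the bytes before they reach the parser. This is a sketch under that assumption and not something ZUV or Mustang does: `KnownCharsetParse` and its `parse` method are hypothetical names. It wraps the bytes in a `Reader` with an explicit charset, so the parser receives characters and no longer has to guess the byte encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper; not part of ZUV or Mustang.
class KnownCharsetParse {

    /** Decodes the bytes with the given charset, then parses the characters as XML. */
    public static Document parse(byte[] xml, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(xml), charset)) {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(reader));
        } catch (IOException | SAXException | ParserConfigurationException e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // Latin-1 bytes without an encoding declaration, as in the failing file.
        byte[] latin1 = "<Name>M\u00fcller</Name>".getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

This only helps when the encoding really is known in advance; adding the declaration to the file remains the cleaner fix.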

Technically, the original error message is correct, but it would be nice if it were a little more detailed, for example: Invalid byte 1 of 1-byte UTF-8 sequence <n1:Name> in ZUGFeRD-invoice.xml.

ZUGFeRD-invoice_lat1_without_encoding_declaration.xml.txt
ZUGFeRD-invoice_lat1.xml.txt

Feel free to fix this; there are currently no plans to integrate a UTF-8 validator :-/
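For anyone picking this up, a UTF-8 pre-check that also yields a more detailed message can be sketched with the JDK's `CharsetDecoder` alone. This is an illustration, not a patch for ZUV: the class `Utf8Check`, its two methods, and the message wording are assumptions. It reports the byte offset of the first malformed sequence rather than the element name, since the raw bytes are inspected before any XML parsing happens.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of ZUV or Mustang.
class Utf8Check {

    /** Returns the offset of the first byte that is not valid UTF-8, or -1 if all bytes are valid. */
    public static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // On error the input position marks the start of the bad sequence.
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without errors
            }
            out.clear(); // overflow: discard decoded chars and keep going
        }
    }

    /** Builds a human-readable message, or returns null if the data is valid UTF-8. */
    public static String describe(String fileName, byte[] data) {
        int offset = firstInvalidOffset(data);
        if (offset < 0) {
            return null;
        }
        return String.format("Invalid UTF-8 byte 0x%02X at offset %d in %s",
                data[offset] & 0xFF, offset, fileName);
    }

    public static void main(String[] args) {
        byte[] latin1 = "<Name>M\u00fcller</Name>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(describe("ZUGFeRD-invoice.xml", latin1));
    }
}
```

Note that such a check is only meaningful for files that declare UTF-8 or declare nothing; a file with a correct Latin-1 declaration would fail it even though the parser accepts the file.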