opensagres/xdocreport

PDF converter from docx: words are overlapping (examples attached)

Opened this issue · 2 comments

When converting a docx file (testDocument.docx) to PDF, some words in the output file (testDocument-new.pdf) overlap each other.

To reproduce the issue, here is the code:

@Test
void simpletestconversion() {
	try(InputStream in = new FileInputStream(docPath);
		OutputStream out = new FileOutputStream(pdfPath)) {

		XWPFDocument document = new XWPFDocument(in);
		PdfOptions pdfOptions = PdfOptions.create();
		// Use a special font provider for chinese
		pdfOptions.fontProvider(CHINESE_FONT_PROVIDER);

		PdfConverter.getInstance().convert(document, out, pdfOptions);
	} catch(Exception e) {
		e.printStackTrace();
	}
}

with the Chinese font provider defined as follows:

private static final IFontProvider CHINESE_FONT_PROVIDER = (familyName, encoding, size, style, color) -> {
	try {
		BaseFont bf = BaseFont.createFont("/fonts/NotoSansCJK-Regular.ttc" + ",0", BaseFont.IDENTITY_H,
										  BaseFont.NOT_EMBEDDED);
		Font font = new Font(bf, size, style, color);
		if(familyName != null) {
			font.setFamily(familyName);
		}
		return font;
	} catch(DocumentException | IOException e) {
		log.error("Font error", e);
		return ITextFontRegistry.getRegistry().getFont(familyName, encoding, size, style, color);
	}
};

and using the following dependencies in the pom file:

...
<apache-poi.version>5.2.3</apache-poi.version>
...
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>5.2.3</version>
</dependency>

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-full</artifactId>
    <version>5.2.3</version>
</dependency>

<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.poi.xwpf.converter.pdf</artifactId>
    <version>2.0.4</version>
</dependency>

Thank you so much for this amazing tool BTW :). I haven't found any related issues.

iu159 commented

I'm facing the same issue. I worked around it by saving the document as XML in Word and then editing the XML, mainly changing the table style value:
<w:tblStyle w:val="1" />

Then I saved the XML back to docx.

Even with this, the table is not perfect.
Consider using Aspose.
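
For anyone who wants to script that edit instead of doing it by hand, here is a rough sketch. It takes a different route from the Word "save as XML" steps above: it assumes the docx is treated as a zip archive, that the table style lives in word/document.xml, and that the file is UTF-8 encoded. The class name `TableStylePatcher` and its methods are made up for illustration, and the value "1" is only the one quoted in this comment; the right style ID depends on your document.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of xdocreport or POI.
class TableStylePatcher {

    // Captures the text before and after the w:val value of a w:tblStyle element.
    private static final Pattern TBL_STYLE =
            Pattern.compile("(<w:tblStyle\\b[^>]*?\\bw:val=\")[^\"]*(\")");

    /** Returns the XML with every w:tblStyle value replaced by newVal. */
    static String replaceTableStyle(String xml, String newVal) {
        return TBL_STYLE.matcher(xml).replaceAll(
                m -> Matcher.quoteReplacement(m.group(1) + newVal + m.group(2)));
    }

    /** Copies a docx to target, rewriting only word/document.xml (assumed UTF-8). */
    static void patchDocx(Path source, Path target, String newVal) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(source));
             ZipOutputStream zout = new ZipOutputStream(Files.newOutputStream(target))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // readAllBytes() stops at the end of the current zip entry.
                byte[] data = zin.readAllBytes();
                if ("word/document.xml".equals(entry.getName())) {
                    String xml = new String(data, StandardCharsets.UTF_8);
                    data = replaceTableStyle(xml, newVal).getBytes(StandardCharsets.UTF_8);
                }
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zout.write(data);
                zout.closeEntry();
            }
        }
    }
}
```

Calling `TableStylePatcher.patchDocx(Path.of("testDocument.docx"), Path.of("testDocument-patched.docx"), "1")` would write a patched copy that can then be passed to the converter. This only automates the style change; it does not address the remaining table problems mentioned above.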

vauns commented

Use fr.opensagres.xdocreport:fr.opensagres.xdocreport.converter.docx.docx4j instead.
But I would still like to know why this happens and how to fix it.
mark