PDF converter from docx: words are overlapping (examples attached)
Opened this issue · 2 comments
ralborodo-RatedPower commented
When converting a docx file (testDocument.docx) to PDF, some words in the output file (testDocument-new.pdf) overlap each other.
To reproduce the issue, use the following code:
@Test
void simpletestconversion() {
    try (InputStream in = new FileInputStream(docPath);
         OutputStream out = new FileOutputStream(pdfPath)) {
        XWPFDocument document = new XWPFDocument(in);
        PdfOptions pdfOptions = PdfOptions.create();
        // Use a special font provider for Chinese
        pdfOptions.fontProvider(CHINESE_FONT_PROVIDER);
        PdfConverter.getInstance().convert(document, out, pdfOptions);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
with the Chinese font provider defined as follows:
private static final IFontProvider CHINESE_FONT_PROVIDER = (familyName, encoding, size, style, color) -> {
    try {
        // ",0" selects the first font in the TrueType collection
        BaseFont bf = BaseFont.createFont("/fonts/NotoSansCJK-Regular.ttc" + ",0", BaseFont.IDENTITY_H,
                BaseFont.NOT_EMBEDDED);
        Font font = new Font(bf, size, style, color);
        if (familyName != null) {
            font.setFamily(familyName);
        }
        return font;
    } catch (DocumentException | IOException e) {
        log.error("Font error", e);
        // Fall back to the default font registry
        return ITextFontRegistry.getRegistry().getFont(familyName, encoding, size, style, color);
    }
};
and using the following dependencies in the pom file:
...
<apache-poi.version>5.2.3</apache-poi.version>
...
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>5.2.3</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-full</artifactId>
    <version>5.2.3</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.poi.xwpf.converter.pdf</artifactId>
    <version>2.0.4</version>
</dependency>
Thank you so much for this amazing tool BTW :). I haven't found any related issues.
iu159 commented
I am facing the same issue. I worked around it by saving the document from Word as XML and then editing the XML:
<w:tblStyle w:val="1" />
The main change is the table style value.
Then I saved the XML back to docx.
Even with this, the table is not perfect.
Consider using Aspose.
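The XML edit described above could also be scripted instead of done by hand. The following is only a rough sketch under several assumptions: that the table style is declared in the word/document.xml entry of the docx archive, that the value "1" from this comment is a style ID that exists in your document, and that a regex replacement is good enough for your files. The class and method names (TableStylePatcher, patchTableStyle, patchDocx) are hypothetical and not part of XDocReport or POI; only the JDK is used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: rewrites the w:tblStyle value inside a docx archive.
class TableStylePatcher {

    // Matches the w:val attribute of a <w:tblStyle> element.
    private static final Pattern TBL_STYLE = Pattern.compile(
            "(?<pre><w:tblStyle\\b[^>]*?\\bw:val=\")[^\"]*(?<post>\")");

    /** Returns the XML with every w:tblStyle value replaced by newStyleId. */
    static String patchTableStyle(String documentXml, String newStyleId) {
        String replacement = "${pre}" + Matcher.quoteReplacement(newStyleId) + "${post}";
        return TBL_STYLE.matcher(documentXml).replaceAll(replacement);
    }

    /** Copies a docx (a zip archive) and patches word/document.xml on the way. */
    static byte[] patchDocx(byte[] docx, String newStyleId) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(docx));
             ZipOutputStream zout = new ZipOutputStream(result)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // Reads the remaining bytes of the current entry only (Java 9+).
                byte[] data = zin.readAllBytes();
                if ("word/document.xml".equals(entry.getName())) {
                    String xml = new String(data, StandardCharsets.UTF_8);
                    data = patchTableStyle(xml, newStyleId).getBytes(StandardCharsets.UTF_8);
                }
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zout.write(data);
                zout.closeEntry();
            }
        }
        // The archive is complete once the zip stream has been closed.
        return result.toByteArray();
    }
}
```

Something like Files.write(outPath, TableStylePatcher.patchDocx(Files.readAllBytes(docPath), "1")) would produce a patched copy that can then be passed to the converter. This only automates the workaround; it does not explain why the original table style causes the overlap.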
vauns commented
Use fr.opensagres.xdocreport:fr.opensagres.xdocreport.converter.docx.docx4j instead.
But I would still like to know why this happens and how to fix it.
mark