docx4j is an open source (Apache v2) library for creating, editing, and saving OpenXML "packages", including docx, pptx, and xslx.
It uses JAXB to create the Java representation.
- Open existing docx/pptx/xlsx (from filesystem, SMB/CIFS, WebDAV using VFS)
- Create new docx
- Programmatically manipulate the docx document (of course)
- CustomXML binding (with OpenDoPE extensions for repeats & conditionals)
- Export as HTML or PDF
- Diff/compare documents, paragraphs or sdt (content controls)
- Import a binary doc (uses Apache POI's HWPF)
- Produce/consume Word 2007's xmlPackage (pkg) format
- Save docx to filesystem as a docx (ie zipped), or to JCR (unzipped)
- Apply transforms, including common filters
- Font support (font substitution, and use of any fonts embedded in the document)
http://www.docx4java.org/downloads.html
See the Getting Started guide.
http://www.docx4java.org/forums or StackOverflow (use tag 'docx4j')
Get it from GitHub, at https://github.com/plutext/docx4j For more details, see http://www.docx4java.org/blog/2012/05/docx4j-from-github-in-eclipse/
docx4j is published under the Apache License version 2.0. For the license text, please see the following files in the legals directory:
-
LICENSE
-
NOTICE Legal information on libraries used by docx4j can be found in the "legals/NOTICE" file. Here is a (TODO: non exhaustive?) list of files included in docx4j but not published under Apache License version 2.0:
-
DocX2Html.xslt (though docx2xhtmlNG2.xslt is our supported transform, not that)
-
src/diffx (ARTISTIC LICENCE)
-
xsd/**