JDOM does not output an internal DTD declaration it parsed
I have a legacy project that I am writing a utility for, to move elements from one xml file to another. It must move one element at a time. It all works very well except that one of the files I have to process uses internal DTD declarations. JDOM successfully parses these very nicely. But when my program writes back the remaining XML after removing the element from the old file or adding the element to the new file, the DTD declaration is lost in the output file. This means my output file is no good for input the next go-round.
I found an issue about this from 2001(!) but it appears nothing was done about it. I suppose I shouldn't be surprised, as that was probably about the time when DTDs were being replaced by Schemas, but I am wondering if anyone has a workaround for this problem, or if I am missing something in the API.
The workaround I found was switching to external DTDs. My program can do this with DocType.setSystemID().
Note that when JDOM parses the input XML it uses a third-party parser (typically xerces) which resolves the DTD entity references before giving them to JDOM. Thus, when JDOM outputs the XML, it has no entity references (because they have already been resolved). Now, while it may make sense that the DTD declaration is copied over to JDOM, the xerces parser will not trigger the DTD events unless the xerces parser is told to not resolve entities. Thus, you can't have the parser resolve entities and give you the full DTD declaration at the same time.
Consider this test code:
    public static void main(String[] args) throws JDOMException, IOException {
        SAXBuilder sb = new SAXBuilder();
        // Ask the parser not to expand entity references
        sb.setExpandEntities(false);
        Document doc = sb.build("complex.xml");
        // Write the parsed document back out unchanged
        XMLOutputter xout = new XMLOutputter();
        xout.output(doc, System.out);
    }
The above will print the DOCTYPE declaration, but if setExpandEntities is set
to true (the default), it will not.
This is an issue/idiosyncrasy in the underlying xerces parser, and not in JDOM itself.
Switching to external DTDs is a good workaround for this problem.
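For anyone landing here later, the external-DTD workaround could look roughly like the sketch below. It assumes JDOM 2 (the `org.jdom2` packages); the sample document, the class and method names, and the `root.dtd` file name are all made up for illustration. It only rewrites the DOCTYPE, so the declarations from the internal subset still have to be copied into the external DTD file separately.

```java
import java.io.IOException;
import java.io.StringReader;

import org.jdom2.DocType;
import org.jdom2.Document;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
import org.jdom2.output.XMLOutputter;

public class ExternalDtdSketch {

    // Hypothetical sample input that declares its DTD internally.
    static final String SAMPLE =
        "<!DOCTYPE root [\n"
      + "  <!ELEMENT root (#PCDATA)>\n"
      + "]>\n"
      + "<root>text</root>";

    /**
     * Points the document's DOCTYPE at an external DTD (systemId),
     * drops any internal subset, and returns the serialized XML.
     */
    static String withExternalDtd(Document doc, String systemId) {
        DocType docType = doc.getDocType();
        if (docType == null) {
            // No DOCTYPE was captured by the parser, so create one.
            docType = new DocType(doc.getRootElement().getName(), systemId);
            doc.setDocType(docType);
        } else {
            docType.setSystemID(systemId);
            // The declarations are assumed to live in the external file now.
            docType.setInternalSubset(null);
        }
        return new XMLOutputter().outputString(doc);
    }

    public static void main(String[] args) throws JDOMException, IOException {
        Document doc = new SAXBuilder().build(new StringReader(SAMPLE));
        System.out.println(withExternalDtd(doc, "root.dtd"));
    }
}
```

Because the DOCTYPE is set explicitly before output, the result should not depend on what the parser did with the internal subset, which sidesteps the setExpandEntities question entirely.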