Notations declared in external DTD subsets are reported as undefined
nektarios-kitsios opened this issue · 6 comments
Validation fails with the error below when a NOTATION that is declared in an external DTD subset is referenced in an internal DTD subset:
om.ctc.wstx.exc.WstxValidationException:
1 referenced notation undefined: first one 'IMAGE'
at [row,col {unknown-source}]: [1,109]
at com.ctc.wstx.exc.WstxValidationException.create(WstxValidationException.java:50)
at com.ctc.wstx.sr.StreamScanner.reportValidationProblem(StreamScanner.java:593)
at com.ctc.wstx.sr.StreamScanner.reportValidationProblem(StreamScanner.java:601)
at com.ctc.wstx.dtd.FullDTDReader._reportVCViolation(FullDTDReader.java:1989)
at com.ctc.wstx.dtd.FullDTDReader._reportUndefinedNotationRefs(FullDTDReader.java:1967)
at com.ctc.wstx.dtd.FullDTDReader.parseDTD(FullDTDReader.java:639)
at com.ctc.wstx.dtd.FullDTDReader.readInternalSubset(FullDTDReader.java:427)
at com.ctc.wstx.sr.ValidatingStreamReader.finishDTD(ValidatingStreamReader.java:277)
at com.ctc.wstx.sr.BasicStreamReader.skipToken(BasicStreamReader.java:3466)
at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2089)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1180)
at org.codehaus.stax.test.vstream.TestExternalSubset.testNotationReferenceInInternalSubset(TestExternalSubset.java:77)
The following example demonstrates this problem:
XML:
<!DOCTYPE root SYSTEM 'test.dtd' [ <!ENTITY gr2 SYSTEM "gr2" NDATA IMAGE> ]>
<root>&extEnt;</root>
DTD:
<!ELEMENT root (#PCDATA)>
<!ENTITY extEnt 'just testing'>
<!NOTATION IMAGE PUBLIC "-//AS//NOTATION image format//EN"
"http://www.test.com/xml/common/dtd/notation/image">
Note that Xerces and xmllint validate the above XML successfully. This is in accordance with the XML specification which does not impose any specific order in the declarations. The only requirement is the following:
If both the external and internal subsets are used, the internal subset must be considered to occur before the external subset. This has the effect that entity and attribute-list declarations in the internal subset take precedence over those in the external subset.
The problem seems to be that woodstox reads the internal subset before the external one, and reports the error at the time the internal subset is read when the notation declaration has not been read yet.
For the record there used to be an old issue reporting this problem here: https://web.archive.org/web/20150507153747/http://jira.codehaus.org/browse/WSTX-264
Attached is a patch with a unit test that demonstrates this case and a potential fix: Issue_184_-_potential_fix_and_unit_test.patch.txt
Thank you for reporting this issue. Interesting that this was not covered by the extensive XMLTest suite (I think maybe https://www.w3.org/XML/Test/ but it has been years) that had ~2000 tests, many focusing on DTD aspects.
I hope to find time to at least get test integrated and then have a look at proposed solution.
Attached is a new version of the patch that fixes a problem with parameter entity overrides (demonstrated in a unit test). To solve this the external subset is read twice, but there may be better ways to fix this.
@nektarios-kitsios Apologies for slow follow up here. Would it be possible to split patch in two parts: one for unit tests, another for actual fix? I should be able to merge tests first (and move under failing
until fixed), and could see how fix itself would work.
As requested, split the patch in two parts:
Issue_184_-_unit_tests.patch.txt
Issue_184_-_potential_fix.patch.txt
Thank you!