ballsteve/xrust

Canonical XML

Opened this issue · 3 comments

Many tests specify an XML result, but there are legitimate variations in XML syntax that make an exact string comparison unreliable. For example, attribute values may be quoted with single or double quotes, and an empty element may be written as either <a/> or <a></a>.

We need a facility to be able to compare XML documents in a canonical fashion, i.e. disregarding allowable syntax differences.

+1

I was looking into this. One of the things that would cause breaks is comments. The XML parser cannot ignore them, since XSLT can match on a comment, but for the purposes of Canonical XML comparisons they can be ignored.

I'm thinking the way to go might be to have a function/method that scans a whole tree and generates a canonical version from that?

From a data point-of-view, comments cannot be ignored. The XML parser must handle them, and the document/tree structure must allow for them. The XSLT processor ignores them in the stylesheet document, but not in source or result documents.

A function/method that produces a canonical view of an XML document could have an option to ignore comments, processing instructions, and ignorable whitespace (however that might be defined).
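
To make that idea concrete, here is a minimal sketch of what such a function might look like. Everything in it is an assumption for illustration: the `Node` enum, `CanonOptions`, and `canonical_string` are hypothetical names, not xrust's actual tree types or API. The output is only loosely modelled on W3C Canonical XML (double-quoted attributes, sorted attribute order, start/end tag pairs for empty elements) and does not handle namespaces. "Ignorable whitespace" is assumed here to mean a text node consisting only of whitespace.

```rust
use std::collections::BTreeMap;

// Hypothetical minimal tree type for illustration; not xrust's node type.
#[derive(Debug, Clone)]
pub enum Node {
    Element {
        name: String,
        // BTreeMap iterates in sorted key order, so attribute order never matters.
        attrs: BTreeMap<String, String>,
        children: Vec<Node>,
    },
    Text(String),
    Comment(String),
    ProcessingInstruction { target: String, data: String },
}

// Hypothetical options; the default keeps every node kind.
#[derive(Debug, Clone, Copy, Default)]
pub struct CanonOptions {
    pub ignore_comments: bool,
    pub ignore_pis: bool,
    // Assumption: "ignorable" means a text node that is whitespace only.
    pub ignore_whitespace_text: bool,
}

// Escape markup characters; quotes only matter inside attribute values.
fn escape(s: &str, in_attr: bool) -> String {
    let mut out = String::with_capacity(s.len());
    for ch in s.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' if !in_attr => out.push_str("&gt;"),
            '"' if in_attr => out.push_str("&quot;"),
            _ => out.push(ch),
        }
    }
    out
}

// Walk the whole tree and append one fixed serialisation to `out`.
fn write_canonical(node: &Node, opts: CanonOptions, out: &mut String) {
    match node {
        Node::Element { name, attrs, children } => {
            out.push('<');
            out.push_str(name);
            for (key, value) in attrs {
                // Always double quotes, whatever the source document used.
                out.push_str(&format!(" {}=\"{}\"", key, escape(value, true)));
            }
            out.push('>');
            for child in children {
                write_canonical(child, opts, out);
            }
            // Always an explicit end tag, so <a/> and <a></a> compare equal.
            out.push_str(&format!("</{}>", name));
        }
        Node::Text(text) => {
            let whitespace_only = text.trim().is_empty();
            if !(opts.ignore_whitespace_text && whitespace_only) {
                out.push_str(&escape(text, false));
            }
        }
        Node::Comment(text) => {
            if !opts.ignore_comments {
                out.push_str(&format!("<!--{}-->", text));
            }
        }
        Node::ProcessingInstruction { target, data } => {
            if !opts.ignore_pis {
                if data.is_empty() {
                    out.push_str(&format!("<?{}?>", target));
                } else {
                    out.push_str(&format!("<?{} {}?>", target, data));
                }
            }
        }
    }
}

// Produce the canonical string for a tree; tests would compare these strings.
pub fn canonical_string(node: &Node, opts: CanonOptions) -> String {
    let mut out = String::new();
    write_canonical(node, opts, &mut out);
    out
}
```

A test could then parse both the expected and the actual result, call `canonical_string` on each with the same options, and compare the two strings. Leaving the options at their default keeps comments and processing instructions significant, which matches the point above that they cannot be ignored from a data point of view.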