jaxl/JAXL

XML parser error in PHP7

cburschka opened this issue · 2 comments

I traced a mysterious hang during authentication to the following problem:

  1. The client successfully negotiates TLS.
  2. On reading <proceed xmlns='urn:ietf:params:xml:ns:xmpp-tls'/>, the client resets the parser.
  3. The next string that is supposed to hit the newly initialized parser is <?xml version='1.0'?><stream:stream xmlns='jabber:client' xmlns:stream='http://etherx.jabber.org/streams' id='...' from='...' version='1.0' xml:lang='en'>
  4. Even though nothing else should have been parsed yet (and xml_get_current_byte_index returns 0), the parser reports Reserved XML Name. That error usually results from <?xml version='1.0'?> being preceded by other input, which implies that something is "polluting" the parser between the reset and the arrival of that string.
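For anyone who wants to reproduce the symptom outside JAXL, here is a minimal sketch. The helper first_parse_error() is hypothetical (not part of JAXL), and the single stray space is only an assumed stand-in for whatever is actually polluting the parser. On a libxml2-backed PHP build I would expect the second call to return an error string, but the exact message may differ between builds.

```php
<?php
// Hypothetical helper, not part of JAXL: feed chunks to a fresh parser and
// return the first error message, or null if every chunk is accepted.
function first_parse_error(array $chunks) {
    $parser = xml_parser_create();
    foreach ($chunks as $chunk) {
        // xml_parse() returns 1 on success and 0 on failure.
        if (xml_parse($parser, $chunk, false) === 0) {
            return xml_error_string(xml_get_error_code($parser));
        }
    }
    return null;
}

// Shortened stream header; the id/from attributes are left out on purpose.
$header = "<?xml version='1.0'?><stream:stream xmlns='jabber:client' "
        . "xmlns:stream='http://etherx.jabber.org/streams' version='1.0'>";

// Case 1: the header is the first thing a fresh parser sees.
var_dump(first_parse_error(array($header)));

// Case 2: one stray byte reaches the parser first (assumed pollution).
var_dump(first_parse_error(array(" ", $header)));
```

If case 2 reports an error while case 1 does not, that matches the behaviour described in step 4: the declaration itself is fine, it is just no longer at the start of the input.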

Unfortunately, I haven't found out what causes this, but the following hacky workaround makes any string that starts with <? reset the parser again:

diff --git a/core/jaxl_xml_stream.php b/core/jaxl_xml_stream.php
index 1c2a70d..42f2f88 100644
--- a/core/jaxl_xml_stream.php
+++ b/core/jaxl_xml_stream.php
@@ -89,6 +89,7 @@ class JAXLXmlStream {
        }

        public function parse($str) {
+               if (strlen($str) > 2 && $str[0] == '<' && $str[1] == '?') $this->reset_parser();
                xml_parse($this->parser, $str, false);
        }

Could that "other input" be a BOM (byte order mark)? It is unnecessary for UTF-8, but it does occur. It can give strange results, since you can't see it in a viewer but the parser certainly finds it...

I haven't examined it deeply enough to rule that out completely. But I don't think any bytes could be hiding in front of the <?xml string itself: as far as I know, $str[0] == '<' compares at the byte level, so if a BOM preceded the string, the check would fail and the workaround wouldn't work.

I suppose the stray input would have to arrive in a separate call to xml_parse(), which I haven't been able to find.
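To illustrate the byte-level argument about the BOM, here is a small sketch. Both helpers are hypothetical and not part of JAXL; starts_with_pi() just repeats the condition from the workaround, and leading_bytes_hex() is a way to make an invisible prefix visible when logging what is passed to parse().

```php
<?php
// Hypothetical helpers for illustration; neither is part of JAXL.

// Same condition as the workaround: the first two bytes must be '<' and '?'.
function starts_with_pi($str) {
    return strlen($str) > 2 && $str[0] == '<' && $str[1] == '?';
}

// Hex dump of the first few bytes of a string.
function leading_bytes_hex($str, $count = 4) {
    return bin2hex(substr($str, 0, $count));
}

$clean   = "<?xml version='1.0'?>";
$withBom = "\xEF\xBB\xBF" . $clean; // UTF-8 BOM prepended

var_dump(starts_with_pi($clean));       // bool(true)
var_dump(starts_with_pi($withBom));     // bool(false)
echo leading_bytes_hex($withBom), "\n"; // efbbbf3c
```

Since the workaround does trigger, the string handed to parse() must begin with the raw bytes 3c 3f, so a BOM glued to the front of that same string is ruled out. Logging leading_bytes_hex() for every call to parse() might still help to spot a separate, earlier call.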