Bug in WXR_Parser_Regex when parsing authors that span multiple lines
pbiron opened this issue · 0 comments
The regex parser assumes that each author's info is contained on a single line, but in practice the WP exporter writes each author across multiple lines in the WXR file.
For example, the exporter outputs:
<wp:author>
<wp:author_id>7</wp:author_id>
<wp:author_login>username</wp:author_login>
<wp:author_email>user@example.com</wp:author_email>
<wp:author_display_name><![CDATA[First Last]]></wp:author_display_name>
<wp:author_first_name><![CDATA[First]]></wp:author_first_name>
<wp:author_last_name><![CDATA[Last]]></wp:author_last_name>
</wp:author>
whereas the regex parser expects:
<wp:author><wp:author_id>7</wp:author_id><wp:author_login>username</wp:author_login><wp:author_email>user@example.com</wp:author_email><wp:author_display_name><![CDATA[First Last]]></wp:author_display_name><wp:author_first_name><![CDATA[First]]></wp:author_first_name><wp:author_last_name><![CDATA[Last]]></wp:author_last_name></wp:author>
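To make the mismatch concrete, here is a minimal Python sketch of the difference between matching one line at a time and matching across the whole document. It is not the importer's actual PHP code: the function names and the trimmed-down sample data are hypothetical, and the line-by-line scan is only an assumption about how a parser with this bug would behave.

```python
import re

# DOTALL lets "." match newlines, so a pattern can span several lines
# when it is applied to the whole document instead of a single line.
AUTHOR_RE = re.compile(r"<wp:author>(.*?)</wp:author>", re.DOTALL)
LOGIN_RE = re.compile(r"<wp:author_login>(.*?)</wp:author_login>", re.DOTALL)


def authors_line_by_line(wxr):
    """Hypothetical line-oriented scan: only finds authors that fit on one line."""
    logins = []
    for line in wxr.splitlines():
        match = AUTHOR_RE.search(line)
        if match:
            login = LOGIN_RE.search(match.group(1))
            if login:
                logins.append(login.group(1))
    return logins


def authors_whole_document(wxr):
    """Hypothetical alternative: match against the full text, across newlines."""
    logins = []
    for match in AUTHOR_RE.finditer(wxr):
        login = LOGIN_RE.search(match.group(1))
        if login:
            logins.append(login.group(1))
    return logins


# Trimmed-down versions of the two layouts shown above.
MULTI_LINE = """<wp:author>
<wp:author_id>7</wp:author_id>
<wp:author_login>username</wp:author_login>
</wp:author>"""
SINGLE_LINE = MULTI_LINE.replace("\n", "")

print(authors_line_by_line(SINGLE_LINE))
print(authors_line_by_line(MULTI_LINE))
print(authors_whole_document(MULTI_LINE))
```

In this sketch the line-by-line scan finds the author only in the single-line layout, while the whole-document scan finds it in both. Buffering lines until the closing `</wp:author>` is seen would be another way to get the same result.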
I've got a tentative fix, but I need to test it some more before submitting a PR, which probably won't be until the weekend.