remy/wm

Fails to detect links in escaped HTML content in a valid Atom 1.0 feed

MaybeThisIsRu opened this issue · 1 comment

Error with https://rusingh.com/notes.xml.

When the content is escaped, which is correct per the Atom specification (RFC 4287, see item 2 in section 4.1.3.3):

<content type="html">&lt;p&gt;This is a #POSSE note from my #IndieWeb website.&lt;/p&gt;
&lt;a href=&quot;https://brid.gy/publish/twitter&quot;&gt;&lt;/a&gt;</content>

wm does not parse any links.

This is recognized as a valid Atom 1.0 feed by https://validator.w3.org/feed/check.cgi.

When I do not escape the HTML entities:

<content type="html"><p>This is a #POSSE note from my #IndieWeb website.</p>
<a href="https://brid.gy/publish/twitter"></a></content>

wm is able to detect links and send webmentions.

This is recognized as an invalid Atom 1.0 feed by https://validator.w3.org/feed/check.cgi.

This seems backwards: when the feed is valid, wm finds no links, but when the feed is invalid, it does. I've been digging into this but I'm not sure what's wrong.

wm/lib/rss/dom.js

Lines 6 to 10 in e3d0415

const rss = await new Parser({
  customFields: {
    item: ['summary'],
  },
}).parseString(xml);

Up to this point, the XML content still contains the escaped HTML entities.

After rss-parser parses it, the content comes back as unescaped HTML. This seems like the place where a fix or improvement would go.
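
To narrow down which form of the string the link detection actually receives, a small dependency-free check can be logged on either side of the parse step. This is only a sketch: `extractHrefs` is a hypothetical helper written for this issue, not part of wm or rss-parser, and a regex is not a substitute for a real HTML parser.

```javascript
// Hypothetical debugging helper, not wm code: naive href extraction.
// It only recognises literal <a ... href="..."> markup, so it finds
// nothing while the markup is still entity-escaped.
function extractHrefs(html) {
  const hrefs = [];
  const pattern = /<a\b[^>]*?\bhref\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    // Group 1 is a double-quoted value, group 2 a single-quoted one.
    hrefs.push(match[1] !== undefined ? match[1] : match[2]);
  }
  return hrefs;
}

// The two forms of the link from the feed in this issue.
const escaped = '&lt;a href=&quot;https://brid.gy/publish/twitter&quot;&gt;&lt;/a&gt;';
const unescaped = '<a href="https://brid.gy/publish/twitter"></a>';

console.log(extractHrefs(escaped));   // []
console.log(extractHrefs(unescaped)); // [ 'https://brid.gy/publish/twitter' ]
```

If the same check on the parsed item content returns an empty list, the string reaching the link finder is probably still escaped (or escaped twice) at that point.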

I have also attempted the following with no luck:

<content type="html"><![CDATA[<p>This is a #POSSE note from my #IndieWeb website.</p>
<a href="https://brid.gy/publish/twitter"></a>]]></content>
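
One possible reason the CDATA attempt behaves like the escaped one: at the XML level, a CDATA section and entity-escaped text carry the same character data, so a parser would be expected to hand back the same string for both. The sketch below illustrates that equivalence. `decodeXmlEntities` and `characterData` are hypothetical helpers, not rss-parser internals, and they handle only the five predefined XML entities (no numeric character references).

```javascript
// Hypothetical helpers for illustration; not wm or rss-parser code.
const XML_ENTITIES = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

// Decode the five predefined XML entities in a single pass, so that
// "&amp;lt;" becomes "&lt;" rather than being decoded twice.
function decodeXmlEntities(text) {
  return text.replace(/&(lt|gt|quot|apos|amp);/g, (_, name) => XML_ENTITIES[name]);
}

// Approximate the character data of simple element content:
// CDATA sections are taken literally, everything else is entity-decoded.
function characterData(inner) {
  return inner
    .split(/(<!\[CDATA\[[\s\S]*?\]\]>)/)
    .map((part) =>
      part.startsWith('<![CDATA[') ? part.slice(9, -3) : decodeXmlEntities(part)
    )
    .join('');
}

const escapedForm = '&lt;a href=&quot;https://brid.gy/publish/twitter&quot;&gt;&lt;/a&gt;';
const cdataForm = '<![CDATA[<a href="https://brid.gy/publish/twitter"></a>]]>';

console.log(characterData(escapedForm) === characterData(cdataForm)); // true
```

If that holds for the real parser too, switching between escaping and CDATA in the feed would not change what wm sees, which would point at the handling after parsing instead.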