kurtmckee/feedparser

Failed to parse description field with escaped CDATA.


Bug Description:
As of the current version (2024-04-12), feedparser fails to extract the content of a description field that contains escaped CDATA. I have reduced the issue to the minimal reproducible test case below (source RSS link).

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>xueqiu</title>
    <link>http://xueqiu.com/hots/topic</link>
    <description>xiuqiu</description>
    <item>
      <title>title</title>
      <link>http://xueqiu.com/1630191122/288006046</link>
      <description>&lt;![CDATA[some text]]&gt;</description>
      <pubDate>Sat, 27 Apr 2024 08:26:02 GMT</pubDate>
      <guid>http://xueqiu.com/1630191122/288006046</guid>
      <dc:creator>name</dc:creator>
      <dc:date>2024-04-27T08:26:02Z</dc:date>
    </item>
  </channel>
</rss>

Expectation:
feed.entries[0].description == 'some text', but the actual result is an empty string.
If the escaped &lt;![CDATA[some text]]&gt; is replaced with a literal <![CDATA[some text]]>, parsing works as expected.
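
For anyone hitting this in the meantime, here is a rough sketch of a stopgap that bypasses feedparser for this one field. It uses only the standard library. After entity decoding, an XML parser sees the description text as the literal string <![CDATA[some text]]>, so the sketch strips that wrapper by hand. The helper names strip_cdata_wrapper and item_descriptions are made up for illustration and are not part of feedparser, and the feed is a trimmed copy of the one above.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the feed from this report; only the relevant fields are kept.
RSS = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>xueqiu</title>
    <item>
      <title>title</title>
      <description>&lt;![CDATA[some text]]&gt;</description>
    </item>
  </channel>
</rss>
"""

# Matches a value that consists solely of a literal <![CDATA[...]]> wrapper.
_CDATA_WRAPPER = re.compile(r"\A\s*<!\[CDATA\[(.*)\]\]>\s*\Z", re.DOTALL)


def strip_cdata_wrapper(text):
    """Hypothetical helper: drop a literal CDATA wrapper if the text has one."""
    text = text or ""
    match = _CDATA_WRAPPER.match(text)
    return match.group(1) if match else text


def item_descriptions(rss_bytes):
    """Hypothetical helper: return each item's description, unwrapped."""
    root = ET.fromstring(rss_bytes)  # root is the <rss> element
    return [
        strip_cdata_wrapper(node.text)
        for node in root.iterfind("./channel/item/description")
    ]


if __name__ == "__main__":
    print(item_descriptions(RSS.encode("utf-8")))
```

This only handles a description that is wrapped entirely in one escaped CDATA section. Text without such a wrapper is returned unchanged.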