kurtmckee/feedparser

Crash in feedparser 6.0.10

TheVamp opened this issue · 6 comments

I noticed that the latest released version of feedparser crashes when a CDATA section contains a C code snippet. Here is an example of how to reproduce the issue.

  • Install feedparser via python -m pip install feedparser
  • RSS XML Crash example - rss.zip
    • or you could use the original feed https://blog.trailofbits.com/feed/
import feedparser

with open("./rss_code_crash.xml", "r") as f:
    rss_data = f.read()
rss = feedparser.parse(rss_data)
# Or parse the original feed directly:
# rss = feedparser.parse('https://blog.trailofbits.com/feed/')

I tested the same issue on the develop branch, but the crash does not occur there.
Thanks for your support.

This is the minimum reproducible example:

<content:encoded xmlns:content="bogus">
    <![CDATA[
        <!h<!h<!h<
    ]]>
</content:encoded>

The crash is coming from within the Python standard library -- _markupbase.py at line 134 raises an AssertionError stating "unexpected '<' char in declaration".
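
Until a fixed release is available, one possible stopgap is to catch the AssertionError at the call site. This is only a minimal sketch: `parse_or_none` is a hypothetical helper name, not part of feedparser, and it takes the parse function as an argument so that it can be tried without feedparser installed.

```python
from typing import Any, Callable, Optional


def parse_or_none(parse: Callable[[str], Any], data: str) -> Optional[Any]:
    """Call parse(data) and return None if the parser raises AssertionError.

    Hypothetical helper for illustration; it swallows only AssertionError,
    which is the exception reported in this issue.
    """
    try:
        return parse(data)
    except AssertionError as exc:
        # Report the problem instead of letting it crash the caller.
        print(f"feed could not be parsed: {exc}")
        return None


# Intended use (assumes feedparser is installed):
#   import feedparser
#   rss = parse_or_none(feedparser.parse, rss_data)
```

A caller would then need to handle a `None` result, for example by skipping the feed for that run.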

On a side note, it appears that Trail of Bits is using WordPress. Perhaps this is a bug in WordPress or in one of the plugins in its ecosystem, and it could be fixed there as well!

Is there a specific code change in the develop branch that fixed the problem and interprets the content differently?

In the develop branch everything works as expected:

  • python -m pip install git+https://github.com/kurtmckee/feedparser@develop
  • using your RSS sample or my RSS sample as input
  • executing the python script from above and everything works fine

That is why I thought it was a bug in feedparser.
I will look into the WordPress side of this.

Yep, I saw the same thing with the develop branch.

The crash is a bug in the feedparser 6.0.10 release. However, it is only triggered because WordPress is failing to escape the code in its <pre> blocks. It's two bugs, in different products, not one.
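
To illustrate what escaping would look like on the publishing side, here is a small sketch using Python's standard `html.escape` on the snippet from the minimal example. It is only an illustration of the idea, not a description of how WordPress actually processes <pre> blocks.

```python
import html

# The unescaped snippet from the minimal reproducible example above.
snippet = "<!h<!h<!h<"

# Replace markup-significant characters with character references so that
# a downstream HTML/SGML parser sees text rather than a "<!" declaration.
escaped = html.escape(snippet)

print(escaped)  # &lt;!h&lt;!h&lt;!h&lt;
```

Once escaped, the text no longer contains a literal "<!" sequence.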

Coincidentally, I was about to raise this exact same issue for the same feed. Looking forward to a fix for it.