kurtmckee/feedparser

opening file mentioned in feed doctype

Opened this issue · 3 comments

I am currently getting an "Unknown IO error" printed to stderr when calling feedparser.parse('http://feeds.feedburner.com/news_trailbusterscom?format=xml')
The feed starts with this header:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

I used strace to see where this happens and saw a stat('http://my.netscape.com/publish/formats/rss-0.91.dtd') call for the doctype in the XML header. I then parsed a feed with the doctype changed to file://etc/hosts, and strace showed a successful stat() and open() for the file I had put in the doctype URL.

This behaviour seems suspicious to me. Letting user-supplied input open a file on the system does not look safe.
Is this OK?
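
For readers who want to check this without strace, here is a minimal sketch, assuming the feed goes through the standard library's xml.sax parser. It turns off external general entities and installs a resolver that records any attempt to load one. The names FEED, RecordingResolver, TitleCollector and parse_without_external_entities are made up for illustration and are not part of feedparser. On Python 3.7.1 and later xml.sax already leaves external general entities off by default, so the explicit setFeature call matters mostly on older interpreters.

```python
import io
import xml.sax
from xml.sax.handler import (ContentHandler, EntityResolver,
                             feature_external_ges, feature_external_pes)

# Illustrative feed with the same DOCTYPE as in the report above.
FEED = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91"><channel><title>Example feed</title></channel></rss>
"""


class RecordingResolver(EntityResolver):
    """Hypothetical resolver: records every external load and refuses it."""

    def __init__(self):
        self.requests = []

    def resolveEntity(self, publicId, systemId):
        self.requests.append(systemId)
        raise xml.sax.SAXException("external entity blocked: %s" % systemId)


class TitleCollector(ContentHandler):
    """Collects the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


def parse_without_external_entities(data):
    """Parse bytes with external entity loading disabled.

    Returns (titles, requested_system_ids).
    """
    parser = xml.sax.make_parser()
    # Never fetch external general or parameter entities (DTD included).
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    resolver = RecordingResolver()
    collector = TitleCollector()
    parser.setEntityResolver(resolver)
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(data))
    return collector.titles, resolver.requests


if __name__ == "__main__":
    print(parse_without_external_entities(FEED))
```

With the feature switched off, the resolver should never be called, so the second element of the result stays empty and no stat() or open() on the doctype URL is expected.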

twm commented

No, this behavior is not okay; it is actually pretty serious. Perhaps feedparser should use defusedxml, which wraps a number of Python XML libraries to prevent this kind of thing and has nice explanations of these vulnerabilities.
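
As a rough illustration of the kind of guard such a wrapper puts in place, the sketch below uses only the standard library's expat bindings and raises as soon as the document declares an entity or asks for an external one. This is not defusedxml's code or API; ForbiddenXML and element_names_strict are hypothetical names, and a real fix would more likely rely on defusedxml itself.

```python
import xml.parsers.expat


class ForbiddenXML(ValueError):
    """Hypothetical error type for constructs this sketch refuses to parse."""


def element_names_strict(data):
    """Return element names in document order, rejecting entity tricks."""
    parser = xml.parsers.expat.ParserCreate()
    names = []

    def reject_entity_decl(name, is_parameter_entity, value, base,
                           system_id, public_id, notation_name):
        # Called for every <!ENTITY ...> declaration, internal or external.
        raise ForbiddenXML("entity declaration not allowed: %s" % name)

    def reject_external_ref(context, base, system_id, public_id):
        # Called if expat asks the application to load an external entity.
        raise ForbiddenXML("external reference not allowed: %s" % system_id)

    parser.EntityDeclHandler = reject_entity_decl
    parser.ExternalEntityRefHandler = reject_external_ref
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)
    return names
```

An ordinary feed passes through unchanged, while a document that declares something like `<!ENTITY x SYSTEM "file:///etc/hosts">` is rejected before any file is touched. The sketch deliberately leaves a bare DOCTYPE alone, since feeds such as the one in this report carry one.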

@johniez thanks for reporting this!

@twm, great suggestion! I'd like feedparser to be far more stable and secure than it is, so this may be a necessary change to protect users! I'll look into it as soon as I can!

Any update? This looks like a serious vulnerability.