kurtmckee/feedparser

UnicodeDecodeError while parsing feed

samuelclay opened this issue · 2 comments

Here's a feed that throws a UnicodeDecodeError (similar to #273, but while decoding): http://feed.informer.com/digests/XDOCBDJCK3/feeder.atom. The feed doesn't validate, but it should probably still be parsed, with the problem reported as a bozo exception instead of raised to the caller.

>>> import feedparser
>>> feedparser.__version__
'6.0.2'
>>> feedparser.parse('http://feed.informer.com/digests/XDOCBDJCK3/feeder.atom')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/feedparser/api.py", line 255, in parse
    saxparser.parse(source)
  File "/usr/local/lib/python3.9/xml/sax/expatreader.py", line 111, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/local/lib/python3.9/xml/sax/xmlreader.py", line 125, in parse
    self.feed(buffer)
  File "/usr/local/lib/python3.9/xml/sax/expatreader.py", line 217, in feed
    self._parser.Parse(data, isFinal)
  File "/usr/src/python/Modules/pyexpat.c", line 461, in EndElement
  File "/usr/local/lib/python3.9/xml/sax/expatreader.py", line 381, in end_element_ns
    self._cont_handler.endElementNS(pair, None)
  File "/usr/local/lib/python3.9/site-packages/feedparser/parsers/strict.py", line 124, in endElementNS
    self.unknown_endtag(localname)
  File "/usr/local/lib/python3.9/site-packages/feedparser/mixin.py", line 320, in unknown_endtag
    method()
  File "/usr/local/lib/python3.9/site-packages/feedparser/namespaces/mediarss.py", line 58, in _end_media_title
    self._end_title()
  File "/usr/local/lib/python3.9/site-packages/feedparser/namespaces/_base.py", line 384, in _end_title
    value = self.pop_content('title')
  File "/usr/local/lib/python3.9/site-packages/feedparser/mixin.py", line 630, in pop_content
    value = self.pop(tag)
  File "/usr/local/lib/python3.9/site-packages/feedparser/mixin.py", line 508, in pop
    output = base64.decodebytes(output.encode('utf8')).decode('utf8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 1: invalid continuation byte

Thanks Samuel, I'll work to get this fixed and released as a hotfix.

This is fixed in feedparser 6.0.4. Thanks for reporting this, Samuel!
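
For readers on an affected version, the last frame of the traceback points at an unguarded `base64.decodebytes(...).decode('utf8')` call in `mixin.py`. The following is a minimal sketch of how such a call can be made tolerant of bad input. It is not the actual 6.0.4 patch, which may differ, and `decode_base64_text` is a hypothetical helper name, not part of feedparser.

```python
import base64
import binascii


def decode_base64_text(value: str) -> str:
    """Hypothetical helper: base64-decode `value` as UTF-8 text.

    If the input is not valid base64, or the decoded bytes are not
    valid UTF-8, return the input unchanged instead of raising.
    """
    try:
        return base64.decodebytes(value.encode("utf8")).decode("utf8")
    except (binascii.Error, UnicodeDecodeError):
        # Malformed base64 raises binascii.Error; bytes that are not
        # UTF-8 raise UnicodeDecodeError (the error in this report).
        return value


print(decode_base64_text("aGVsbG8="))  # valid base64 of UTF-8 text
print(decode_base64_text("/+c="))      # decodes to bytes that are not UTF-8
```

Note that `base64.decodebytes` silently discards characters outside the base64 alphabet, so this guard only prevents the crash; it does not tell you whether the content was really meant to be base64.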