UnicodeDecodeError while parsing feed
samuelclay opened this issue · 2 comments
samuelclay commented
Here's a feed that throws a UnicodeDecodeError (similar to #273, but on decoding): http://feed.informer.com/digests/XDOCBDJCK3/feeder.atom. The feed doesn't validate, but it should probably still be handled with a bozo exception instead of raising.
>>> import feedparser
>>> feedparser.__version__
'6.0.2'
>>> feedparser.parse('http://feed.informer.com/digests/XDOCBDJCK3/feeder.atom')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/feedparser/api.py", line 255, in parse
    saxparser.parse(source)
  File "/usr/local/lib/python3.9/xml/sax/expatreader.py", line 111, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/local/lib/python3.9/xml/sax/xmlreader.py", line 125, in parse
    self.feed(buffer)
  File "/usr/local/lib/python3.9/xml/sax/expatreader.py", line 217, in feed
    self._parser.Parse(data, isFinal)
  File "/usr/src/python/Modules/pyexpat.c", line 461, in EndElement
  File "/usr/local/lib/python3.9/xml/sax/expatreader.py", line 381, in end_element_ns
    self._cont_handler.endElementNS(pair, None)
  File "/usr/local/lib/python3.9/site-packages/feedparser/parsers/strict.py", line 124, in endElementNS
    self.unknown_endtag(localname)
  File "/usr/local/lib/python3.9/site-packages/feedparser/mixin.py", line 320, in unknown_endtag
    method()
  File "/usr/local/lib/python3.9/site-packages/feedparser/namespaces/mediarss.py", line 58, in _end_media_title
    self._end_title()
  File "/usr/local/lib/python3.9/site-packages/feedparser/namespaces/_base.py", line 384, in _end_title
    value = self.pop_content('title')
  File "/usr/local/lib/python3.9/site-packages/feedparser/mixin.py", line 630, in pop_content
    value = self.pop(tag)
  File "/usr/local/lib/python3.9/site-packages/feedparser/mixin.py", line 508, in pop
    output = base64.decodebytes(output.encode('utf8')).decode('utf8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 1: invalid continuation byte
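The last frame can probably be reproduced without feedparser or the feed itself. Below is a minimal sketch using only the standard library. The payload `raw` and the helper `decode_like_traceback` are made up for illustration and are not taken from the feed. The sketch assumes the title content was base64 text whose decoded bytes are not valid UTF-8.

```python
import base64

# Hypothetical payload (not from the actual feed): bytes that encode cleanly
# to base64, but whose decoded form is not valid UTF-8. 0xe7 opens a
# multi-byte sequence and is followed by a byte that cannot continue it.
raw = b"a\xe7bc"
encoded_text = base64.b64encode(raw).decode("ascii")


def decode_like_traceback(text):
    """Mirror the expression shown on the last frame of the traceback."""
    return base64.decodebytes(text.encode("utf8")).decode("utf8")


try:
    decode_like_traceback(encoded_text)
except UnicodeDecodeError as exc:
    error = exc
    # Report the offending byte and its offset in the decoded data.
    print(f"{type(exc).__name__}: byte {exc.object[exc.start]:#x} at position {exc.start}")
else:
    error = None
```

The base64 step itself succeeds here; it is the trailing `.decode('utf8')` that raises, which matches the exception type in the traceback.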
kurtmckee commented
Thanks Samuel, I'll work to get this fixed and released as a hotfix.
kurtmckee commented
This is fixed in feedparser 6.0.4. Thanks for reporting this, Samuel!
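The thread doesn't show what changed in 6.0.4. Purely as an illustration, and not as a description of the actual patch, one defensive pattern for an expression like the one in the traceback is to catch the decode failure and keep the original text. `decode_base64_text` and the sample values are hypothetical names introduced for this sketch.

```python
import base64
import binascii


def decode_base64_text(text):
    """Return base64-decoded text, or the input unchanged if decoding fails.

    Hypothetical helper for illustration; not feedparser's implementation.
    """
    try:
        return base64.decodebytes(text.encode("utf8")).decode("utf8")
    except (binascii.Error, UnicodeDecodeError):
        # Malformed base64, or decoded bytes that are not valid UTF-8:
        # fall back to the undecoded text instead of raising.
        return text


# Sample inputs (made up): one decodes cleanly, one does not.
good = base64.b64encode("héllo".encode("utf8")).decode("ascii")
bad = base64.b64encode(b"a\xe7bc").decode("ascii")

print(decode_base64_text(good))
print(decode_base64_text(bad) == bad)
```

Falling back to the raw text is only one option; a parser could equally record the exception (for example as a bozo exception) and continue.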