kurtmckee/feedparser

test_001741 (tests/wellformed/sanitize/xml_declaration_unexpected_character.xml) fails with more recent Pythons (>= 3.10)

mcepl opened this issue · 2 comments

mcepl commented

While packaging feedparser for openSUSE/Tumbleweed, I ran the test suite with the patched sgmllib (regarding #279: isn't it time to vendor sgmllib into this package and maintain it yourself?) and got these failures:

[   11s] ======================================================================
[   11s] FAIL: test_001741 (__main__.TestStrictParser)
[   11s] ./tests/wellformed/sanitize/xml_declaration_unexpected_character.xml: xml declaration unexpected character
[   11s] ----------------------------------------------------------------------
[   11s] Traceback (most recent call last):
[   11s]   File "/home/abuild/rpmbuild/BUILD/feedparser-6.0.8/tests/runtests.py", line 912, in fn
[   11s]     self.fail_unless_eval(xmlfile, eval_string)
[   11s]   File "/home/abuild/rpmbuild/BUILD/feedparser-6.0.8/tests/runtests.py", line 173, in fail_unless_eval
[   11s]     env = feedparser.parse(xmlfile)
[   11s]   File "/home/abuild/rpmbuild/BUILDROOT/python-feedparser-6.0.8-0.x86_64/usr/lib/python3.10/site-packages/feedparser/api.py", line 263, in parse
[   11s]     saxparser.parse(source)
[   11s]   File "/usr/lib64/python3.10/xml/sax/expatreader.py", line 111, in parse
[   11s]     xmlreader.IncrementalParser.parse(self, source)
[   11s]   File "/usr/lib64/python3.10/xml/sax/xmlreader.py", line 125, in parse
[   11s]     self.feed(buffer)
[   11s]   File "/usr/lib64/python3.10/xml/sax/expatreader.py", line 217, in feed
[   11s]     self._parser.Parse(data, isFinal)
[   11s]   File "/home/abuild/rpmbuild/BUILD/Python-3.10.2/Modules/pyexpat.c", line 468, in EndElement
[   11s]   File "/usr/lib64/python3.10/xml/sax/expatreader.py", line 381, in end_element_ns
[   11s]     self._cont_handler.endElementNS(pair, None)
[   11s]   File "/home/abuild/rpmbuild/BUILDROOT/python-feedparser-6.0.8-0.x86_64/usr/lib/python3.10/site-packages/feedparser/parsers/strict.py", line 124, in endElementNS
[   11s]     self.unknown_endtag(localname)
[   11s]   File "/home/abuild/rpmbuild/BUILDROOT/python-feedparser-6.0.8-0.x86_64/usr/lib/python3.10/site-packages/feedparser/mixin.py", line 320, in unknown_endtag
[   11s]     method()
[   11s]   File "/home/abuild/rpmbuild/BUILDROOT/python-feedparser-6.0.8-0.x86_64/usr/lib/python3.10/site-packages/feedparser/namespaces/_base.py", line 384, in _end_title
[   11s]     value = self.pop_content('title')
[   11s]   File "/home/abuild/rpmbuild/BUILDROOT/python-feedparser-6.0.8-0.x86_64/usr/lib/python3.10/site-packages/feedparser/mixin.py", line 628, in pop_content
[   11s]     value = self.pop(tag)
[   11s]   File "/home/abuild/rpmbuild/BUILDROOT/python-feedparser-6.0.8-0.x86_64/usr/lib/python3.10/site-packages/feedparser/mixin.py", line 542, in pop
[   11s]     output = resolve_relative_uris(output, self.baseuri, self.encoding, self.contentparams.get('type', 'text/html'))
[   11s]   File "/home/abuild/rpmbuild/BUILDROOT/python-feedparser-6.0.8-0.x86_64/usr/lib/python3.10/site-packages/feedparser/urls.py", line 154, in resolve_relative_uris
[   11s]     p.feed(html_source)
[   11s]   File "/home/abuild/rpmbuild/BUILDROOT/python-feedparser-6.0.8-0.x86_64/usr/lib/python3.10/site-packages/feedparser/html.py", line 156, in feed
[   11s]     super(_BaseHTMLProcessor, self).feed(data)
[   11s]   File "/usr/lib/python3.10/site-packages/sgmllib.py", line 98, in feed
[   11s]     self.goahead(0)
[   11s]   File "/usr/lib/python3.10/site-packages/sgmllib.py", line 168, in goahead
[   11s]     k = self.parse_declaration(i)
[   11s]   File "/home/abuild/rpmbuild/BUILDROOT/python-feedparser-6.0.8-0.x86_64/usr/lib/python3.10/site-packages/feedparser/html.py", line 351, in parse_declaration
[   11s]     return sgmllib.SGMLParser.parse_declaration(self, i)
[   11s]   File "/usr/lib64/python3.10/_markupbase.py", line 134, in parse_declaration
[   11s]     raise AssertionError("unexpected %r char in declaration" % rawdata[j])
[   11s] AssertionError: unexpected '~' char in declaration
[   11s]
[   11s] ======================================================================
[   11s] FAIL: test_001741 (__main__.TestLooseParser)
[   11s] ./tests/wellformed/sanitize/xml_declaration_unexpected_character.xml: xml declaration unexpected character
[   11s] ----------------------------------------------------------------------
[   11s] Traceback (most recent call last):
[   11s]   File "/usr/lib/python3.10/site-packages/sgmllib.py", line 352, in finish_endtag
[   11s]     method = getattr(self, 'end_' + tag)
[   11s] AttributeError: 'LooseFeedParser' object has no attribute 'end_title'
[   11s]
[   11s] During handling of the above exception, another exception occurred:
[   11s]
[   11s] Traceback (most recent call last):
[   11s]   File "/home/abuild/rpmbuild/BUILD/feedparser-6.0.8/tests/runtests.py", line 912, in fn
[   11s]     self.fail_unless_eval(xmlfile, eval_string)
[   11s]   File "/home/abuild/rpmbuild/BUILD/feedparser-6.0.8/tests/runtests.py", line 173, in fail_unless_eval
[   11s]     env = feedparser.parse(xmlfile)
[   11s]   File "/home/abuild/rpmbuild/BUILDROOT/python-feedparser-6.0.8-0.x86_64/usr/lib/python3.10/site-packages/feedparser/api.py", line 272, in parse
[   11s]     feedparser.feed(data.decode('utf-8', 'replace'))
[   11s]   File "/home/abuild/rpmbuild/BUILDROOT/python-feedparser-6.0.8-0.x86_64/usr/lib/python3.10/site-packages/feedparser/html.py", line 156, in feed
[   11s]     super(_BaseHTMLProcessor, self).feed(data)
[   11s]   File "/usr/lib/python3.10/site-packages/sgmllib.py", line 98, in feed
[   11s]     self.goahead(0)
[   11s]   File "/usr/lib/python3.10/site-packages/sgmllib.py", line 137, in goahead
[   11s]     k = self.parse_endtag(i)
[   11s]   File "/usr/lib/python3.10/site-packages/sgmllib.py", line 314, in parse_endtag
[   11s]     self.finish_endtag(tag)
[   11s]   File "/usr/lib/python3.10/site-packages/sgmllib.py", line 354, in finish_endtag
[   11s]     self.unknown_endtag(tag)
[   11s]   File "/home/abuild/rpmbuild/BUILDROOT/python-feedparser-6.0.8-0.x86_64/usr/lib/python3.10/site-packages/feedparser/mixin.py", line 320, in unknown_endtag
[   11s]     method()
[   11s]   File "/home/abuild/rpmbuild/BUILDROOT/python-feedparser-6.0.8-0.x86_64/usr/lib/python3.10/site-packages/feedparser/namespaces/_base.py", line 384, in _end_title
[   11s]     value = self.pop_content('title')
[   11s]   File "/home/abuild/rpmbuild/BUILDROOT/python-feedparser-6.0.8-0.x86_64/usr/lib/python3.10/site-packages/feedparser/mixin.py", line 628, in pop_content
[   11s]     value = self.pop(tag)
[   11s]   File "/home/abuild/rpmbuild/BUILDROOT/python-feedparser-6.0.8-0.x86_64/usr/lib/python3.10/site-packages/feedparser/mixin.py", line 542, in pop
[   11s]     output = resolve_relative_uris(output, self.baseuri, self.encoding, self.contentparams.get('type', 'text/html'))
[   11s]   File "/home/abuild/rpmbuild/BUILDROOT/python-feedparser-6.0.8-0.x86_64/usr/lib/python3.10/site-packages/feedparser/urls.py", line 154, in resolve_relative_uris
[   11s]     p.feed(html_source)
[   11s]   File "/home/abuild/rpmbuild/BUILDROOT/python-feedparser-6.0.8-0.x86_64/usr/lib/python3.10/site-packages/feedparser/html.py", line 156, in feed
[   11s]     super(_BaseHTMLProcessor, self).feed(data)
[   11s]   File "/usr/lib/python3.10/site-packages/sgmllib.py", line 98, in feed
[   11s]     self.goahead(0)
[   11s]   File "/usr/lib/python3.10/site-packages/sgmllib.py", line 168, in goahead
[   11s]     k = self.parse_declaration(i)
[   11s]   File "/home/abuild/rpmbuild/BUILDROOT/python-feedparser-6.0.8-0.x86_64/usr/lib/python3.10/site-packages/feedparser/html.py", line 351, in parse_declaration
[   11s]     return sgmllib.SGMLParser.parse_declaration(self, i)
[   11s]   File "/usr/lib64/python3.10/_markupbase.py", line 134, in parse_declaration
[   11s]     raise AssertionError("unexpected %r char in declaration" % rawdata[j])
[   11s] AssertionError: unexpected '~' char in declaration
[   11s]

The complete build log lists all package versions used and the steps taken to reproduce the issue.
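Both tracebacks end in the same place: `parse_declaration` in `feedparser/html.py` delegates to the base parser, and on Python 3.10 `_markupbase.py` raises a plain `AssertionError` for the unexpected `~`, which then propagates all the way out of `feedparser.parse`. Below is a minimal, self-contained sketch of how a wrapper could tolerate that exception. It is not taken from c55bd8a. `BaseParser`, `TolerantParser`, `DeclarationError` and `demo` are hypothetical stand-ins invented for illustration, and the sketch assumes that escaping the `<` and resuming one character later is the desired recovery.

```python
class DeclarationError(Exception):
    """Hypothetical stand-in for a parser-specific error type."""


class BaseParser:
    """Toy base parser (an assumption, not the real _markupbase code).

    It mimics the behaviour seen in the traceback: an unexpected
    character inside a declaration raises AssertionError.
    """

    def __init__(self):
        self.rawdata = ""
        self.output = []

    def parse_declaration(self, i):
        rawdata = self.rawdata
        j = i + 2  # skip the leading "<!"
        while j < len(rawdata):
            c = rawdata[j]
            if c == ">":
                return j + 1  # index just past the declaration
            if c.isalnum() or c.isspace():
                j += 1
                continue
            raise AssertionError("unexpected %r char in declaration" % c)
        return -1  # declaration is incomplete

    def handle_data(self, text):
        self.output.append(text)


class TolerantParser(BaseParser):
    """Wrapper that recovers from both error styles."""

    def parse_declaration(self, i):
        try:
            return super().parse_declaration(i)
        except (AssertionError, DeclarationError):
            # Treat the "<" as literal text and resume after it.
            self.handle_data("&lt;")
            return i + 1


def demo(text):
    """Parse a declaration at index 0; return (next index, emitted data)."""
    parser = TolerantParser()
    parser.rawdata = text
    next_index = parser.parse_declaration(0)
    return next_index, parser.output
```

With this sketch, `demo("<!~ bad>")` recovers instead of raising, while a well-formed declaration such as `demo("<!DOCTYPE html>")` is consumed as before.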

This is fixed by c55bd8a.

mcepl commented

Thank you, yes, c55bd8a fixes this.