kurtmckee/feedparser

Drop "sgmllib3k"?

buhtz opened this issue · 4 comments

buhtz commented

sgmllib3k = "^1.0.0"

Hi Kurt,
while researching your repo I got the impression that you dropped "sgmllib3k" support long ago. So can this dependency in pyproject.toml be removed, too?

It's still needed for parsing corrupt XML:

import sgmllib # type: ignore[import]
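For anyone checking their own environment, here is a minimal sketch of how to test whether the `sgmllib` module (provided by the sgmllib3k package) is importable at all. The helper name `has_module` is hypothetical and is not part of feedparser.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module with this name can be found."""
    # find_spec returns None when no importable module matches the name.
    return importlib.util.find_spec(name) is not None


# True only if sgmllib3k (or another provider of sgmllib) is installed.
sgmllib_available = has_module("sgmllib")
print("sgmllib importable:", sgmllib_available)
```

This only tells you whether the module can be located; it says nothing about whether the package's own test suite passes.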

I've been working on migrating to lxml and Python's builtin html.parser in a very similar XML parsing project, kurtmckee/listparser. I'm hoping that my experience with that migration can translate to an update to feedparser's XML parsing, too.
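As a rough illustration of why Python's builtin html.parser is a candidate for this kind of work, the sketch below feeds it markup with unclosed elements and collects what it sees. This is not feedparser's or listparser's actual code; the `TagCollector` class, the `collect` function, and the sample markup are assumptions made up for the example.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records start tags and text from loosely formed markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag, whether or not it is ever closed.
        self.tags.append(tag)

    def handle_data(self, data):
        # Keep only non-whitespace text runs.
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


def collect(markup):
    """Parse markup leniently and return (start tags, text runs)."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags, parser.text


# The <item> and <guid> elements are never closed; the parser does not raise.
tags, text = collect("<rss><channel><item><guid>abc<item><guid>def</rss>")
print(tags)
print(text)
```

The parser reports tags in document order and leaves it to the caller to decide how to repair the nesting, which is roughly the tolerance a corrupt feed requires.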

buhtz commented

Good that I asked first.

I used GitHub's "search in this repository" feature, but this piece of code wasn't shown to me.

May I kindly ask to keep this bug report open until the problem is fixed? I am working on the feedparser package in the GNU Guix distribution, where building sgmllib3k currently fails; my impression is that it is incompatible with Python 3.10, as this happens during the check phase:

FAIL: test_declaration_junk_chars (test_sgmllib.SGMLParserTestCase)

Traceback (most recent call last):
  File "/tmp/guix-build-python-sgmllib3k-1.0.0-1.7999646.drv-0/source/test_sgmllib.py", line 310, in test_declaration_junk_chars
    self.check_parse_error("")
  File "/tmp/guix-build-python-sgmllib3k-1.0.0-1.7999646.drv-0/source/test_sgmllib.py", line 127, in check_parse_error
    parser.feed(source)
  File "/tmp/guix-build-python-sgmllib3k-1.0.0-1.7999646.drv-0/source/sgmllib.py", line 98, in feed
    self.goahead(0)
  File "/tmp/guix-build-python-sgmllib3k-1.0.0-1.7999646.drv-0/source/sgmllib.py", line 168, in goahead
    k = self.parse_declaration(i)
  File "/gnu/store/i0d555a5fd7isi606aqqmbp5zgy9jh6p-python-3.10.7/lib/python3.10/_markupbase.py", line 134, in parse_declaration
    raise AssertionError("unexpected %r char in declaration" % rawdata[j])
AssertionError: unexpected '$' char in declaration

So I have doubts that sgmllib3k (assuming we simply disabled its tests) would still parse corrupt XML...

Andreas

It installs and works with feedparser on Python 3.10.