kurtmckee/feedparser

sgmllib3k, which is one of the dependencies, is deprecated

gridhead opened this issue · 1 comment

While installing feedparser in a virtual environment, I noticed that one of its dependencies, sgmllib3k, had a version number that struck me as strange (v1.0.0).


When I searched for that dependency, I came across this page.


The author says they made a "quick and dirty" port to Python 3 and states that they will no longer maintain it. With that in mind, I have a couple of questions/concerns:

  1. Should it concern this project and its users that sgmllib3k has not had a new release in 11 years and that its homepage, http://hg.hardcoded.net/sgmllib, returns a 404?
  2. Have there been any attempts to move to a similar, actively maintained alternative library? If so, is there a roadmap for the port?

  1. sgmllib was part of the Python 2 standard library; during the transition to Python 3 it was extracted as the sgmllib3k module and published on PyPI. It's not an ideal dependency anymore, but the lack of updates isn't an immediate concern to me.
  2. I've been investigating a switch to another forgiving XML parser, such as lxml, but that implies an enormous amount of work to fix the unit tests, because lxml recovers from broken XML in different ways than sgmllib does.
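To illustrate why swapping parsers ripples into the tests, here is a minimal sketch that uses only the standard library as a stand-in: `xml.etree.ElementTree` plays the role of a strict XML parser and `html.parser.HTMLParser` plays the role of a forgiving, event-based parser in the spirit of sgmllib. Neither one is sgmllib3k or lxml, and the `BROKEN` fragment and the helper names are hypothetical, chosen only for illustration.

```python
# Sketch only: stdlib parsers stand in for sgmllib (lenient, event-based)
# and for a strict XML parser. This is NOT sgmllib3k or lxml.
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical malformed feed fragment: <b> is never closed.
BROKEN = "<entry><summary>Hello<b>world</summary></entry>"


def strict_parse(text):
    """Return the root tag, or None if the strict XML parser rejects the input."""
    try:
        return ET.fromstring(text).tag
    except ET.ParseError:
        return None


class EventRecorder(HTMLParser):
    """Record start/end/data events without enforcing well-formedness."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def lenient_events(text):
    """Feed text to the lenient parser and return the recorded event stream."""
    recorder = EventRecorder()
    recorder.feed(text)
    recorder.close()
    return recorder.events


if __name__ == "__main__":
    print("strict:", strict_parse(BROKEN))
    print("lenient:", lenient_events(BROKEN))
```

The strict parser rejects the fragment outright, while the lenient one reports the stray `<b>` and carries on. A different forgiving parser might instead repair the tree (for example, by implicitly closing `<b>` before `</summary>`) and so emit a different event sequence from the same input; that kind of divergence is what would force the unit-test expectations to be revisited.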

Moving off of sgmllib is a goal, but I no longer have months of free time like I did back in 2010, when I pored through thousands of unit test files by hand. So there's no roadmap for this migration, only the goal itself. If you have a specific suggestion for an alternative package, please let me know!