html5lib/html5lib-python

consider making html5lib.tokenizer public

mgrandi opened this issue · 3 comments

Hello,

In version https://github.com/html5lib/html5lib-python/releases/tag/0.999999999, html5lib.tokenizer was made private.

The wpull project (https://github.com/ArchiveTeam/wpull) uses this library. If we ever migrate to the 1.x versions, this change would hurt the application: instead of just tokenizing a web page (see https://github.com/ArchiveTeam/wpull/blob/a4ff4a93f613ce18ad3c515aa3d4f5848a88b98c/wpull/document/htmlparse/html5lib_.py), we would have to use full tree parsing, which is slower and uses more RAM.

Is there any reason this was made private when the 1.x branch was released?

I don't understand what you mean by private. How can something be made private in Python?

This project seems abandoned... but yes, obviously you can't actually make something private in Python. By private I mean that the module changed location, and that such modules are usually denoted with a leading underscore, which means there is no guarantee that it will stay in the same place, keep the same name, etc. in future releases.

Making it public = making it part of the public API, so that even if the underlying implementation changes, the API stays the same.

Yeah, it's really annoying. If you are a maintainer of a project, then at least answer some issues every month, even if you don't want to write any new code.

I now understand what you mean. A simple solution would be to pin the version and then just use the undocumented (or, to use your words, private) part of the code.
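For what it's worth, here is a minimal sketch of that workaround. It assumes the tokenizer lives at `html5lib._tokenizer` in the 1.x releases and at `html5lib.tokenizer` before that, and that both expose a class named `HTMLTokenizer`; check this against whichever version you pin. The helper names `import_first` and `load_tokenizer_class` are made up for illustration and are not part of html5lib.

```python
import importlib

# Assumed module locations: the underscore-prefixed (private) path used by
# 1.x first, then the older public path as a fallback.
TOKENIZER_MODULES = ("html5lib._tokenizer", "html5lib.tokenizer")


def import_first(module_names):
    """Return the first module in module_names that can be imported."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            # Remember why this candidate failed and try the next one.
            errors.append(f"{name}: {exc}")
    raise ImportError(
        "none of the candidates could be imported: " + "; ".join(errors)
    )


def load_tokenizer_class():
    """Look up HTMLTokenizer in whichever module layout is installed.

    Hypothetical helper: relies on a private module, so it can break on
    any upgrade. Pin the html5lib version alongside it.
    """
    return import_first(TOKENIZER_MODULES).HTMLTokenizer
```

Calling `load_tokenizer_class()` once at import time keeps the dependency on the private path in a single place, so only this helper needs to change if the module moves again. Since nothing is guaranteed about an underscore-prefixed module, this only makes sense together with an exact version pin in your requirements.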