consider making html5lib.tokenizer public
mgrandi opened this issue · 3 comments
Hello,
In version 0.999999999 (https://github.com/html5lib/html5lib-python/releases/tag/0.999999999), html5lib.tokenizer was made private.
The wpull project (https://github.com/ArchiveTeam/wpull) uses this library. If we ever migrated to the 1.x versions, this change would hurt the application: instead of just tokenizing a webpage (see https://github.com/ArchiveTeam/wpull/blob/a4ff4a93f613ce18ad3c515aa3d4f5848a88b98c/wpull/document/htmlparse/html5lib_.py), we would have to use the full tree parsing, which is slower and uses more RAM.
Is there any reason this was made private when the 1.x branch was released?
I don't understand what you mean by private. How can something be made private in Python?
This project seems abandoned... but yes, obviously you can't actually make something private in Python. By private I mean that the module changed location, and private modules are usually denoted with a leading underscore, which means there is no guarantee that it will stay in the same place, keep the same name, etc. in future releases.
Making it public = making it part of the public API, so that even if the underlying implementation changes, the API stays the same.
Yeah, it's really annoying. If you are a maintainer of a project, then at least answer some issues every month, even if you don't want to write any new code.
I now understand what you mean. A simple solution would be to pin the version and then just use the undocumented (or, to use your words, private) part of the code.
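A rough sketch of that workaround, in case it helps. It assumes the tokenizer now lives at `html5lib._tokenizer` (the underscore-prefixed name) and falls back to the old `html5lib.tokenizer` path for pre-0.999999999 installs. `import_first` and `load_tokenizer_module` are hypothetical helper names, not part of html5lib.

```python
import importlib

# Candidate module paths, newest layout first. The underscore-prefixed name is
# an assumption about where the module moved; adjust it to match the pinned release.
TOKENIZER_MODULES = ("html5lib._tokenizer", "html5lib.tokenizer")


def import_first(candidates):
    """Return the first module in `candidates` that can be imported."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Module (or its parent package) is missing under this name; try the next.
            continue
    raise ImportError(
        "none of these modules could be imported: " + ", ".join(candidates)
    )


def load_tokenizer_module():
    """Locate html5lib's tokenizer module under its private or old public name."""
    return import_first(TOKENIZER_MODULES)
```

Since the private module can move or change between releases, this only makes sense together with an exact pin (for example `html5lib==1.1` in requirements.txt) for whichever release you actually tested against.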