Improve Yahoo title parsing
soxoj opened this issue · 1 comments
soxoj commented
I noticed that the title of a Yahoo result is extracted incorrectly:
URL: https://gist.github.com/soxoj/9d65c2f4d3bec5dd25949197ea73cf3a
Title: gist.github.com › soxoj › 9d65c2f4d3bec5dd25949197eamaigret.ipynb · GitHub
Title should be maigret.ipynb · GitHub
I made some fixes for my other project here
tasos-py commented
Nice catch! I fixed it with bs4's .decompose(), but I'll keep this open in case there is more work to be done.
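For anyone following along, here is a minimal sketch of how a .decompose()-based fix could work. It is not the actual patch: the markup in SAMPLE_HTML, the class names, and the helper names naive_title and extract_title are all assumptions made up for illustration, and the real Yahoo result HTML may be structured differently. The idea is that the breadcrumb sits in a nested element inside the title link, so its text gets glued onto the title unless that element is removed first.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real Yahoo result HTML may differ.
SAMPLE_HTML = """
<div class="result">
  <h3 class="title">
    <a href="https://gist.github.com/soxoj/9d65c2f4d3bec5dd25949197ea73cf3a">
      <span class="breadcrumb">gist.github.com › soxoj › 9d65c2f4d3bec5dd25949197ea</span>maigret.ipynb · GitHub
    </a>
  </h3>
</div>
"""


def _title_link(html):
    """Return the <a> tag inside the (assumed) title heading, or None."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h3", class_="title")
    return heading.find("a") if heading else None


def naive_title(html):
    """Take all text inside the link, breadcrumb included (the buggy result)."""
    link = _title_link(html)
    return link.get_text(strip=True) if link else ""


def extract_title(html):
    """Drop nested breadcrumb elements before reading the link text."""
    link = _title_link(html)
    if link is None:
        return ""
    # decompose() removes the tag and its contents from the tree in place,
    # so the breadcrumb text no longer shows up in get_text().
    for crumb in link.find_all("span"):
        crumb.decompose()
    return link.get_text(strip=True)


if __name__ == "__main__":
    print(naive_title(SAMPLE_HTML))
    print(extract_title(SAMPLE_HTML))
```

On this sample, naive_title reproduces the concatenated title from the report, while extract_title returns only the text that remains after the span is removed. If the title itself can contain nested tags such as bold search-term highlights, removing every span would be too aggressive, and a narrower selector for the breadcrumb element would be the safer choice.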