Improve Yahoo title parsing

Question

soxoj opened this issue 3 years ago · 1 comment

I noticed that the title of a Yahoo result is extracted incorrectly:

URL: https://gist.github.com/soxoj/9d65c2f4d3bec5dd25949197ea73cf3a
Title: gist.github.com › soxoj › 9d65c2f4d3bec5dd25949197eamaigret.ipynb · GitHub

The title should be: maigret.ipynb · GitHub

I made some fixes for my other project here

Answer 1 · 2021-12-12T07:02:17.000Z

Nice catch! I fixed it with bs4's .decompose(), but I'll keep this open in case there is more work to be done.
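
For anyone hitting the same problem, here is a minimal sketch of how .decompose() can help. It is not the actual patch: the sample markup, the assumption that the breadcrumb sits in a span nested inside the title link, and the extract_title helper are all hypothetical, and the real Yahoo result HTML may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not Yahoo's real HTML): the breadcrumb
# is a <span> nested inside the same <a> that holds the title text.
SAMPLE_RESULT = """
<h3 class="title">
  <a href="https://gist.github.com/soxoj/9d65c2f4d3bec5dd25949197ea73cf3a">
    <span>gist.github.com › soxoj › 9d65c2f4d3bec5dd25949197ea</span>maigret.ipynb · GitHub
  </a>
</h3>
"""


def extract_title(result_html: str) -> str:
    """Return the link text with any nested <span> elements removed."""
    soup = BeautifulSoup(result_html, "html.parser")
    link = soup.find("a")
    if link is None:
        return ""
    # decompose() removes the tag and its contents from the tree, so the
    # breadcrumb text no longer leaks into get_text().
    for span in link.find_all("span"):
        span.decompose()
    return link.get_text(strip=True)


if __name__ == "__main__":
    print(extract_title(SAMPLE_RESULT))
```

Without the decompose() loop, get_text() on this sample would glue the breadcrumb and the title together, which is the same kind of output reported above.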