Issue with bs4

Question

Closed this issue 10 years ago · 4 comments

I'm using the scraper-in-browser to write a scraper for a plain, 100-row table for a workshop, and BeautifulSoup() from bs4 isn't parsing the entire page. The old version of BeautifulSoup parses the same page properly in the scraper-in-browser.

Examples:

https://gist.github.com/danhillreports/6152491

Doesn't parse the whole page:

 from bs4 import BeautifulSoup

Parses the whole page:

 from BeautifulSoup import BeautifulSoup

Answer 1 · 2013-08-06T19:10:57.000Z

Is it this bug?

http://stackoverflow.com/questions/11650700/beautifulsoup-does-not-work-for-some-web-sites/11651200#11651200

If so, add this to the line that makes the soup:

 soup = BeautifulSoup(html.content, "html.parser")
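To check whether the parser choice is what matters, a rough sketch like the one below counts the table rows each parser finds. The inline `page_html` table and the `count_rows` helper are made up for illustration (they are not from the gist), and this small sample will not necessarily reproduce the truncation; swap in the real page content to compare.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical stand-in for the workshop page: a plain 100-row table.
rows = "".join(f"<tr><td>{i}</td><td>row {i}</td></tr>" for i in range(100))
page_html = f"<html><body><table>{rows}</table></body></html>"


def count_rows(markup, parser):
    """Parse the markup with the named parser and count the <tr> elements."""
    soup = BeautifulSoup(markup, parser)
    return len(soup.find_all("tr"))


# Compare the stdlib parser with lxml, if lxml happens to be installed.
counts = {}
for parser in ("html.parser", "lxml"):
    try:
        counts[parser] = count_rows(page_html, parser)
    except FeatureNotFound:
        counts[parser] = None  # this parser is not available here

for parser, found in counts.items():
    print(parser, found)
```

If the two counts differ on the real page, that points at the parser rather than at the scraper code.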

And also if so, it has affected a couple of people, so I need to look at which versions of Python/bs4/lxml we use... Help finding the matching bug report in either lxml or bs4 would be really useful!
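For anyone reporting which versions they are on, something like this sketch prints them. The `collect_versions` helper is a made-up name for illustration, and the sketch assumes bs4 and lxml may or may not be installed in the environment.

```python
import platform


def collect_versions():
    """Return the Python, bs4 and lxml versions as a dict of strings."""
    versions = {"python": platform.python_version()}
    try:
        import bs4
        versions["bs4"] = bs4.__version__
    except ImportError:
        versions["bs4"] = "not installed"
    try:
        from lxml import etree
        # LXML_VERSION is a tuple of integers, e.g. (4, 9, 3, 0)
        versions["lxml"] = ".".join(str(n) for n in etree.LXML_VERSION)
    except ImportError:
        versions["lxml"] = "not installed"
    return versions


versions = collect_versions()
for name, version in versions.items():
    print(f"{name}: {version}")
```

Pasting that output into the issue makes it easier to match against any upstream report.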

Answer 2 · 2013-08-06T20:48:01.000Z

Looks like that's it! Thanks, I didn't run into that article before opening the issue.

Answer 3 · 2013-08-06T21:48:51.000Z

Leaving this open as it's affected two people now. If anyone can find the upstream bugs that'd be great!

Answer 4 · 2014-05-15T15:13:52.000Z

Don't think this is an issue any more.