rust-scraper/scraper

Html and its children do not impl Send

supercoolspy opened this issue · 5 comments

Is there a reason not to use the Send versions?

Seconded. I'm running into issues because of this as well. I'm trying to do nested scraping (scrape one site, then scrape the sites linked from it), and because Html does not implement Send, I cannot make the second set of requests asynchronously.
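To see why a non-Send type gets in the way, here is a minimal, stdlib-only sketch. `LocalPage` and `SharedPage` are hypothetical stand-ins, not scraper's types: `LocalPage` holds an `Rc<str>` (a non-atomic reference count, which is assumed here to mirror what makes Html non-Send by default), while `SharedPage` holds an `Arc<str>`. The `assert_send` helper is also hypothetical; it only type-checks for types that are Send, and moving a value into `thread::spawn` imposes the same requirement.

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in for a parsed document with a non-atomic reference
// count. `Rc<str>` is not Send, so neither is `LocalPage`.
#[allow(dead_code)]
struct LocalPage {
    title: Rc<str>,
}

// Hypothetical thread-safe counterpart: `Arc<str>` uses an atomic reference
// count and is Send (and Sync), so `SharedPage` is too.
struct SharedPage {
    title: Arc<str>,
}

// Compile-time check: this only type-checks when `T: Send`.
fn assert_send<T: Send>() {}

// Moves the page into a new thread, which requires `SharedPage: Send`.
fn title_len_on_thread(page: SharedPage) -> usize {
    thread::spawn(move || page.title.len()).join().unwrap()
}

fn main() {
    assert_send::<SharedPage>();
    // Uncommenting the next line fails to compile (error E0277),
    // because `Rc<str>` cannot be sent between threads safely:
    // assert_send::<LocalPage>();

    let page = SharedPage { title: Arc::from("example") };
    println!("{}", title_len_on_thread(page)); // prints 7
}
```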

Please try enabling the atomic Cargo feature, e.g.:

scraper = { version = "0.19", features = ["atomic"] }

Yeah, that would have solved it for me (I just scoped the variables instead). Do you know if there is a performance implication with it?

For anybody else seeing this: enabling the atomic feature didn't fix it for me, but I managed to get it working by making sure the async part doesn't hold any Html objects. In my case, I had to move all HTTP requests into a different scope from the scraping logic itself. My initial version did both in the same for-loop, which didn't work. @adamreichold @supercoolspy thanks for your help!
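The workaround from the previous comment (keep the scraping in its own scope and let only plain data reach the concurrent part) can be sketched without scraper or an async runtime. Everything here is hypothetical: `extract_links` and `fetch_all` are illustrative names, an `Rc<str>` plays the role of `Html`, `std::thread::spawn` stands in for spawning async tasks (for example `tokio::spawn`, which likewise requires its future to be Send), and the "request" just returns the URL length.

```rust
use std::rc::Rc;
use std::thread;

// Hypothetical scraping step. The non-Send `Rc<str>` plays the role of `Html`:
// it is created and dropped inside this function, and only owned `String`s
// (which are Send) leave it.
fn extract_links(body: &str) -> Vec<String> {
    let document: Rc<str> = Rc::from(body);
    let links: Vec<String> = document
        .split_whitespace()
        .filter(|word| word.starts_with("https://"))
        .map(str::to_owned)
        .collect();
    // `document` is dropped when this function returns.
    links
}

// Hypothetical second round of requests: each link is handled on its own
// thread. `thread::spawn` requires the closure (and the `String` it captures)
// to be Send, just as `tokio::spawn` does for futures.
fn fetch_all(links: Vec<String>) -> Vec<usize> {
    let handles: Vec<thread::JoinHandle<usize>> = links
        .into_iter()
        // Placeholder for an HTTP request: here it just returns the URL length.
        .map(|link| thread::spawn(move || link.len()))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    // Scraping happens first, in its own scope; only plain data survives it.
    let links = extract_links("see https://a.example and https://b.example/page");
    // The concurrent part never sees the non-Send document.
    let lengths = fetch_all(links);
    println!("{:?}", lengths); // prints [17, 22]
}
```

The design choice is simply ordering: once the non-Send value has been dropped, nothing that crosses into a thread or task needs to be Send except the extracted `String`s.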

Do you know if there is a performance implication with it?

Yes. As the name suggests, it makes the internal reference counting used by the tendril data structure use atomic operations, so that these operations become thread-safe. The performance impact will therefore be similar to replacing Rc with Arc.
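The Rc-versus-Arc trade-off can be illustrated with a toy reference count that is generic over its atomicity, loosely inspired by the way tendril selects atomicity with a type parameter (tendril's own types differ in detail). `RefCount`, `NonAtomicCount`, `AtomicCount`, `clone_then_drop` and `bump_from_threads` are all hypothetical names for this sketch. The non-atomic counter uses plain loads and stores on a `Cell`, which is cheap but not Sync; the atomic counter uses `AtomicUsize` read-modify-write operations, which is what makes it shareable across threads and also what costs extra.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Hypothetical interface for a reference count.
trait RefCount {
    fn increment(&self);
    /// Decrements the count and returns the new value.
    fn decrement(&self) -> usize;
    fn get(&self) -> usize;
}

// Rc-style count: plain loads and stores. `Cell` is not Sync, so a shared
// reference to this counter cannot be handed to another thread.
struct NonAtomicCount(Cell<usize>);

impl RefCount for NonAtomicCount {
    fn increment(&self) {
        self.0.set(self.0.get() + 1);
    }
    fn decrement(&self) -> usize {
        let n = self.0.get() - 1;
        self.0.set(n);
        n
    }
    fn get(&self) -> usize {
        self.0.get()
    }
}

// Arc-style count: atomic read-modify-write operations, safe to share
// across threads but more expensive than plain loads and stores.
struct AtomicCount(AtomicUsize);

impl RefCount for AtomicCount {
    fn increment(&self) {
        // Relaxed is enough for an increment (Arc::clone does the same).
        self.0.fetch_add(1, Ordering::Relaxed);
    }
    fn decrement(&self) -> usize {
        // AcqRel is a conservative ordering choice for the decrement.
        self.0.fetch_sub(1, Ordering::AcqRel) - 1
    }
    fn get(&self) -> usize {
        self.0.load(Ordering::Acquire)
    }
}

// Generic code written once for either counter, loosely analogous to a
// type parameter that selects the atomicity.
fn clone_then_drop<C: RefCount>(count: &C, clones: usize) -> usize {
    for _ in 0..clones {
        count.increment();
    }
    let mut remaining = count.get();
    for _ in 0..clones {
        remaining = count.decrement();
    }
    remaining
}

// Only the atomic counter can be shared by several threads; a version of
// this function taking `&NonAtomicCount` would not compile (Cell is not Sync).
fn bump_from_threads(count: &AtomicCount, threads: usize, per_thread: usize) -> usize {
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(move || {
                for _ in 0..per_thread {
                    count.increment();
                }
            });
        }
    });
    count.get()
}

fn main() {
    let local = NonAtomicCount(Cell::new(1));
    let shared = AtomicCount(AtomicUsize::new(1));
    println!("{}", clone_then_drop(&local, 3)); // prints 1
    println!("{}", clone_then_drop(&shared, 3)); // prints 1
    println!("{}", bump_from_threads(&AtomicCount(AtomicUsize::new(0)), 4, 1000)); // prints 4000
}
```

Both counters compute the same result in single-threaded code; only the atomic one may be shared between threads, and it pays for that with atomic instructions on every increment and decrement.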