Support for `:has()` selector
124C41p opened this issue · 8 comments
Hi, do you plan to support the `:has()` selector? As I understand it, this CSS pseudo-class is needed to select elements based on a parent that contains another known element.
Consider the following example:
<div>
<div id="foo">
Hi There!
</div>
</div>
<ul>
<li>first</li>
<li>second</li>
<li>third</li>
</ul>
In order to select the second list item, I would like to use the following selector:
let selector = Selector::parse("div:has(div#foo) + ul > li:nth-child(2)").unwrap();
This line, however, panics as of `scraper` version 0.18.1.
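To make the intended match concrete, here is a minimal std-only sketch of what `div:has(div#foo) + ul > li:nth-child(2)` is expected to select. It does not use `scraper` or `selectors` at all: the `Element` struct, `example_siblings`, and `second_item_after_marked_div` are hypothetical names invented for illustration, and the toy tree ignores text and whitespace nodes.

```rust
// Toy element tree for illustration only; NOT scraper's or selectors' data model.
struct Element {
    tag: &'static str,
    id: Option<&'static str>,
    text: &'static str,
    children: Vec<Element>,
}

impl Element {
    /// True if any descendant (at any depth) satisfies `pred`.
    /// This mirrors the meaning of `:has(<descendant selector>)`.
    fn has(&self, pred: &dyn Fn(&Element) -> bool) -> bool {
        self.children.iter().any(|c| pred(c) || c.has(pred))
    }
}

/// Builds the two top-level siblings from the HTML example above.
fn example_siblings() -> Vec<Element> {
    let li = |text| Element { tag: "li", id: None, text, children: vec![] };
    vec![
        Element {
            tag: "div",
            id: None,
            text: "",
            children: vec![Element {
                tag: "div",
                id: Some("foo"),
                text: "Hi There!",
                children: vec![],
            }],
        },
        Element {
            tag: "ul",
            id: None,
            text: "",
            children: vec![li("first"), li("second"), li("third")],
        },
    ]
}

/// Hand-written equivalent of `div:has(div#foo) + ul > li:nth-child(2)`:
/// find a `div` containing `div#foo`, require the element immediately after it
/// to be a `ul`, and return that list's second child if it is an `li`.
fn second_item_after_marked_div(siblings: &[Element]) -> Option<&Element> {
    let is_foo = |e: &Element| e.tag == "div" && e.id == Some("foo");
    siblings.windows(2).find_map(|pair| {
        let (anchor, next) = (&pair[0], &pair[1]);
        if anchor.tag == "div" && anchor.has(&is_foo) && next.tag == "ul" {
            next.children.get(1).filter(|li| li.tag == "li")
        } else {
            None
        }
    })
}
```

Calling `second_item_after_marked_div(&example_siblings())` yields the `li` whose text is `second`, which is the element the selector in this issue is meant to reach.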
I think support for this is still missing in our upstream `selectors` dependency, at least in the version published on crates.io.
+1. I'm trying to scrape Wikipedia, which has this sort of nesting. For example:
<h2>
<span class="mw-headline" id="Registered_ports">Registered ports</span>
<!-- ... -->
</h2>
This selector: `h2:has(#Registered_ports) ~ .wikitable.sortable` would pick the first table after this `h2`, which is a good way to locate the content in lieu of a distinctive id/class on the table itself.
From what I can see, `selectors` 0.25 (published to crates.io) does have `:has` support. See https://docs.rs/selectors/latest/selectors/parser/enum.Component.html#variant.Has. There do seem to be performance improvements in more recent unreleased commits, though.
I had taken a look into adding `:is()` support and it seems like both `:is()` and `:has()` are already supported by `selectors`. The `Parser` impl needs to enable support by implementing `parse_is_and_where` and `parse_has`.
fn parse_is_and_where(&self) -> bool {
    true
}
fn parse_has(&self) -> bool {
    true
}
@causal-agent Should it be safe to enable support for these selectors? I can make a PR with these changes unless these selectors are not enabled for a reason.
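The opt-in described above can be sketched without the real crate. The example below assumes, as the comment implies, that the upstream trait ships these methods with defaults that return `false`, so an implementor has to override them to turn the feature on. `FeatureGates`, `DefaultParser`, `OptInParser`, and `accepts_pseudo_class` are hypothetical names; the actual `selectors::parser::Parser` trait has more items and a different shape.

```rust
// Hypothetical stand-in for the upstream parser trait (assumption: the
// real defaults also return `false`, which is why support is off until overridden).
trait FeatureGates {
    fn parse_is_and_where(&self) -> bool {
        false
    }
    fn parse_has(&self) -> bool {
        false
    }
}

/// Keeps the defaults, so `:is()`, `:where()` and `:has()` stay disabled.
struct DefaultParser;
impl FeatureGates for DefaultParser {}

/// Overrides both gates, mirroring the two methods shown in the comment above.
struct OptInParser;
impl FeatureGates for OptInParser {
    fn parse_is_and_where(&self) -> bool {
        true
    }
    fn parse_has(&self) -> bool {
        true
    }
}

/// Toy check: accept or reject a functional pseudo-class name based on the gates.
fn accepts_pseudo_class<P: FeatureGates>(parser: &P, name: &str) -> Result<(), String> {
    match name {
        "has" if parser.parse_has() => Ok(()),
        "is" | "where" if parser.parse_is_and_where() => Ok(()),
        other => Err(format!("unsupported pseudo-class :{other}()")),
    }
}
```

With this shape, `accepts_pseudo_class(&DefaultParser, "has")` is rejected while the same call with `&OptInParser` succeeds, which is the behavior change the two overrides are expected to produce.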
> The Parser impl needs to enable support by implementing parse_is_and_where and parse_has.
Thank you for looking into this!
> Should it be safe to enable support for these selectors? I can make a PR with these changes unless these selectors are not enabled for a reason.
I think only tests will answer that. Please open a PR, ideally including a test case. I can then also give it a spin in a code base containing a pretty diverse set of scrapers and see if anything breaks that is not caught by the tests here.
@jameshurst when your PR is ready, tag me. I will run some tests and review it ASAP.
I opened a PR addressing this. Please have a look.