More Can't get children, where plain text is before child
jtagcat opened this issue · 3 comments
Going to sleep now. I couldn't find duplicates within a minute of searching, but there might be some.
Hello,
This program (taken from what you said was a failing example in the link you provided) prints baz as expected:
package main

import (
	"fmt"
	"github.com/PuerkitoBio/goquery"
	"log"
	"strings"
)

func main() {
	const data = `<div class="foo">Related: <span class="bar">baz</span></div>`
	doc, err := goquery.NewDocumentFromReader(strings.NewReader(data))
	if err != nil {
		log.Fatal(err)
	}
	t := doc.Find(".foo > .bar").Text()
	fmt.Println(t)
}
Please provide a complete runnable program that reproduces the issue (like I did here).
Thanks,
Martin
Ah yes, I'm scraping the same 3 page types with an iteration count over 10. Meanwhile I dropped down a level of abstraction.
Some wrapper library probably called .Children() expecting to iterate over all children.
h := `<div class="info">
<h1>
<a href="#">junk</a>
Title text foo bar
</h1>
</div>`
doc, err := goquery.NewDocumentFromReader(strings.NewReader(h))
if err != nil {
panic(err)
}
s := doc.Find(".info > h1") // 3 children:
// FirstChild: Data: "\n "
// NextSibling: Data: "a"
// NextSibling: Data: "\n\n\n Title text foo bar\n "
c := s.Children() // only 2nd is available
println(len(c.Nodes)) // 1
s = doc.Find(".info") // 1 > 3 children
s = s.ChildrenFiltered("h1") // 3 children
c = s.Children() // 1 child, but PrevSibling and NextSibling are available (.Next(), .Last() don't work)
println(len(c.Nodes)) // 1
Yeah, you're mixing up the raw HTML nodes and the goquery selections; those may differ, and that's normal behaviour. The Children family of functions returns a selection of matching elements (i.e. <div>, <h1>, etc.) and does not select non-element nodes like raw text (documented here: https://pkg.go.dev/github.com/PuerkitoBio/goquery#Selection.Children - "gets the child elements of each element"). What you seem to want is the Contents family of functions, which also selects text and comment nodes: https://pkg.go.dev/github.com/PuerkitoBio/goquery#Selection.Contents.
Indeed, even if there are Prev/NextSibling on the selected node, it doesn't mean Next() or Last() will select them, as they work on elements and not raw text or comment nodes.
Hope this helps,
Martin