doc.Find("body").Contents().Length() is greater than zero even though <body> is empty
Hello,
The following program:
package main

import (
	"bytes"
	"fmt"

	"github.com/PuerkitoBio/goquery"
)

func numNodes(code string) {
	r := bytes.NewReader([]byte(code))
	doc, err := goquery.NewDocumentFromReader(r)
	if err != nil {
		panic(err)
	}
	fmt.Println("Length of selection =", doc.Find("body").Contents().Length())
}

func main() {
	code1 := "<html><head></head><body></body></html>"
	numNodes(code1)
	code2 := "<html><head></head><body></body>\n</html>"
	numNodes(code2)
}
produces this output:
Length of selection = 0
Length of selection = 1
Is it normal to get "Length of selection = 1" for the HTML in the code2 variable? I expect to get "Length of selection = 0" because the <body> tag is empty.
Hello,
It's normal, and it comes from how the HTML5 parsing algorithm treats this markup: whitespace that appears after </body> is still inserted into the <body> element, so the newline ends up inside <body> and it is no longer empty. When the parsed result looks surprising, it's always a good idea to print the document after goquery has parsed it, to see exactly what goquery sees, e.g.:
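doc, err := goquery.NewDocumentFromReader(r)
if err != nil {
	panic(err)
}
// OuterHtml returns both the rendered HTML and an error.
rendered, err := goquery.OuterHtml(doc.Selection)
if err != nil {
	panic(err)
}
fmt.Println(rendered)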
You can see that in your second call, the actual document looks like this:
<html><head></head><body>
</body></html>
And since the Contents method selects not only elements but also comments and text nodes, it selected the text node containing the newline.
Hope this helps,
Martin
Thank you very much for your explanation, mna!