Panic when parsing an HTML page
aaronchen2k opened this issue · 6 comments
I get a fatal panic when running htmlquery.QueryAll on the page fetched from https://baidu.com, or on the local file baidu.html, using the script below:
https://github.com/aaronchen2k/deeptest/blob/main/cmd/test/htmlquery_test.go
It works fine if I use an HTML string like this:
https://github.com/aaronchen2k/deeptest/blob/main/internal/server/modules/v1/helper/mock/html.go
Thanks!
Maybe the HTTP response is gzip-compressed. You should decompress the gzip body before parsing it.
In the test script test/htmlquery_test.go, I read the HTML from a local file, and it still causes a fatal panic:

```go
html := fileUtils.ReadFile("baidu.html")
```

Please help check, thanks.
The local baidu.html file works fine with my local test code. Test code below:
```go
package main

import (
	"fmt"
	"os"

	"github.com/antchfx/htmlquery"
)

func main() {
	f, err := os.Open("./baidu.html")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	doc, err := htmlquery.Parse(f)
	if err != nil {
		panic(err)
	}

	// "//form[@id=1]/input[@id=\"kw\"]/@class" is invalid; changed to @id="1".
	expression := `//form[@id="1"]/input[@id="kw"]/@class`
	list, err := htmlquery.QueryAll(doc, expression)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(list))
}
```
Thank you for the feedback!
I updated the code and there is no error now, but why is the list always nil?
Bump.
Your XPath query is not correct: the local HTML file has no form with an id="1" attribute. Use //form[@id="form"]/input[@id="kw"]/@class instead. You can test your XPath with Chrome DevTools (Inspect) or https://www.freeformatter.com/xpath-tester.html.