htmlquery is an XPath query package for HTML. It lets you extract data from HTML documents, or evaluate expressions against them, using XPath.
htmlquery has a built-in, LRU-based query object cache that stores recently used XPath query strings. With query caching enabled, an XPath expression is not recompiled on every query.
go get github.com/antchfx/htmlquery
nodes, err := htmlquery.QueryAll(doc, "//a")
if err != nil {
panic(`not a valid XPath expression.`)
}
doc, err := htmlquery.LoadURL("http://example.com/")
filePath := "/home/user/sample.html"
doc, err := htmlquery.LoadDoc(filePath)
s := `<html>....</html>`
doc, err := htmlquery.Parse(strings.NewReader(s))
list := htmlquery.Find(doc, "//a")
list := htmlquery.Find(doc, "//a[@href]")
list := htmlquery.Find(doc, "//a/@href")
for _, n := range list {
	fmt.Println(htmlquery.InnerText(n)) // prints the href value only, not the <a> element
}
a := htmlquery.FindOne(doc, "//a[3]")
expr, _ := xpath.Compile("count(//img)")
v := expr.Evaluate(htmlquery.CreateXPathNavigator(doc)).(float64)
fmt.Printf("total count is %f", v)
Find and QueryAll do the same thing: both search for all matching HTML nodes. The difference is in error handling. Find panics if you give it an invalid XPath query, while QueryAll returns an error instead.
Yes, you can. We offer the QuerySelector and QuerySelectorAll methods, which accept your query expression object.
Caching (or reusing) a query expression object avoids recompiling the XPath query expression and improves query performance.
goos: windows
goarch: amd64
pkg: github.com/antchfx/htmlquery
BenchmarkSelectorCache-4 20000000 55.2 ns/op
BenchmarkDisableSelectorCache-4 500000 3162 ns/op
htmlquery.DisableSelectorCache = true
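To give a feel for what an LRU-based query cache does, here is a minimal, standalone sketch using only the standard library. It is not htmlquery's actual implementation: the `lruCache` type, its methods, and the string stand-ins for compiled expressions are all hypothetical.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry pairs a query string with a stand-in for its compiled form.
type entry struct {
	key   string
	value string
}

// lruCache is a hypothetical, minimal LRU cache keyed by query string.
// A capacity of zero or less means the cache never evicts.
type lruCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // query string -> element in order
}

func newLRUCache(capacity int) *lruCache {
	return &lruCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// get returns the cached value and marks the entry as recently used.
func (c *lruCache) get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// put stores a value, evicting the least recently used entry when full.
func (c *lruCache) put(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	if c.capacity > 0 && c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
}

func main() {
	cache := newLRUCache(2)
	cache.put("//a", "compiled(//a)")
	cache.put("//img", "compiled(//img)")
	cache.get("//a")                      // "//a" is now the most recently used entry
	cache.put("//div", "compiled(//div)") // cache is full, so "//img" is evicted

	_, ok := cache.get("//img")
	fmt.Println("//img still cached:", ok)
}
```

The point of the design is that frequently repeated query strings stay in the cache, while rarely used ones are dropped first once the cache is full.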
2019-11-19
- Add a built-in query object cache to avoid recompiling the same query string. #16
- Added LoadDoc 18
2019-10-05
- Add new methods that return an error for an invalid XPath expression instead of panicking: QueryAll and Query.
- Add QuerySelector and QuerySelectorAll methods, which support reusing your query object.
2019-02-04
- #7 Removed deprecated FindEach() and FindEachWithBreak() methods.
2018-12-28
- Avoid adding duplicate elements to the list for the Find() method. #6
package main

import (
	"fmt"

	"github.com/antchfx/htmlquery"
)

func main() {
	doc, err := htmlquery.LoadURL("https://www.bing.com/search?q=golang")
	if err != nil {
		panic(err)
	}
	// Find all news items.
	list, err := htmlquery.QueryAll(doc, "//ol/li")
	if err != nil {
		panic(err)
	}
	for i, n := range list {
		a := htmlquery.FindOne(n, "//a")
		if a == nil {
			continue // skip list items that contain no link
		}
		fmt.Printf("%d %s(%s)\n", i, htmlquery.InnerText(a), htmlquery.SelectAttr(a, "href"))
	}
}
Name | Description |
---|---|
htmlquery | XPath query package for the HTML document |
xmlquery | XPath query package for the XML document |
jsonquery | XPath query package for the JSON document |
Please let me know if you have any questions.