anaskhan96/soup

FindNextSibling bug

ShevaXu opened this issue · 4 comments

From the source code I see FindNextSibling calls r.Pointer.NextSibling.NextSibling which wrongly assumes NextSibling should have another NextSibling, and crash when it does not.

e.g.

const html = `<html>

  <head>
      <title>DOM Tutorial</title>
  </head>

  <body>
      <a>DOM Lesson one</a><p>Hello world!</p>
  </body>

</html>`

func main() {
	doc := soup.HTMLParse(html)
	link := doc.Find("a")
	next := link.FindNextSibling()
	fmt.Println(next.Text())
}

// $ panic: runtime error: invalid memory address or nil pointer dereference

This also applies for FindPrevSibling.

BTW, I suggest there should be FindNextSibling and FindNextSiblingElement as the spec describes. (This might be another issue, I guess what you want to implement is FindNextSiblingElement.)

Thanks for reporting it. I'll try to fix it as soon as possible.

I think the fix d325696 does not work because it is r.Pointer.NextSibling might be nil, and calling .NextSibling on it causing the nil pointer dereference crash/panic.

Maybe try this:

func (r Root) FindNextSibling() Node {
	nextSibling := r.Pointer.NextSibling
	if nextSibling == nil {
		log.Fatal("No next sibling found")
	}
	// try to find next element-node?
	if nextSibling.Type == html.ElementNode {
	       return Root{nextSibling}
	} else {
	       // keep looking
	}
}

Ah, I see it now. I'll make the fixes accordingly.

Fixed it in the above commit, and renamed them to FindNextElementSibling() and FindPrevElementSibling()
I'll be implementing the actual FindNextSibling() and FindPrevSibling() in the upcoming commits.