rust-scraper/scraper

How to get the number from NodeID

Closed this issue · 6 comments

  let selector = Selector::parse("title").unwrap();
  let node = docs.select(&selector).next();
  let element = node.unwrap();
  let p = element.id();

p is NodeId(5)

I want the number 5. What should I do?

The "5" is an internal implementation detail of the underlying tree data structure. Why do you need to access it?

Actually, I want to access the position of a node within the HTML document (or DOM tree), for example to know that <title></title> is in the 37th position of the HTML DOM Tree.

The underlying tree data structure is ego-tree which gives you:

  • random access to a NodeRef given a NodeId, e.g. Tree::get
  • sequential access to all nodes in insertion/parser order, e.g. Tree::nodes
  • relative navigation from a node to its relatives, e.g. NodeRef::parent

How NodeId and NodeRef are implemented internally (i.e. that they are indices/references into a list of nodes) is an implementation detail which one normally does not depend on.
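To illustrate why the number is considered an implementation detail, here is a minimal toy model of an index-based tree. It is only a sketch and not ego-tree's actual code: the type and method names mirror the ones mentioned above, but the fields, the append method and the tuple returned by nodes are assumptions made for this example.

```rust
/// Opaque handle into the tree. The wrapped index is deliberately not exposed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(usize);

/// One stored node: a value plus a link to its parent (None for the root).
pub struct Node<T> {
    pub value: T,
    parent: Option<NodeId>,
}

/// Toy arena tree: all nodes live in one Vec, in insertion order.
pub struct Tree<T> {
    nodes: Vec<Node<T>>,
}

impl<T> Tree<T> {
    /// Creates a tree that contains only the root node.
    pub fn new(root: T) -> Self {
        Tree { nodes: vec![Node { value: root, parent: None }] }
    }

    /// Handle of the root node (always the first node inserted).
    pub fn root(&self) -> NodeId {
        NodeId(0)
    }

    /// Appends a child and returns its handle (hypothetical helper).
    pub fn append(&mut self, parent: NodeId, value: T) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(Node { value, parent: Some(parent) });
        id
    }

    /// Random access: handle to node.
    pub fn get(&self, id: NodeId) -> Option<&Node<T>> {
        self.nodes.get(id.0)
    }

    /// Sequential access in insertion order.
    pub fn nodes(&self) -> impl Iterator<Item = (NodeId, &Node<T>)> + '_ {
        self.nodes.iter().enumerate().map(|(i, n)| (NodeId(i), n))
    }

    /// Relative navigation: handle of the parent, if any.
    pub fn parent(&self, id: NodeId) -> Option<NodeId> {
        self.get(id).and_then(|n| n.parent)
    }
}

fn main() {
    let mut tree = Tree::new("html");
    let root = tree.root();
    let head = tree.append(root, "head");
    let title = tree.append(head, "title");

    // Callers keep the handle and never need the raw index.
    assert_eq!(tree.get(title).map(|n| n.value), Some("title"));
    assert_eq!(tree.parent(title), Some(head));
}
```

In this model the index inside NodeId only has meaning for the Vec that backs the tree, so a caller who holds the handle gets everything the three access patterns offer without reading the number.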

I guess there is a bit of a language barrier involved, but I think a straightforward translation of

in the 37th position of the HTML DOM Tree

into code would be

let node = docs.tree.nodes().nth(37).unwrap();

but it is not clear to me how to end up with "37" in the first place.

From your example code, I would guess that you want to store p itself instead of "5" and turn it back into a NodeRef via Tree::get.

Based on the needs of our work, we have decided to maintain a node tree ourselves, which gives us more stable support and lets us adjust our strategy more flexibly.

Thanks!

I am sorry to hear that, as things tend to work out better if we collaborate on upstream projects. I still think there is an XY problem involved here, as we have not yet reached a common understanding as to why "the position of the node within the DOM tree" is required in the first place.