rust-scraper/scraper

Save source code `ElementRef::html()`

Closed this issue · 3 comments

The string returned by `html()` is equivalent to the source as HTML markup, but it is not identical to the source as a string.
Example:

<table cellspacing="1" border="0" width="100%" class="inf">
// content
</table>

and

<table border="0" class="inf" cellspacing="1" width="100%">
// content
</table>

These two snippets are equivalent as HTML markup, but they do not compare equal as strings (or as hashes).
Is it possible to change this? Or is there a good way to compare them (preferably without using regex)?

The internal `TreeSink` through which this crate integrates with the html5ever HTML parser does not carry the original source text or source positions, so we cannot reproduce them.

You can enable the `deterministic` feature of this crate to retain the order of the attributes, i.e. to ensure that comparing/hashing strings yields stable results, but this is not the only thing that could change compared to the source text.
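If the goal is only to decide whether two tags are equivalent, one regex-free option is to compare a canonical form instead of the serialized string. The following is a minimal sketch under stated assumptions, not part of this crate: `canonical_tag` and `fingerprint` are hypothetical helper names, and the sketch uses only the standard library so that it stands on its own. It sorts the attributes by name, so the order in which they appeared no longer matters.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: builds a canonical string for one opening tag.
/// The tag name is lowercased and the attributes are sorted by name,
/// so the original attribute order does not affect the result.
fn canonical_tag<'a>(
    name: &str,
    attrs: impl IntoIterator<Item = (&'a str, &'a str)>,
) -> String {
    // BTreeMap iterates in key order, which gives us the sorting.
    // If a name occurs twice, the last value wins (an assumption of this sketch).
    let sorted: BTreeMap<&str, &str> = attrs.into_iter().collect();

    let mut out = format!("<{}", name.to_ascii_lowercase());
    for (key, value) in &sorted {
        // Debug formatting quotes and escapes the value.
        out.push_str(&format!(" {}={:?}", key, value));
    }
    out.push('>');
    out
}

/// Hypothetical helper: hashes the canonical form.
/// Equal canonical strings always hash equally within one build; the exact
/// value is not guaranteed to stay the same across Rust releases.
fn fingerprint(canonical: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    canonical.hash(&mut hasher);
    hasher.finish()
}
```

With this crate, the inputs would presumably come from `element.value().name()` and `element.value().attrs()` on an `ElementRef`; treat that wiring as an assumption to check against the current API docs. The sketch covers a single tag only. Comparing whole subtrees would mean applying the same idea recursively to child nodes and deciding how to treat whitespace-only text.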

> You can enable the deterministic feature of this crate

Thank you! I'll try to do that later.

> but this is not the only thing that could change compared to the source text

Could you give me a hint? What else can change, and what should I pay attention to?

This can be considered closed. The answer above worked for me.