zzzprojects/html-agility-pack

htmlDoc.DocumentNode.InnerText depends on new lines in HTML

d668 opened this issue · 5 comments

d668 commented

1. Description

htmlDoc.DocumentNode.InnerText gives inconsistent results depending on whether there is a new line between HTML elements.

See the fiddle. Both outputs should be the same and should not depend on whether there is a new line in the HTML markup.

2. Exception

3. Fiddle or Project

https://dotnetfiddle.net/JOmlX0

4. Any further technical details

  • HAP version: 1.11.61
  • .NET version: .NET 8

Hello @d668 ,

Thank you for reporting. However, I do not believe anything will be done now for this.

Making this work correctly would mean understanding and changing more code than we have time for right now. Even Chrome and Firefox produce different output depending on whether there is an empty line between the elements.

The current InnerText in Chrome is: span1\n\np1\n\nspan1 span2\n\np2\n\nspan2

Notice that span1 and span2 are separated by a space while the others are separated by new lines. This case looks easy to handle, but verifying all the InnerText rules would take far more time than we currently have.
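To make the rule above concrete, here is a minimal sketch in Python (standard library only, not HAP code) of how a browser-like InnerText could treat whitespace: a newline between inline elements collapses to one space, while a block element such as p forces line breaks. The markup is only a guess at what the fiddle contains, and `InnerTextSketch`, `inner_text`, and the `BLOCK_BREAKS` table are hypothetical names made up for this illustration. Real browsers decide block vs. inline from CSS, which is ignored here.

```python
import re
from html.parser import HTMLParser

# Assumption: only these tags count as block-level, with this many
# required line breaks around them. Browsers derive this from CSS.
BLOCK_BREAKS = {"p": 2, "div": 1}


class InnerTextSketch(HTMLParser):
    """Collects text runs and required line-break counts."""

    def __init__(self):
        super().__init__()
        self.items = []  # str = text run, int = required line breaks

    def _block_boundary(self, tag):
        if tag in BLOCK_BREAKS:
            self.items.append(BLOCK_BREAKS[tag])

    def handle_starttag(self, tag, attrs):
        self._block_boundary(tag)

    def handle_endtag(self, tag):
        self._block_boundary(tag)

    def handle_data(self, data):
        # Collapse every whitespace run (including newlines) to one space.
        self.items.append(re.sub(r"\s+", " ", data))

    def text(self):
        out, pending = [], 0
        for item in self.items:
            if isinstance(item, int):
                pending = max(pending, item)  # adjacent breaks merge
            elif item == " " and pending:
                continue  # whitespace next to a block boundary is dropped
            else:
                if pending and out:
                    out.append("\n" * pending)
                pending = 0
                out.append(item)
        joined = re.sub(r" {2,}", " ", "".join(out))
        # Strip spaces that ended up next to a line break.
        return re.sub(r" *\n *", "\n", joined).strip()


def inner_text(html):
    parser = InnerTextSketch()
    parser.feed(html)
    parser.close()
    return parser.text()


# Assumed markup, reconstructed from the output quoted in this thread.
with_newline = "<span>span1<p>p1</p>span1</span>\n<span>span2<p>p2</p>span2</span>"
print(repr(inner_text(with_newline)))
# 'span1\n\np1\n\nspan1 span2\n\np2\n\nspan2'
```

With the assumed markup, the sketch reproduces the Chrome output quoted above. It covers only this one whitespace rule, not the full set of InnerText rules.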

But indeed, HAP doesn't provide the same InnerText as a real browser.

Best Regards,

Jon

d668 commented

> Notice that span1 and span2 are separated by a space while others have a new line.

You are right, so HAP is actually making two mistakes: adding a new line between span1 and span2, and not adding new lines in span1p1span1. Both Chrome and Firefox show it as:

 span1

p1
span1 span2

p2
span2 

> But indeed, HAP doesn't provide the same InnerText as a real browser.

Oh man, and what then? Not the same, but some? It really does look like you just don't have the resources to fix an obvious bug.

Hello @d668 ,

Feel free to propose a pull request with the fix ;)

This week we are reviewing and merging some other pull requests that were submitted recently, so this would be a perfect time.

Best Regards,

Jon

d668 commented

If this is your excuse for not maintaining a project you started, that's lame. I am fine with BeautifulSoup.

d668 commented

Man, closing an issue with an obvious bug?