Incorrect expectations in tree-construction webkit02 tests
iabudiab opened this issue · 25 comments
Correct if I'm wrong. I could prepare a PR, but wanted to verify this first.
I believe the expectations for the following tests (15, 16, 17 in webkit02) to be incorrect:
<b><em><foo><foo><foo><aside></b></em>
<b><em><foo><foo><foo><foo><foo><foo><foo><foo><foo><foo><aside></b></em>
<b><em><foo><foob><foob><foob><foob><fooc><fooc><fooc><fooc><food><aside></b></em>
This <b><em><foo><foo><foo><aside></b></em> expects the following tree:
| <html>
| <head>
| <body>
| <b>
| <em>
| <foo>
| <foo>
| <foo>
| <aside>
| <b>
but should expect:
| <html>
| <head>
| <body>
| <b>
| <em>
| <foo>
| <foo>
| <foo>
| <aside>
| <b>
| <em>
The same is true for the other two tests, i.e. the <b> element, child of <aside>, should have an <em> after running the adoption agency algorithm.
Oh. Those tests. I remember them taking me quite a while to convince myself of their correctness. It looks like they're checking the interaction between the Noah's Ark clause (does <aside> introduce a marker?) and the AAA. I'm soon boarding a far-too-many-hours flight, hopefully I'll have convinced myself of their correctness or not by the end. :)
The em is removed because of "If inner loop counter is greater than three and node is in the list of active formatting elements, then remove node from the list of active formatting elements" combined with "If node is not in the list of active formatting elements, then remove node from the stack of open elements and then go back to the step labeled inner loop", no? (When the inner loop counter is 1, the bottommost foo is removed from the stack; when 2 the middle foo; when 3 the top foo.)
We have interop, and I'm pretty sure that's the key to working out what's going on.
I've doen some calculations myself.
Here is is the state when first entering the AAA with </b> token:
| Stack of Open Elements | Active Formatting Elements |
|---|---|
| html, body, b, em, foo, foo, foo, aside | b, em |
- In the outer loop we get the following:
| Formatting Element | Furthest Block | Common Ancestor | Bookmark |
|---|---|---|---|
| b | aside | body | 0 |
- Now the inner loop.
Blink and WebKit limit the inner loop counter to 3. This is also true for the html5lib-python implementation. However, the current spec doesn't state this explicitly, at least not explicitly enough as it is the case for the outer loop Outer loop: If outer loop counter is greater than or equal to eight, then abort these steps.
So in this case the <em> wouldn't be removed via the following step If inner loop counter is greater than three and node is in the list of active formatting elements, then remove node from the list of active formatting elements
So the state directly after the inner loop is as follows:
| Stack of Open Elements | Active Formatting Elements |
|---|---|
| html, body, b, em, aside | b, em |
and the DOM tree is untouched.
- Now the furthest block
<aside>gets inserted into the common ancestor<body> - A new element for the token, i.e.
<b>, is appended to<aside> <b>is removed from the active formatting elements and reinserted at bookmark, which is 0, so this stays as it is<b>is removed from the stack of open elements and is reinserted after the<aside>- The second iteration of the outer loops breaks because there is no furthest block and we end up with:
| Stack of Open Elements | Active Formatting Elements |
|---|---|
| html, body, em, aside | em |
with the following tree:
| <html>
| <head>
| <body>
| <b>
| <em>
| <foo>
| <foo>
| <foo>
| <aside>
| <b>
- The outer loop of the AAA for the
</em>token is almost the same:
| Formatting Element | Furthest Block | Common Ancestor | Bookmark |
|---|---|---|---|
| em | aside | body | 0 |
- The inner loop however is left immediately, since the element directly above the
<aside>in the stack of pen elements is the formatting element. - A new element for the token
<em>is created and it gets all child nodes from<aside>, as described:Take all of the child nodes of furthest block and append them to the element created in the last step. - The
<em>is then appended to<aside> - The second iteration of the outer loops breaks because there is no furthest block and we end up with:
| Stack of Open Elements | Active Formatting Elements |
|---|---|
| html, body, aside |
with the following tree:
| <html>
| <head>
| <body>
| <b>
| <em>
| <foo>
| <foo>
| <foo>
| <aside>
| <em>
| <b>
Notice the ordering of <em> and <b> under <aside>. My initial assumption was wrong, however the test expectation is still incorrect.
I have also tested the HTML snippet in Chrome, Safari & Firefox, all three produce this tree, i.e. aside, em, b.
As I see it, the spec is ambiguous when it comes to the inner loop handling.
And for the sake of completeness:
WebKit Adoption Agency
html5lib-python Adoption Agency
@RReverser or @zcorpan are probably much more able to address this than I am.
Just re-checked @iabudiab's steps per spec and in parse5 - test's expected value is wrong.
I don't see what in @iabudiab's explanation disproves what @gsnedders stated in #78 (comment).
To me by the time the algorithm reaches <em>, the inner loop counter already reached 3 and thus <em> gets removed.
FWIW, html5ever passes this test.
In Gecko I get this tree for http://software.hixie.ch/utilities/js/live-dom-viewer/saved/4199
| <html>
| <head>
| <body>
| <b>
| <em>
| <foo>
| <foo>
| <foo>
| <aside>
| <b>
i.e. no em in the aside. But WebKit/Chromium/IE11 have aside, em, b.
cc @hsivonen
I feel like there was some change to the spec which caused fragmentation in results
From #78 (comment) (@iabudiab)
Blink and WebKit limit the inner loop counter to 3. This is also true for the html5lib-python implementation. However, the current spec doesn't state this explicitly, at least not explicitly enough as it is the case for the outer loop Outer loop: If outer loop counter is greater than or equal to eight, then abort these steps.
So in this case the
<em>wouldn't be removed via the following stepIf inner loop counter is greater than three and node is in the list of active formatting elements, then remove node from the list of active formatting elements
You're saying that it doesn't get reached in Blink and WebKit because they break out of the loop when the inner loop counter is three, and hence that never gets hit? Given the spec doesn't have that breakout from the loop, it should therefore get removed, no?
@inikulin whatwg/html@22ce3c31 is why WebKit/Blink/IE11/html5lib-python has that behaviour
As far as I can tell, @iabudiab's proposed behaviour is what would be expected by the spec prior to whatwg/html@22ce3c31. (And hence why it matches WebKit/Blink/IE11/html5lib-python.)
@gsnedders same in parse5
Edge matches IE11, FWIW.
So is there agreement that that was a good spec change, and we should be filing bugs on WebKit/Blink/Edge/html5lib-python/parse5?
Or do we want to align with the majority, revert the spec change, and file bugs on Gecko/html5ever?
It would be nice to know motivation behind whatwg/html@22ce3c3
@domenic html5lib-python just has a generic "update to match the spec" issue (it hasn't been properly updated in a long time) :)
@inikulin https://lists.w3.org/Archives/Public/public-whatwg-archive/2013Jul/0401.html
@inikulin https://lists.w3.org/Archives/Public/public-whatwg-archive/2013Jul/0006.html which that is a response to gives more of it
Notably, <b><i><a><s><tt><div></b>abc</b></div></tt></s></a>xyz</i> gives a non-sensical parse-tree (where xyz precedes abc) in the old spec/WebKit/Blink/Edge.
Notably,
<b><i><a><s><tt><div></b>abc</b></div></tt></s></a>xyz</i>gives a non-sensical parse-tree (where xyz precedes abc) in the old spec/WebKit/Blink/Edge.
Well, I don't see a problem. With foster parenting you will have similar results: <table><tr><td>123</td></tr><div>456</div></table>. And IMHO such cases are more common than the one with AAA.
Reordering in tables is something that all browsers did pre-HTML5. Reordering for misnested formatting elements is not.
Since it seems Gecko has successfully shipped the spec change, we should go ahead and implement it in the other browsers, too (unless we have data suggesting the new spec is less Web-compatible). The change in Gecko was https://bugzilla.mozilla.org/show_bug.cgi?id=901319
@travisleithead any update on this issue for Edge? re https://lists.w3.org/Archives/Public/public-html/2013Aug/0002.html
So we have consensus that the tests are right v. the spec, and impls should change? Closing as a result. (If that understanding is wrong, feel free to reopen it. Or ping me if you can't.)
Mmm. The 2013 bug is likely gone with our old bug database. I opened an issue on our public tracker--you can monitor progress there: https://developer.microsoft.com/en-us/microsoft-edge/platform/issues/9854392/