chromium/dom-distiller

WebTextTest.testGenerateOutputBRElements should have spec-compliant test expectation

yosinch opened this issue · 3 comments

Hi, I'm a Chrome developer working to make Element#innerText spec compliant[1],
and this causes a test failure in WebTextTest.testGenerateOutputBRElements().

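The root cause is that this test case depends on Chrome's current spec-violating behavior:
emitting a newline for <br> when Element#innerText is called on an element that is not being rendered[2]. In that case innerText should return Node#textContent, and the spec says Node#textContent
returns the simple concatenation of descendant Text nodes[3].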

Note: Element#innerText returns Node#textContent for a disconnected element, and this test calls Element#innerText on a disconnected element.

Thus, I propose changing testGenerateOutputBRElements() as below:

CURRENT got = text.generateOutput(true);
PROPOSED got = text.generateOutput(true).replace("\n", "");
Removing newlines from got lets this test pass with both the broken innerText and the spec-compliant innerText.
CURRENT want = "Words\nsplit\nwith\nlines";
PROPOSED want = "Wordssplitwithlines";
assertEquals(want, got);

The above assert is essentially:
<p>Words<br>split<br>with<br>lines</p>.textContent = Wordssplitwithlines
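
As a minimal sketch of why the proposed change is insensitive to the innerText behavior, the snippet below applies the same newline stripping to both possible outputs. The class and method names are hypothetical and not part of dom-distiller; the two input strings are the CURRENT and PROPOSED expectations from above.

```java
// Hypothetical helper for illustration only; not part of dom-distiller.
final class NewlineInsensitiveExpectation {
    // Mirrors the proposed generateOutput(true).replace("\n", "").
    // String.replace(CharSequence, CharSequence) replaces every occurrence.
    static String stripNewlines(String s) {
        return s.replace("\n", "");
    }

    public static void main(String[] args) {
        String legacyOutput = "Words\nsplit\nwith\nlines"; // newline emitted for each <br>
        String compliantOutput = "Wordssplitwithlines";    // plain concatenation of Text nodes
        String want = "Wordssplitwithlines";

        // Both comparisons hold, so the test no longer depends on which
        // innerText behavior the browser implements.
        System.out.println(want.equals(stripNewlines(legacyOutput)));
        System.out.println(want.equals(stripNewlines(compliantOutput)));
    }
}
```

The trade-off is that the test no longer checks whether <br> produces a line break at all, only that the words come out in order.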

[1] https://html.spec.whatwg.org/multipage/dom.html#the-innertext-idl-attribute
[2] http://crbug.com/859410 Element#innerText for not being rendered element should not have newline for BR
[3] https://dom.spec.whatwg.org/#dom-node-textcontent

Hi, sorry for the late reply.

The WIP CL is this one, right?
https://chromium-review.googlesource.com/c/chromium/src/+/1114673

Fixing testFigureCaptionWithAnchor() should be easy, but generateOutput(true) is also used in quality evaluation. We have a Google-only corpus with the "golden answer" in it. If we change the way innerText works, I think we might need to update the corpus as well. However, we might not be able to obtain exactly the same score even if we do that, because some newlines would be gone.

Do you know if there's any way to emulate being rendered while not touching the DOM?

Sorry for the late response.

> The WIP CL is this one, right?
> https://chromium-review.googlesource.com/c/chromium/src/+/1114673
Right.

> Do you know if there's any way to emulate being rendered while not touching the DOM?
You can use the Node#isConnected property to detect a disconnected node and implement
your own version of Node#textContent that emits a newline for <br>.

BTW, I don't recommend using a disconnected node, since Node#textContent returns
collapsible whitespace uncollapsed and ignores text-transform.
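
The custom textContent idea above could look roughly like the following sketch. It runs on a hypothetical in-memory node model (SketchNode) rather than the real DOM, so every class and method name here is an assumption made for illustration; real code would walk the browser's nodes and apply this only when Node#isConnected is false. Like Node#textContent, it does not collapse whitespace or apply text-transform.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a DOM node; not a dom-distiller or GWT class.
final class SketchNode {
    final String tagName; // null for text nodes
    final String text;    // null for elements
    final List<SketchNode> children = new ArrayList<>();

    private SketchNode(String tagName, String text) {
        this.tagName = tagName;
        this.text = text;
    }

    static SketchNode element(String tagName, SketchNode... kids) {
        SketchNode node = new SketchNode(tagName, null);
        node.children.addAll(Arrays.asList(kids));
        return node;
    }

    static SketchNode text(String data) {
        return new SketchNode(null, data);
    }
}

final class BrAwareText {
    // Concatenates descendant text in document order, as Node#textContent
    // does, but additionally emits "\n" for every <br> element.
    static String textWithBreaks(SketchNode node) {
        StringBuilder out = new StringBuilder();
        collect(node, out);
        return out.toString();
    }

    private static void collect(SketchNode node, StringBuilder out) {
        if (node.text != null) {
            out.append(node.text);
            return;
        }
        if ("br".equalsIgnoreCase(node.tagName)) {
            out.append('\n');
            return;
        }
        for (SketchNode child : node.children) {
            collect(child, out);
        }
    }
}
```

For the markup used in the test, building p as element("p", text("Words"), element("br"), text("split"), element("br"), text("with"), element("br"), text("lines")) and calling BrAwareText.textWithBreaks(p) yields "Words\nsplit\nwith\nlines", which matches the test's current expectation.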

Since this only affects WebText.generateOutput(textOnly=true), and it is only used in testing and evaluation, rendering the element before calling innerText seems to be a reasonable solution.