whatwg/html

Expose a stack of blocking elements

domenic opened this issue · 40 comments

Previously:

Based on these discussions, especially the last one, I'd like to propose the following API to explain some of the magic of <dialog>:

document.blockingElements.push(element); // note: if element is already in the blocking elements stack, it moves to the top
document.blockingElements.pop();
document.blockingElements.remove(element); // see https://github.com/whatwg/html/issues/897#issuecomment-198565716
document.blockingElements.top; // or .current or .peek()
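The stack semantics proposed above can be modeled in plain JavaScript. This is a minimal sketch, not the platform API. `BlockingElementStack` is a hypothetical class, the elements are stand-in objects rather than DOM nodes, and returning `undefined` from `pop()` and `top` on an empty stack is an assumption that the proposal does not specify.

```javascript
// Hypothetical userland model of the proposed document.blockingElements stack.
// Elements are plain objects here; no DOM is involved.
class BlockingElementStack {
  #items = []; // last entry is the top of the stack

  // Push an element. If it is already on the stack, move it to the top
  // instead of creating a second entry.
  push(element) {
    this.remove(element);
    this.#items.push(element);
  }

  // Remove and return the top element (undefined if the stack is empty).
  pop() {
    return this.#items.pop();
  }

  // Remove an element from anywhere in the stack, e.g. when a dialog that
  // is not on top gets closed. Returns true if the element was present.
  remove(element) {
    const index = this.#items.indexOf(element);
    if (index === -1) return false;
    this.#items.splice(index, 1);
    return true;
  }

  // The current blocking element (undefined if the stack is empty).
  get top() {
    return this.#items[this.#items.length - 1];
  }

  get size() {
    return this.#items.length;
  }
}

const stack = new BlockingElementStack();
const a = { id: "a" };
const b = { id: "b" };
stack.push(a);
stack.push(b);
stack.push(a);                         // a is already present, so it moves to the top
console.log(stack.top.id, stack.size); // a 2
stack.remove(b);                       // removal from below the top
stack.pop();
console.log(stack.size);               // 0
```

Deduplicating inside `push()` keeps the stack a set with an order, which is what makes `remove()` unambiguous.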

We would generalize the "pending dialog stack" (which, note, is only used when showModal() is called, not just show()) to a "blocking element stack".

Dialogs would take part in that stack and be fully explained by it and integrated into it. For example:

  • Calling document.blockingElements.top after dialogEl.showModal() would return dialogEl.
  • Calling document.blockingElements.pop() with an active modal dialog would make that dialog un-modal (but not hide it or remove it from the document), and make whatever was below it on the stack become the new blocking element.
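A sketch of how `<dialog>` could be explained in terms of the stack, following the two bullets above. `FakeDocument` and `FakeDialog` are hypothetical stand-ins (no DOM). "Modal" is derived purely from stack membership, so popping leaves the dialog open but non-modal, the same state `show()` produces. Error cases of the real API (for example calling `showModal()` on an already open dialog) are not modeled.

```javascript
// Hypothetical model: a dialog is "modal" exactly when it is on the
// document's blocking element stack; "open" is tracked separately.
class FakeDocument {
  constructor() {
    this.blockingElements = []; // last entry is the top
  }
  get topBlockingElement() {
    return this.blockingElements[this.blockingElements.length - 1];
  }
}

class FakeDialog {
  constructor(doc, name) {
    this.doc = doc;
    this.name = name;
    this.open = false;
  }
  get modal() {
    return this.doc.blockingElements.includes(this);
  }
  show() {
    this.open = true; // open but never pushed, so non-modal
  }
  showModal() {
    this.open = true;
    this.#removeFromStack();
    this.doc.blockingElements.push(this); // as if blockingElements.push(dialog)
  }
  close() {
    this.open = false;
    this.#removeFromStack(); // as if blockingElements.remove(dialog)
  }
  #removeFromStack() {
    const i = this.doc.blockingElements.indexOf(this);
    if (i !== -1) this.doc.blockingElements.splice(i, 1);
  }
}

const doc = new FakeDocument();
const first = new FakeDialog(doc, "first");
const second = new FakeDialog(doc, "second");
first.showModal();
second.showModal();
console.log(doc.topBlockingElement.name); // second

doc.blockingElements.pop();               // script pops the top entry
console.log(second.open, second.modal);   // true false (same state as show())
console.log(doc.topBlockingElement.name); // first, which blocks again
```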

CCing some people from the browser-accessibility-dev discussion: @marcysutton @alice @minorninth. Implementer interest would be especially helpful in moving this forward, @alice @minorninth.

It seems @cookiecrook might also have valuable input given he filed the bug.

alice commented

Very glad to see the conversation continuing on this, thanks for pushing it forward @domenic!

I like this concept in general. I'd like to dig a little deeper into what an element being on the "blocking element stack" means. Here are my guesses, basically copied from <dialog>'s behaviour:

  1. Being at the top of the stack effectively means everything else on the page is inert.
  2. Everything on the stack is in the top layer.
    • (Aside: I'm intrigued by the possibilities of this, isolated from the other behaviours described here, as a solution to the problem of people messing with the DOM order to hack layering without using z-index, for things like floating top menu bars.)
  3. The top blocking element is a control group owner object.
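Point (1) can be sketched as a pure function over a tree. This is a simplified model under stated assumptions: nodes are hypothetical `{ name, parent }` objects rather than DOM nodes, and everything outside the top entry's subtree is treated as inert, including lower entries on the stack.

```javascript
// Hypothetical tree model: each node has a name and an optional parent.
function node(name, parent = null) {
  return { name, parent };
}

// True if `ancestor` is `n` itself or one of its ancestors.
function isInclusiveAncestor(ancestor, n) {
  for (let cur = n; cur !== null; cur = cur.parent) {
    if (cur === ancestor) return true;
  }
  return false;
}

// With a non-empty stack, everything outside the top blocking element's
// subtree is inert, including elements lower on the stack.
function isInert(n, blockingStack) {
  const topEntry = blockingStack[blockingStack.length - 1];
  if (topEntry === undefined) return false;
  return !isInclusiveAncestor(topEntry, n);
}

const body = node("body");
const menu = node("menu", body);
const dialog = node("dialog", body);
const okButton = node("ok", dialog);

console.log(isInert(menu, [dialog]));         // true
console.log(isInert(okButton, [dialog]));     // false
console.log(isInert(dialog, [dialog, menu])); // true: menu is now on top
```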

Have I missed anything?

Also, would the order of the stack determine the visual layering order? If calling pop() doesn't hide the element, what does it do?

alice commented

Also, I'd like to ensure that we don't prematurely write off inert as a primitive which may be useful separately from this concept.

Thanks for working on the precision there @alice. You're right that there are several separate concepts. Currently (1) is handled by blocked by a modal dialog, which is set in showModal(). (2) is the only part that is directly tied to the definition of the pending dialog stack. In theory we could tease them apart, so that you can block all other elements on the page without being in the top layer, but in practice that's probably not so useful.

(3) is a little strange. See some previous discussion in #744. Currently the spec has all dialogs being control group owner objects. This is meant to cause tabbing to cycle if you're inside the dialog, even if the dialog is non-modal. However, Chrome does not currently do this (and only Chrome implements dialogs). And it's worth noting that for modal dialogs, this doesn't really matter, since everything else being inert means it's not focusable. So maybe we don't need to worry about (3), assuming we are going to do (1). (Which seems very likely, since that was the original ask.)

Also, would the order of the stack determine the visual layering order? If calling pop() doesn't hide the element, what does it do?

As long as you are in the stack, you are in the top layer, and the order of the top layer is indeed determined by the order of the stack. (Maybe there are issues here with fullscreen, though, that make it nontrivial to just map the ordering?) But once you are popped from the stack, you are no longer in the top layer. You will be below everything inside the top layer, but still visible, unless you change your visibility (e.g. with CSS).
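The layering described here can be sketched as a painting-order function. It is a simplified model: `paintOrder` is a hypothetical helper, the stack is ordered bottom to top, and z-index and stacking contexts in normal flow are ignored.

```javascript
// Hypothetical model of painting order (first painted = bottom-most).
// Elements on the blocking stack are lifted into the top layer in stack
// order (last entry = top); everything else paints underneath in its
// normal position. Simplification: normal-flow z-index is ignored.
function paintOrder(documentOrder, blockingStack) {
  const inTopLayer = new Set(blockingStack);
  const normalFlow = documentOrder.filter((el) => !inTopLayer.has(el));
  return [...normalFlow, ...blockingStack];
}

const page = ["header", "dialogA", "main", "dialogB"];
console.log(paintOrder(page, ["dialogB", "dialogA"]).join(" < "));
// header < main < dialogB < dialogA

// After dialogA is popped it drops back into normal flow: still painted,
// but below everything that remains in the top layer.
console.log(paintOrder(page, ["dialogB"]).join(" < "));
// header < dialogA < main < dialogB
```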

alice commented

So maybe we don't need to worry about (3), assuming we are going to do (1). (Which seems very likely, since that was the original ask.)

Seems fair to me. I'm still struggling with what a non-modal dialog looks like (and what is a dialog group?)

In theory we could tease them apart, so that you can block all other elements on the page without being in the top layer, but in practice that's probably not so useful.
...
But once you are popped from the stack, you are no longer in the top layer. You will be below everything inside the top layer, but still visible, unless you change your visibility (e.g. with CSS).

Yep, this works for me too. I definitely think tying (1) and (2) together is a good idea - the visual cue is an intrinsic part of the behaviour.

Still thinking about naming, by the way, but I don't want to sidetrack into that discussion until we're all on the same page with the core concepts (not least because I think they will affect what nomenclature makes sense).

An example of a non-modal dialog is at https://jsbin.com/tojolefele/edit?html,output (Chrome only). Per spec, once you pop open the dialog and focus the button labeled "dialog 1", your tabbing should cycle between "dialog 1" and "dialog 2" and not break out to the "1" and "2" and "Run with JS" buttons. (Unless, presumably, you used something like Esc or other keyboard shortcuts to pop up outside the dialog's control group.)
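A sketch of the tab cycling described above, assuming a flat sequential focus order. `nextTabStop` is a hypothetical helper, the button labels come from the example, and their document order is assumed. Escaping the group via Esc or other shortcuts is not modeled.

```javascript
// Hypothetical sketch of tab cycling inside a non-modal dialog's control
// group: Tab moves to the next focusable element in the same group,
// wrapping around instead of escaping to the rest of the page.
// `order` is the document's sequential focus order; `group` lists the
// focusables inside the dialog (both are assumptions of this model).
function nextTabStop(order, group, current) {
  const scope = group.includes(current) ? group : order;
  const i = scope.indexOf(current);
  return scope[(i + 1) % scope.length];
}

const order = ["1", "2", "Run with JS", "dialog 1", "dialog 2"];
const dialogGroup = ["dialog 1", "dialog 2"];

console.log(nextTabStop(order, dialogGroup, "dialog 1")); // dialog 2
console.log(nextTabStop(order, dialogGroup, "dialog 2")); // dialog 1 (wraps)
console.log(nextTabStop(order, dialogGroup, "2"));        // Run with JS
```

Note that focus outside the group (e.g. placed there with the mouse) still follows the page order, which matches the "non-modal" part.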

Dialog groups appear to be a rather confusing concept meant to express "all dialogs in a control group", to handle cases like a Document with multiple open dialogs? Focus fixup rules two and three seem most relevant...

alice commented

@domenic If the non-modal dialog did trap focus, what would the difference be between that and a modal dialog? i.e. what is the purpose/function of a non-modal dialog, if it is indistinguishable from a bug?

You can still activate (and focus) things that are outside the non-modal dialog, e.g. with the mouse most obviously, but also with the keyboard assuming your UA allows you to escape a focus group using a key like Esc or ctrl + up arrow or similar. It only affects normal sequential focus navigation (i.e. the tab key).

alice commented

That seems like pretty weird behaviour, but I'm going to assume it makes sense for reasons I just don't currently understand.

To get back to the current topic, then:

  • open modal dialogs would automatically be on document.blockingElements, basically as if calling dialog.showModal() called document.blockingElements.push(dialog).
  • similarly, closing a modal dialog would effectively call document.blockingElements.remove(dialog) (assuming such an API exists)
  • what would be the effect of document.blockingElements.pop() if the top blocking element is a modal <dialog>? This seems ill defined currently - an open <dialog> must be part of the top layer, but pop() shouldn't automatically close it, as I understand it.

Open modal dialogs would automatically be on document.blockingElements, basically as if calling dialog.showModal() called document.blockingElements.push(dialog).

Yep

similarly, closing a modal dialog would effectively call document.blockingElements.remove(dialog) (assuming such an API exists)

Ah, interesting; it's not always the top, so such an API would be necessary to fully explain dialogs. We could add that... although we need to define what happens if you push the same element twice. I guess it gets moved to the top, instead of creating two entries.

what would be the effect of document.blockingElements.pop() if the top blocking element is a modal <dialog>? This seems ill defined currently - an open <dialog> must be part of the top layer, but pop() shouldn't automatically close it, as I understand it.

It would remove it from the top layer, but it would still be visible, since you didn't actually change its visibility (e.g. with CSS).

If there is something else in the top layer then the dialog would be hidden underneath it.

It would remove it from the top layer, but it would still be visible, since you didn't actually change its visibility (e.g. with CSS).

In other words, it would transition the dialog to exactly the same state it would be in if you just called .show(), instead of .showModal().

alice commented

In other words, it would transition the dialog to exactly the same state it would be in if you just called .show(), instead of .showModal().

Ah - I'd missed that non-modal dialogs weren't in the top layer. Makes perfect sense in that case.

similarly, closing a modal dialog would effectively call document.blockingElements.remove(dialog) (assuming such an API exists)

Ah, interesting; it's not always the top, so such an API would be necessary to fully explain dialogs. We could add that... although we need to define what happens if you push the same element twice. I guess it gets moved to the top, instead of creating two entries.

Yep, I think that makes a lot of sense.

I think remove() also just makes sense if there is any case where a blocking element can become non-blocking while it's not the top blocking element.

Edited the OP to include .remove() and to note that when you .push() an element that is already in the stack, it moves to the top.

The idea seems okay to me. A few comments:

The spec should require UAs to perform an implicit pop() if the element in the stack is removed from the DOM view. IOW, if you set the top blocking element to display:none, it should no longer block.

The spec should be explicit about keyboard focus handling in the following areas.

  1. By default, focus should transfer back to the element that had focus before the new blocking element was pushed onto the stack. For example, the focused "More Info" button triggers a blocking overlay. Once the overlay is dismissed, focus moves back to the More Info button.
  2. Authors should be able to provide optional Element References to be focused after an element is pushed onto or popped off of the stack. Perhaps as an optional parameter to the push and pop methods?
  3. It may be useful to consider the API as a focus stack manager rather than as a blocking element manager, or a blocking view stack manager where each object in the stack has properties for a) the blocking element and b) the currently focused descendant element within the stack item. There may be some cause to make these properties writable, too. For example, if the Cancel button was clicked in my Shopping Cart dialog, I would want the original "Cart" button to regain focus, but if the "Confirm Purchase" button was pressed, I may want a different element to gain focus. This could be achieved as an optional argument on pop() or by modifying the focused element property of the previous stack view.

@cookiecrook I don't really agree with any of these ideas. When you display: none a dialog, it does not get removed from the stack. We should not bake such behavior in for blockingElements. And, this proposal is explicitly not about focus management; it's about the top layer and about causing other elements to become inert. All the use cases you provide can already be accomplished with focus(), blur(), and activeElement. Certainly your (1) is true, and falls out automatically, but the features suggested in (2) and (3) do not really make sense as part of this proposal; use focus() to accomplish those.

@domenic wrote:

When you display: none a dialog, it does not get removed from the stack. We should not bake such behavior in for blockingElements.

Seems like an easy authoring mistake we could fix in the browsers. I could be convinced otherwise if there is a good negative case. (When would I ever want the top element in the stack to be both blocking and invisible?)

And, this proposal is explicitly not about focus management; it's about the top layer and about causing other elements to become inert.

When any element with focus becomes inert, focus is lost (inert includes a requirement to be non-focusable), so mainstream keyboard users and assistive technology users are kicked out of the DOM, resulting in a terrible user experience. Unfortunately this problem is all too common on the Web b/c HTML has historically underspecified focus behavior.

The issue of inert and blocking elements is inextricably joined to focus. Any proposal without focus considerations is incomplete.

That is a good point, @cookiecrook. If the top item on the stack is hidden with display: none, there is no reason for it to still be in the blocking elements stack. If you could avoid authoring mistakes like dropping focus, that would be most excellent. I seriously doubt developers would remember to handle focus in that case, since they don't do a great job of it now. Baking it into the API would help a lot.

Hmm, it seems people are misinterpreting the intent of this proposal. We're not interested in exposing a do-what-I-mean API here. We're trying to expose the top layer concept as a primitive. As such, we're definitely not going to couple it to things like focus or CSS visibility (which kind? visibility, display, left: -9999px, translate: -9999px?). It's the job of framework authors to use the lower-level tools we give them, such as CSS visibility + blocking elements stack + focus()/activeElement, in order to create magic DWIM APIs that behave in ways most appropriate for their applications.

Again, it might be helpful to review the background in https://groups.google.com/forum/#!msg/browser-accessibility-dev/QinGGM_OM7Y/FHpxY_qfBgAJ. I was originally in favor of not even having a stack, but there were concerns about how that might cause coordination problems. If having a stack opens us up to this kind of feature creep, though, then we should scale this proposal back to a simple document.blockingElement setter/getter.

alice commented

When you display: none a dialog, it does not get removed from the stack.

I experimented with this in http://output.jsbin.com/nupawi

It turns out that, as implemented at least, display: none on a <dialog> puts it into a weird state where the backdrop is removed but the rest of the page is still inert. Compare with the case where the <dialog> is made visibility: hidden (dialog is hidden but backdrop remains) and where the <dialog> is removed from the page (same effect as close()).

So, to be consistent with <dialog>, we would not want to remove a display: none element from the stack.

However, I think <dialog> behaviour is arguably wrong here. It seems inconsistent to me: I think that making the <dialog> display: none should behave the same as one of the other two cases.

My feeling would be that it should behave the same as removing the dialog from the page, since I believe display: none is semantically very close to removing something from the DOM altogether (for example, it no longer has a layout object at all).

However, @robdodson made the point that we may want to start out by explaining the current <dialog> behaviour, and then potentially addressing this issue at a later stage.

(I also have some thoughts on the issue of moving/managing focus, but I'll write a separate comment for those.)

[Edit] Oh, I just had a thought: what happens when you set display: none and then back to display: unset (or whatever will set it back to its original value)? It seems like the logical behaviour would be somewhat magical: putting the element back in its previous position in the stack. Perhaps it does make sense to stick with the "broken" behaviour - I think it will be self-correcting after all, since it breaks behaviour for everyone.

(I definitely think we should, wherever possible, avoid creating a situation where it's possible to create behaviour which works well for mainstream users but is broken for keyboard and/or assistive technology users.)

Incidentally, I think that removing the stack concept is probably going to create more problems than it solves. For example, if adding a new blocking element is going to remove the previous blocking element from the top layer, it seems like that could cause all sorts of headaches.

Right, it's very intentional that display concerns like display: none, visibility: hidden, opacity: 0, left: -9999px, width: 0; height: 0; etc. are separate from inertness concerns. They can be varied independently as primitives to allow a variety of use cases.

alice commented

To be clear, I think display: none is a distinct case from all of the other cases you mention. As I said, display: none is very close to removing something from the DOM. visibility: hidden is also special, as it also effectively causes content to become inert (much like display: none). opacity: 0 and the rest affect only the visual presentation.

I don't really agree. In particular, I strongly disagree with the idea that display: none is like removing something from the DOM. Such an element is still part of the tree structure, participates in event dispatch, can become full screen or be .click()ed and I think even .focus()ed. Some aspects like hit testing do not apply to display: none, but neither do they apply to zero width/height elements (or, for that matter, offscreen elements). In any case, all these different modes are much more alike than they are different: they're all purely about CSS, with no impact on the DOM.

alice commented

It's not the case that an element with display: none or visibility: hidden can be successfully focus()ed. They also don't participate in the sequential focus order, and don't have an associated object in the accessibility tree. These distinctions do matter quite a bit in accessibility, because there are techniques that depend on them.

(Also, while it's true that these styles don't affect the DOM tree, in Blink terms they do affect the layout tree - display: none effectively prunes the layout tree from that point down. I suspect other browsers are implemented similarly.)

However, as you point out, they can be click()ed and do participate in event propagation. I made http://output.jsbin.com/biqobal to experiment with all of these - you'll need to watch the console to see what's going on.

On balance, I still think it probably does make sense to leave things in the stack when they are made display: none, but I could potentially be convinced otherwise.

alice commented

Regarding focus:

And, this proposal is explicitly not about focus management; it's about the top layer and about causing other elements to become inert.

When any element with focus becomes inert, focus is lost (inert includes a requirement to be non-focusable), so mainstream keyboard users and assistive technology users are kicked out of the DOM, resulting in a terrible user experience. Unfortunately this problem is all too common on the Web b/c HTML has historically underspecified focus behavior.

The issue of inert and blocking elements is inextricably joined to focus. Any proposal without focus considerations is incomplete.

I agree with all of this. I believe an explicit discussion of focus behaviour is well within the scope of this discussion, for the reasons James outlines.

I think a minimal baseline would be James' (1) proposal: when a new blocking layer is added, the blocked layer should maintain its focus state, which should be reinstated once it is no longer blocked.

This is a matter of context: the focused element provides a significant amount of context for keyboard and assistive technology users; not to reinstate focus when a blocking layer is removed means dropping that pre-existing context on the floor.

There is a precedent for this: if there is an active element in a page, and the user switches tabs away from the page, a blur event is sent on the active element, and then when the user switches back to the page the previously active element is re-focused and a focus event fired.

[Edit: there is another precedent, which is that when a JavaScript alert() is shown, and then dismissed, focus returns to the originally focused element.]

I think (3) is simply a generalisation of this idea. When any layer, including the base layer, is blocked, it should implicitly keep track of its active element, and re-focus that element when it is no longer blocked.

It then follows that yes, we will need a minimal amount of state around to make this happen - analogous to document.activeElement (3b in James' comment) for each blocking layer. Once again, there is a precedent for this: each shadow root has an activeElement. I'm not sure what we'd need to know the blocking element (3a in James' comment) for - the blocking element stack seems like sufficient information to compute that.
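A minimal sketch of the per-layer state described above, assuming layers are only ever popped from the top (removal from the middle of the stack is not modeled). `FocusRestoringStack` is hypothetical, elements are plain strings, and focus is a single `activeElement` field rather than real DOM focus. The remembered focus is kept private, in line with not exposing it to developers.

```javascript
// Hypothetical model of per-layer focus memory. Layer 0 is the base
// document; every pushed blocking element adds a layer. When a layer is
// covered, its active element is remembered; when it becomes the top
// layer again, that element is re-focused.
class FocusRestoringStack {
  #layers = [{ element: "document", savedFocus: null }];
  activeElement = null;

  focus(el) {
    this.activeElement = el;
  }

  push(element) {
    // Remember what had focus in the layer being covered.
    this.#layers[this.#layers.length - 1].savedFocus = this.activeElement;
    this.#layers.push({ element, savedFocus: null });
    this.activeElement = null; // the new layer chooses its own focus
  }

  pop() {
    if (this.#layers.length === 1) return undefined; // base layer stays
    const { element } = this.#layers.pop();
    // Re-focus whatever was active in the newly uncovered layer.
    this.activeElement = this.#layers[this.#layers.length - 1].savedFocus;
    return element;
  }
}

const focusStack = new FocusRestoringStack();
focusStack.focus("More Info button");
focusStack.push("info overlay");
focusStack.focus("Close button");
focusStack.push("confirm dialog");
focusStack.pop();
console.log(focusStack.activeElement); // Close button
focusStack.pop();
console.log(focusStack.activeElement); // More Info button
```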

Choosing a different element to focus (2) is probably unnecessary to specify here: if the above mechanism were implemented, an author could add a capturing event listener for a focus event on the container element in question (either window for the base layer, or the element which was push()ed for a top layer), and move focus to the desired new element instead of the element which was about to be focused. Once again, this is currently possible when switching tab contexts (http://output.jsbin.com/kosuxu)

I support @cookiecrook and @alice in their assertion that we need to do more with focus. In particular, it seems inconsistent to handle focus from a tab cycling point of view but then refuse to handle more of the focus concerns.

I also think that the focus management should extend to non-modal keyboard commands that guarantee that a keyboard-only user will be able to navigate out of the dialog (group) into the next blocking layer, across into the browser chrome, back into the next blocking layer and back into the non-modal dialog.

Failure to do this will result in inconsistencies in implementations that will likely cause the dialog element to not be regarded as accessibility supported (as defined by WCAG 2) https://www.w3.org/TR/UNDERSTANDING-WCAG20/conformance.html#uc-accessibility-support-head (note that is not the normative definition, but conveys the spirit)

I'd really like to avoid this proposal growing to un-implementable, un-speccable size. As I've said already, @cookiecrook's (1) falls out automatically. If people have additional proposals for changing HTML's focus algorithms, I'd encourage them to open a new bug thread.

I'm going to back off my involvement in this thread as it seems that at this point we've exhausted discussion on the original proposal and are starting to go off-topic. Additionally, I've still yet to see any explicitly stated implementer interest (of the form "yes, $browser would implement this once it's specced"), which is what this proposal needs to move forward, not more features. When such interest emerges from two implementers I'll be able to shift some of my resources back toward it.

@domenic wrote:

I'm going to back off my involvement in this thread as it seems that at this point we've exhausted discussion on the original proposal and are starting to go off-topic.

Your proposal's good. Don't abandon it. We're trying to help you make it better, not bigger.

@alice wrote:

I think a minimal baseline would be James' (1) proposal: when a new blocking layer is added, the blocked layer should maintain its focus state, which should be reinstated once it is no longer blocked.

I can let the other points rest if this one is picked up. Authors wanting to focus something else could capture the onblur or ondomfocusout events on the popped layer and explicitly set focus() elsewhere.

@domenic wrote:

As I've said already, @cookiecrook's (1) falls out automatically.

It's unrealistic to assume this will happen automatically. Focus is an inconsistent mess across all browsers because of the poor specification in HTML. Case in point: if the focused element is ever removed from the view, focus falls back to the body element (the nuclear option). Most mouse users are happily oblivious, but this behavior is only acceptable to people who don't use or understand focus.

@cookiecrook your last paragraph seems to make two points. 1) Focus is not well defined. It's not clear whose job it is to define focus but I guess HTML tries to do most of it so we should probably keep making improvements there. 2) The behavior that is defined isn't ideal for users that depend on focus navigation.

We can fix 1, especially if you help out with tests and maybe even PRs. Unfortunately we lack the necessary manpower to clean everything up at once, but I think we all agree user interface events could use some help.

It's unclear to me whether we can fix 2 without introducing new features, since focusing the body as backup has been the behavior for so long, it's likely sites depend on it.

I realize now another core purpose of my last paragraph was unclear. It's unrealistic to assume blocking stack focus will happen automatically because what will happen is what has always happened:

  • The focused element will be removed from the view.
  • Whether or not it's removed from the DOM is irrelevant; focus will be lost.
  • Focus will fall back to the body element, breaking the "stack" focus handling expectation that is consistent on every platform except the web platform.

Hopefully that's clearer. Any API (even a primitive) that can cause document focus to be lost, as this API can with regard to inertness, should take care to avoid common focus mistakes. In this case, an explicit one-liner to restore the previously "focused element" and the "sequential focus navigation starting point" in the stack may be sufficient.

@annevk wrote:

We can fix 1, especially if you help out with tests and maybe even PRs. Unfortunately we lack the necessary manpower to clean everything up at once, but I think we all agree user interface events could use some help.

Understood. I don't peruse the specs as often as I used to, but I will file bugs and/or PRs when I see them. On a related note: I just went looking for the only HTML spec focus bug I remembered offhand, and I'm pleased to see the following was added to the spec re: fragment navigation.

Move the sequential focus navigation starting point to target.
https://html.spec.whatwg.org/multipage/browsers.html#scroll-to-fragid

I remember arguing with Ian over this point on numerous occasions. Congrats to whoever made the edit or convinced another editor.

alice commented

Regarding this proposal: I think we actually have some consensus at this point between @domenic, @cookiecrook and I.

I spoke with Domenic offline to clarify his "falls out automatically" comment, and he pointed to https://html.spec.whatwg.org/multipage/interaction.html#focus-fixup-rule-three - "Let new focus target be the currently focused area of a top-level browsing context." etc. I think the source of the misunderstanding may have been in part that Chrome's implementation of <dialog> currently does not implement this behaviour.

Domenic and I also agreed that we can and should generalise this up the blocking element stack [edit: my reading of the "focus fixup" language is that this should already happen] - but without, at least immediately, exposing that state information to developers.

Thanks for the link. If this blocking element API will reference the "focus fix-up" rules, that may be sufficient. Ultimately I'd like to see more, but I agree we're in consensus for baseline focus support.

How would it interact with the fullscreen stack? Should the fullscreen element also be put into this stack? If so, a stack containing fullscreen elements and modal dialogs sounds similar to the existing top layer concept. Should these two share the same stack somehow?

I don't think it's related to the fullscreen stack.

Shouldn't fullscreen element block all other elements as well?

I see, I had forgotten the context. Yes, this is all tied in to the top layer concept. Probably it is best to manipulate that directly, but it does mean that using the fullscreen APIs could manipulate the stack out from under you.

Yeah, I think that could be an issue for dialog as well. Perhaps by adding a flag to items of the blocking element stack indicating how each element was added, and forbidding script from removing an element from the stack if it was added by the Fullscreen API or dialog.showModal()?
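A sketch of the flag idea, under stated assumptions: `TaggedTopLayer`, `removeAs()`, `scriptPop()` and the source strings are all hypothetical. Throwing a plain `Error` is also an assumption; an implementation might use a `DOMException` or ignore the call instead.

```javascript
// Hypothetical sketch: each top-layer entry records who added it.
// Script-facing pop refuses entries owned by the Fullscreen API or by
// showModal(); those are removed only by their owning API (for example
// exitFullscreen() or dialog.close()), modeled here by removeAs().
class TaggedTopLayer {
  #entries = []; // { element, source }; last entry is the top

  add(element, source) {
    this.#entries = this.#entries.filter((e) => e.element !== element);
    this.#entries.push({ element, source });
  }

  // Removal performed by the API that added the entry.
  removeAs(element, source) {
    const i = this.#entries.findIndex((e) => e.element === element);
    if (i === -1 || this.#entries[i].source !== source) return false;
    this.#entries.splice(i, 1);
    return true;
  }

  // What document.blockingElements.pop() would do under this proposal.
  scriptPop() {
    const entry = this.#entries[this.#entries.length - 1];
    if (entry === undefined) return undefined;
    if (entry.source !== "script") {
      throw new Error(`cannot pop an entry added by ${entry.source}`);
    }
    this.#entries.pop();
    return entry.element;
  }
}

const layer = new TaggedTopLayer();
layer.add("video", "fullscreen");
layer.add("panel", "script");
console.log(layer.scriptPop()); // panel
try {
  layer.scriptPop();
} catch (e) {
  console.log(e.message); // cannot pop an entry added by fullscreen
}
```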

Elements in the Top Layer should be subject to the Top Layer's stacking context, in other words rendered on top of everything else - even if they're declared inside an element that creates new stacking context.

This would solve issues like this one http://jsbin.com/kuboqa/1/edit?html,output

<style>
  .backdrop {
    position: fixed;
    top: 0; bottom: 0;
    left: 0; right: 0;
    z-index: 1;
    background-color: rgba(0, 0, 0, 0.5);
  }
  .overlay {
    position: fixed;
    z-index: 2;
  }
</style>

<div class="backdrop"></div>

<!-- this creates a new stacking context -->
<div style="transform: translateZ(0);">
  <div class="overlay">
    <button>click me!</button>
  </div>
</div>

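^ you cannot click the button in .overlay because of stacking contexts: the transformed wrapper creates a new stacking context, so .overlay's z-index: 2 only applies within it, and .backdrop (z-index: 1) is painted on top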
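Why the button loses can be shown with a simplified model of CSS stacking. This is not the full CSS painting algorithm: `compareKeys` is a hypothetical helper, tree-order tie-breaking is ignored, and the top layer is approximated as a level above every z-index in the document.

```javascript
// Simplified model of CSS stacking: an element's paint key lists the
// z-index of each stacking context on its path, starting at the root.
// A context created by `transform` with z-index: auto counts as level 0.
// Keys are compared level by level; the larger key paints on top.
// Simplification: if one key is a prefix of the other, the nested (longer)
// one is treated as on top; tree-order tie-breaking is ignored.
function compareKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

const backdrop = [1];   // .backdrop: z-index 1 in the root context
const overlay = [0, 2]; // wrapper context at level 0, then .overlay's z-index 2

console.log(compareKeys(backdrop, overlay) > 0); // true: backdrop covers the button

// Hypothetical: treat the top layer as a level painted after every
// z-index in the document, so an .overlay placed there escapes the wrapper.
const overlayInTopLayer = [Infinity];
console.log(compareKeys(overlayInTopLayer, backdrop) > 0); // true
```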

Yes, that is literally the definition of top layer.

Regarding multiple Top Layer elements opened, I wanted to provide the following use case (try it on chrome http://jsbin.com/bawecix/5/edit?html,output):

<style>
  body {
    height: 150vh;
  }
  dialog {
    width: 100px;
    overflow: hidden;
  }
</style>

<button onclick="myDialog.showModal()">Show modal</button>

<dialog id="myDialog">
  <img src="http://lorempixel.com/30/30/" title="Some random image from lorempixel">
  <select>
    <option>item</option>
    <option>item</option>
    <option>item</option>
  </select>
</dialog>

We have 3 overlays: <img> tooltip, <dialog> and <select>. All render their contents on the Top Layer.

Regarding the interactions, the last opened overlay is the first to be notified of events; e.g. if node1, node2, then node3 listen for the click event, node3's event listener is invoked first.

In the above example, only <dialog> and <select> take action on click/keydown events.

  • they trap focus within their content when opened
  • they close on Escape and stop that keydown event propagation (that is to say, only <select> closes when you press Escape)
  • they stop click event propagation (<select> closes on click)
  • <select> prevents scrolling from happening when it's opened, <dialog> doesn't

If I were to implement this, I'd have to implement my own LIFO event listener system to delegate events to the right overlay (here is an example).
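A minimal sketch of that kind of LIFO delegation, under stated assumptions: `OverlayStack`, the handler shape and the event objects are hypothetical. Overlays register when opened, and each event is offered to the most recently opened overlay first. A handler that returns `true` consumes the event, which stands in for stopping propagation. A real version would sit behind a single document-level listener.

```javascript
// Hypothetical LIFO dispatcher for stacked overlays.
class OverlayStack {
  #overlays = []; // last entry = most recently opened

  open(name, handlers) {
    this.#overlays.push({ name, handlers });
  }

  close(name) {
    this.#overlays = this.#overlays.filter((o) => o.name !== name);
  }

  // Offer the event to overlays from most to least recently opened.
  // Returns the name of the overlay that consumed it, or null.
  dispatch(event) {
    for (let i = this.#overlays.length - 1; i >= 0; i--) {
      const overlay = this.#overlays[i];
      const handler = overlay.handlers[event.type];
      if (handler && handler(event, this)) return overlay.name;
    }
    return null;
  }
}

const overlays = new OverlayStack();
const closeOnEscape = (name) => (event, stack) => {
  if (event.key !== "Escape") return false;
  stack.close(name);
  return true; // consumed: overlays below never see this keydown
};

overlays.open("dialog", { keydown: closeOnEscape("dialog") });
overlays.open("select", { keydown: closeOnEscape("select") });

console.log(overlays.dispatch({ type: "keydown", key: "Escape" })); // select
console.log(overlays.dispatch({ type: "keydown", key: "Escape" })); // dialog
console.log(overlays.dispatch({ type: "keydown", key: "Escape" })); // null
```

This reproduces the Escape behaviour listed above: only the topmost overlay closes on each keypress.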