microsoft/axe-windows

[Feature Request] Enable automated scans of processes with multiple top-level windows

marcelwgn opened this issue ยท 27 comments

As we worked on adding automated Axe tests for Files we stumbled upon the fact that Files only has one process ID but it can spawn multiple windows. After some short testing, we noticed that the most recently focused window was the one to be scanned, a11y issues on the other windows were ignored. My question would be, if that is the expected behavior or if there are better ways to test in those cases. If this is something deeper in the UIA library, i.e. Windows, feel free to close this issue.

Hi @chingucoding! Thanks for reaching out letting us know about your use case.

axe-windows' automation APIs do currently make an assumption that the process being scanned will only have one "root element" in the UIA tree. Specifically, when you ask axe-windows to scan a specific process, its current behavior is to ask UIA to find the "first" descendent of the UIAutomation Root Element whose PID matches that of the target process:

root = UIAutomation.GetRootElement();
condition = UIAutomation.CreatePropertyCondition(PropertyType.UIA_ProcessIdPropertyId, pid);
var uia = root.FindFirst(TreeScope.TreeScope_Descendants, condition);
element = ElementFromUIAElement(uia);

The "choosing the most recently focused window" part is controlled by UI Automation's decision about which elements are "first", but the "only choosing a single window and not giving you a way to identify which one you want" part is a limitation of axe-windows; we would be happy to consider a feature request to improve this. We'd probably be open to a contribution for it, but it'll likely have API impact, so we'd definitely want to discuss it in an issue first.

If you have any specific ideas for solution(s) that would be most ideal for your use case, that would be very helpful. For example, would it be more useful to have:

  1. Something similar to the current APIs but which performs scans of each top level window for the process, giving back combined results across all of them, or
  2. An additional config option to allow axe-windows to scan one specifically chosen window?

If you have any specific criteria in mind you'd like axe-windows to be able to use to distinguish the different windows, that would also be helpful. We'd probably want to use something that's already a UIA Automation Element Property Identifier, and preferably something with values that would be stable across repeated scans (as opposed to something like UIA_NativeWindowHandlePropertyId that would vary across different CI builds/etc).

The team requires additional author feedback; please review their replies and update this issue accordingly. Thank you for contributing to Accessibility Insights!

Hi @dbjorge , thank you for your detailed reply!

I think I would definitely use a feature that would allow me to scan all windows of a given process. My ideal solution would be having a configuration option when creating a scanner to specify whether to either scan the "first" window or scan all windows. It could look something like this:

 // Create config to specifically target a process
 var myConfigBuilder = Config.Builder.ForProcessId(1234);

myConfigBuilder.WithMultiWindowStrategy(MultiWindowStrategy.LastFocused | MultiWindowStrategy.All)

 var myConfig = myConfigBuilder.build();
 var scanner = ScannerFactory.CreateScanner(myConfig);

The default option for this would be LastFocused to not break the previous behavior. If there was an option to filter to scan a specific window, I would probably only use it to scan all windows open by that app.

Also, if we decide on an API to use for this, I would be more than happy to contribute such a feature!

This issue has been marked as ready for team triage; we will triage it in our weekly review and update the issue. Thank you for contributing to Accessibility Insights!

@chingucoding , Sorry it's taken us a minute to get back to you. Thanks for your patience. The API we're considering at the moment is probably the simplest option: adding a method to the builder which takes a 0 based window index for the process (i.e., Config.Builder.WithWindowIndex(int index)). In this way, you would pass in the process id and the window index and the result returned would be for the specified top-level window. Would that solve the issue you experienced? And do you foresee any problems with that approach?

Thank you for your reply @RobGallo! This would be a good enough solution for me, however having the option to scan every window would also be appreciated. The only thing that could be a bit annoying is that one would need to create a new scanner in case a new window get's opened instead of being able to use the existing scanner. The current way I use Axe is to create one scanner for the test run and use that to scan every time I change the app. If I were to have multiple windows, I would then need to create a new scanner (or multiple scanners) to be able to scan the newly scanned windows. But I see that this is an acceptable trade of here. Have you (the Axe team) considered being able to set the window index to something special (e.g. -1) to be able to scan all windows instead of the specified one?

Edit: Now that I think of this, I think having this be part of the scanner creation actually makes more sense since this would be in line with the idea of having a scanner for a window instead of the whole process (which was not the case anyway).

This issue requires additional investigation by the Accessibility Insights team. When the issue is ready to be triaged again, we will update the issue with the investigation result and add "status: ready for triage". Thank you for contributing to Accessibility Insights!

@chingucoding
We considered scanning all the windows of a process, But we believe the window index solution is more doable in the short term because it wouldn't require so many major changes. Scanning multiple windows with one call would force us to rethink/rework how we handle file lifetimes, file names, and scan results, which are all areas currently designed around the assumption that one scan equals results for one window. We are also proceeding on the assumption that this is being used in test environments where window indexes are controlled and predictable.
If the window index solution works for you, and if you're still willing to work on the solution, please let me know and I can move the issue onto your plate.

The team requires additional author feedback; please review their replies and update this issue accordingly. Thank you for contributing to Accessibility Insights!

I see, yes that makes a lot sense @RobGallo . I would still be interested in implementing this in the current form. What would be the best way to move forward then? Should we discuss the edge case behavior here or should I rather create a PR and we discuss it there?

Also, on a slightly different note, would there be plans to change the file and screenshot behavior at some point to allow to scan multiple windows?

@chingucoding
I'm not sure which "edge case" you mean?

As for changing the file and results behavior, it's a possibility; it depends a lot on customer needs. As Yoda says, "Difficult to see. Always in motion is the future."

I'm not sure which "edge case" you mean?

Maybe this is not an edge case, but since we set the window index in the config, we have no way to verify if at the time of scanning that would make sense. So we would need to throw an exception when running the scan if the window index is not correct (i.e. to high).

Another thing I just noticed is how this could be done so the existing behavior is not being changed and don't break existing usage. Should not setting the Window Index be simply use the "first window" approach then? Also, do you happen to know how the indexing behaves when querying the list of UIA elements?

@chingucoding
We should throw an exception if this indexed window is unavailable . Please follow the examples of similar exceptions in the code.
Yes, if the window index is not specified, the expected behavior should be the existing behavior: the first window is scanned.

I don't believe I've ever seen documentation stating it, but in my experience, the call we use to retrieve the children is deterministic.

We should throw an exception if this indexed window is unavailable . Please follow the examples of similar exceptions in the code.
Yes, if the window index is not specified, the expected behavior should be the existing behavior: the first window is scanned.

Got it, thanks!

I don't believe I've ever seen documentation stating it, but in my experience, the call we use to retrieve the children is deterministic.

Would there be a good way to determine which window is the "first" window then? Should we sort them based on some criteria? Otherwise, if the order is something only UIA knows, I can imagine it can get very difficult to get the index right.

"Would there be a good way to determine which window is the 'first' window then? Should we sort them based on some criteria? Otherwise, if the order is something only UIA knows, I can imagine it can get very difficult to get the index right."

What types of information do you think you would need in order to determine which window you wanted to scan? We're thinking we might provide a way to return an array of window data so that users could see what order the windows are in. The data would come from UIA element properties and could include things like the window elements Name property and the hwnd (if the latter would be useful).

Note: we are expecting scanning will be performed in a test environment where the window order will be predictable. In other words, we're not thinking the window order will be random each time; the user can find the correct index once, then just set it in their test code.

What types of information do you think you would need in order to determine which window you wanted to scan? We're thinking we might provide a way to return an array of window data so that users could see what order the windows are in. The data would come from UIA element properties and could include things like the window elements Name property and the hwnd (if the latter would be useful).

I think the UIA information provided by the UIA tree would be enough here, having the UIA name of the window is even enough.

Note: we are expecting scanning will be performed in a test environment where the window order will be predictable. In other words, we're not thinking the window order will be random each time; the user can find the correct index once, then just set it in their test code.

Yes I understand this and this would be also the use case for me. However, what "worries" me is that I don't know in what order the windows would be when running the scan. We know that the first window is the last focused one. Does that mean they are sorted by last focused? This could potentially be an issue since depending on the order of tests, different windows could be spawned in different orders. If the test runner runs them in a different order than last time, the indices could be different making this approach a little bit harder.

@chingucoding
After my discussion with you, we have decided it would be most straight forward to scan all windows of a process. This change is more involved than what you and I were discussing earlier. You should see some PR's coming in the not-too-distant future to address this issue.

Sorry if it sounded like I wanted to have different features than initially discussed. Having a way to specify the window would be good enough and I also have the necessary changes locally (except tests). None the less, I understand your decision and would be more than happy to help contribute regardless what feature it should be.

@chingucoding
Thanks for your willingness to help! I may come back to you with an ask at some time during the process.

This issue requires additional investigation by the Accessibility Insights team. When the issue is ready to be triaged again, we will update the issue with the investigation result and add "status: ready for triage". Thank you for contributing to Accessibility Insights!

@RobGallo has a plan on this and is working on it

Hello, @chingucoding! We've completed our internal work on this and this is all ready for you to pick up if you're still interested in completing it. The general plan is as follows:

  1. Nothing should change if a process has only a single top-level window
  2. When multiple top-level windows exist, we want to scan them all and generate an a11ytest file for each one, even if the scans are clean (note that this the single-window case generates an a11ytest file only if issues are found)
  3. The a11ytest files should use the same base filename (specified or default), with a suffix indicating which file was involved (_1_of_2 and _2_of_2, for example).
  4. The automation layer should be appropriately updated and wired into the CLI.
  5. It would be ideal to have a simple test app with multiple top-level windows and an integration test to test it. This should follow the same pattern that we use with the existing integration tests.

Please consider using DaveTryon#35 as a starting point for your changes. That covers roughly points 1-3, although that branch isn't yet fully buildable since the results haven't yet been bubbled up the chain. That PR uses IReadOnlyCollection<ScanResults> on the assumption that we'll need the count upstream possibly without enumerating the items. If no such situation really exists, maybe IEnumerable<ScanResults> would make more sense.

If you decide that you can't pick this up, feel free to assign it back to me. I can't guarantee when it will get any traction, since we're actively working on some other things at the moment.

Awesome, thanks for the update, I'll start working on it using your branch. Let me know what the best way would be to incorporate the changes then. Should I create a new PR targeting main or should it target your branch? Of course we can coordinate that later once I'm (mostly) finished.

I think I'd just merge my branch into a branch in your fork, then work on it from there. When it's time to open the PR, just open it against the primary repo. The branch was just a convenient way to capture some ideas--there was never any intent that it would be used for anything more than that

Ah got it. I'll do that then and create a new PR against this repository.

This was added a while back, but we forgot to close this issue.