microsoft/accessibility-insights-service

[Bug] Scanning pages that use Server-Sent Events takes a minute longer than it should

dbjorge opened this issue · 3 comments

Describe the bug

The scan engine's page navigator includes functionality to attempt to wait for a page's outstanding network activity to settle before starting to scan it (in that it passes waitUntil: 'networkidle0' to Puppeteer's page.goto). However, when attempting to scan a page which uses Server-Sent Events (indefinitely-lived requests with content-type text/event-stream), the scan engine doesn't understand to special case this type of intentionally indefinite request. It assumes that it is a normal navigation-related request and blocks scanning until its (60s) navigation timeout elapses.

The motivating use case is an internal team attempting to run a scan via the Accessibility Insights for Azure DevOps task against a website in a developer environment based on an Express server with webpack-hot-middleware. This package is one of several HMR implementations which use Server-Sent Events to implement hot module reloading.

As a workaround, we've suggested that the impacted internal team disable HMR in their CI configuration running the Azure DevOps task, but we should still fix this on our end even if that customer is unblocked - running our scan CLI tool against a dev build of a website is a pretty reasonable use case.

Area

  • Install script
  • Update script
  • Scan CLI/Action
  • [?] Service (I expect the underlying issue would impact the service, too, but haven't verified this. I don't expect a user is very likely to end up running the service against an HMR-enabled page, but it's possible that a site being scanned could be using Server-Sent Events for non-HMR purposes)

To Reproduce

Steps to reproduce the behavior using webpack-hot-middleware's standard example project:

  1. git clone https://github.com/webpack-contrib/webpack-hot-middleware.git -b v2.25.1
  2. cd webpack-hot-middleware && npm clean-install
  3. cd example && npm clean-install
  4. npm start
  5. (separate prompt) mkdir ai-scan-tester && cd ai-scan-tester
  6. npm install accessibility-insights-scan@0.10.4
  7. npx ai-scan --crawl --url http://localhost:1616 --maxUrls 1

Expected behavior

Expected behavior: the single scan that this results in should only take as long as spinning up a puppeteer instance + minimal extra overhead (~5s)

Actual behavior: It instead takes "puppeeteer launch overhead + 60 seconds". Final request statistics: ... "requestAvgFinishedDurationMillis":62240 ...

Log files

Full console output from the repro steps above
❯ npx ai-scan --crawl --url http://localhost:1616 --maxUrls 1
INFO  PuppeteerCrawler:AutoscaledPool:Snapshotter: Setting max memory of this run to 3993 MB. Use the APIFY_MEMORY_MBYTES environment variable to override it.
INFO  PuppeteerCrawler:AutoscaledPool: state {"currentConcurrency":0,"desiredConcurrency":2,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":null},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":null}}}
INFO  Launching Puppeteer {"args":["--disable-dev-shm-usage","--no-sandbox","--disable-setuid-sandbox","--js-flags=--max-old-space-size=8192","--no-sandbox","--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36"],"defaultViewport":{"width":1920,"height":1080,"deviceScaleFactor":1},"timeout":15000,"headless":true}
INFO  Statistics: PuppeteerCrawler request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":0,"requestsTotal":0,"crawlerRuntimeMillis":60037,"retryHistogram":[]}
INFO  PuppeteerCrawler:AutoscaledPool: state {"currentConcurrency":1,"desiredConcurrency":3,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}
waitFor is deprecated and will be removed in a future release. See https://github.com/puppeteer/puppeteer/issues/6214 for details and how to migrate your code.
waitFor is deprecated and will be removed in a future release. See https://github.com/puppeteer/puppeteer/issues/6214 for details and how to migrate your code.
waitFor is deprecated and will be removed in a future release. See https://github.com/puppeteer/puppeteer/issues/6214 for details and how to migrate your code.
Processing page http://localhost:1616/
Discovered 0 links on page http://localhost:1616/
Found 1 accessibility issues on page http://localhost:1616/
INFO  PuppeteerCrawler: Crawler reached the maxRequestsPerCrawl limit of 1 requests and will shut down soon. Requests that are in progress will be allowed to finish.
INFO  PuppeteerCrawler: Earlier, the crawler reached the maxRequestsPerCrawl limit of 1 requests and all requests that were in progress at that time have now finished. In total, the crawler processed 1 requests and will shut down.
INFO  PuppeteerCrawler: Final request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":62240,"requestsFinishedPerMinute":1,"requestsFailedPerMinute":0,"requestTotalDurationMillis":62240,"requestsTotal":1,"crawlerRuntimeMillis":62480,"requestsFinished":1,"requestsFailed":0,"retryHistogram":[1]}
Generating summary scan report...
Warning: Each child in a list should have a unique "key" prop.

Check the top-level render call using <ul>. See https://fb.me/react-warning-keys for more information.
    in li
    in SummaryReportDetailsSection
    in div
    in main
    in ContentSection
    in body
    in BodySection
    in ReportBody
Summary report was saved as C:/temp/ai-scan-tester/ai_scan_cli_output/index.html

Screenshots

n/a

Additional context

Context about the customer that originally hit this issue can be found in an email thread with our internal support alias CC'd

This issue has been marked as ready for team triage; we will triage it in our weekly review and update the issue. Thank you for contributing to Accessibility Insights!

In case we want to do something ourselves that's more specific than networkidle0 or networkidle2, a good starting point is probably this gist linked from microsoft/playwright#2515, + a bit of logic to only track requests without Content-Type: text/event-stream - it's written against Playwright, but the implementation should be basically identical for Puppeteer.

Resolved in CLI 1.0.0