18F/site-scanning

Lazy-Loading Sites w/ Incomplete Data


This seems somewhat related to #856, but I wanted to highlight a specific issue I noticed while searching through the results (awesome tool, btw)!

Goal:

Support scanning of sites that render content after the initial page load. While looking through the site-scanning results for one of the sites I manage, I noticed that its USWDS score was much lower than expected. On further investigation, I found that the web server's initial response was an abbreviated set of HTML and JavaScript files, which in turn rendered additional content. I created a gist that shows the initial server response and the fully rendered page (a difference of ~10k lines).
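To put a number on the gap between the two captures in the gist, something like the following could be used. It is a minimal sketch, not part of site-scanning; `line_changes` is a hypothetical helper that uses the standard library's `difflib.SequenceMatcher` to count lines added and removed between the initial response and the rendered DOM.

```python
import difflib


def line_changes(initial_html: str, rendered_html: str) -> tuple[int, int]:
    """Return (added, removed) line counts going from initial_html to rendered_html.

    Hypothetical helper for comparing a raw server response with the
    DOM serialized after JavaScript has run.
    """
    a = initial_html.splitlines()
    b = rendered_html.splitlines()
    added = removed = 0
    # Each opcode describes how to turn a[i1:i2] into b[j1:j2].
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, removed
```

For a lazy-loading site, `added` would be expected to dwarf `removed`, since the rendered page mostly extends the initial shell.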

By my estimation, the labs.gsa.gov/s site should have a USWDS score of:

| Eval criterion | Score | Scanner code ref | Gist ref |
|---|---|---|---|
| .usa classes | 40 | https://github.com/18F/domain-scan/blob/master/scanners/uswds2.py#L36 | multiple |
| Source Sans | 5 | https://github.com/18F/domain-scan/blob/master/scanners/uswds2.py#L100 | https://gist.github.com/mvogelgesang/94b137577c44f7e3fbf2fd9e4dd65c53#file-after-page-load-html-L9402 |
| uswds in CSS body | 20 | https://github.com/18F/domain-scan/blob/master/scanners/uswds2.py#L115 | multiple |
| USWDS version in body | 20 | https://github.com/18F/domain-scan/blob/master/scanners/uswds2.py#L125 | https://gist.github.com/mvogelgesang/94b137577c44f7e3fbf2fd9e4dd65c53#file-after-page-load-html-L9166 |
| TOTAL | 85 | | |
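The total above is just the sum of the weights for each criterion that matches. The sketch below illustrates that arithmetic with the weights from the table. The checks themselves are simplified, hypothetical stand-ins (all-or-nothing regex matches), not the logic in uswds2.py, and `uswds_score` is an illustrative name.

```python
import re

# Weights from the table above. The matching rules below are simplified
# assumptions, NOT the exact checks implemented in uswds2.py.
WEIGHTS = {
    "usa_classes": 40,
    "source_sans": 5,
    "uswds_in_css": 20,
    "uswds_version": 20,
}


def uswds_score(html: str, css: str = "") -> tuple[int, dict[str, bool]]:
    """Return (total, per-check results) for an HTML body and its CSS text."""
    checks = {
        # Any class attribute containing a usa- prefixed class name.
        "usa_classes": re.search(r'class="[^"]*\busa-', html) is not None,
        # The Source Sans font family mentioned anywhere in markup or CSS.
        "source_sans": re.search(r"source\s+sans", html + css, re.I) is not None,
        # The string "uswds" appearing in the fetched CSS.
        "uswds_in_css": "uswds" in css.lower(),
        # A version string shortly after "uswds", e.g. "uswds v2.8.0".
        "uswds_version": re.search(r"uswds\D{0,10}\d+\.\d+\.\d+", html, re.I) is not None,
    }
    total = sum(WEIGHTS[name] for name, passed in checks.items() if passed)
    return total, checks
```

Run against the initial server response, most of these checks would miss; run against the rendered page, all four could match, which is how the expected 85 arises.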

Tasks:

  1. Determine the best way to identify sites whose initial response loads additional JavaScript (or triggers other actions) that dynamically renders the page.
  2. Allow the page to finish rendering, then perform the scan, specifically for the USWDS and theme criteria.
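For task 1, one cheap heuristic would be to flag responses that include scripts but very little visible text, which is typical of a client-rendered shell. This is only a sketch under that assumption: `looks_client_rendered` and its `min_text_chars` threshold of 200 are hypothetical and would need tuning against real scan results.

```python
from html.parser import HTMLParser


class _ShellStats(HTMLParser):
    """Counts <script> tags and visible text characters outside script/style."""

    def __init__(self) -> None:
        super().__init__()
        self.scripts = 0
        self.text_chars = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def looks_client_rendered(html: str, min_text_chars: int = 200) -> bool:
    """Guess whether a raw response is a JS shell that renders content later."""
    stats = _ShellStats()
    stats.feed(html)
    stats.close()
    return stats.scripts > 0 and stats.text_chars < min_text_chars
```

Pages flagged this way would be candidates for the render-then-scan path in task 2; pages with substantial server-rendered text could keep using the raw response.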

Acceptance Criteria:

  • A lazy-loading page returns the score of the fully loaded page.
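One way to meet this criterion would be to score the DOM after a headless browser has finished loading the page. The sketch below assumes Playwright for Python as the renderer (a swapped-in choice, not something site-scanning is confirmed to use). `render_html` and `score_rendered` are hypothetical names, and the scoring function is passed in so the flow can be exercised without a browser.

```python
from collections.abc import Callable


def render_html(url: str, timeout_ms: int = 30_000) -> str:
    """Return the page's DOM after network activity settles (requires Playwright)."""
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            # "networkidle": wait until there are no network connections for 500 ms.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()


def score_rendered(
    url: str,
    raw_html: str,
    score: Callable[[str], int],
    render: Callable[[str], str] = render_html,
) -> int:
    """Score the fully loaded page, falling back to the raw response on failure."""
    try:
        rendered_html = render(url)
    except Exception:
        # Rendering can fail (no browser installed, timeouts, blocked scripts).
        return score(raw_html)
    return score(rendered_html)
```

Falling back to the raw response keeps a scan from producing no score at all when rendering breaks, at the cost of reintroducing the undercount for that site.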

Moving this issue over to GSA/site-scanning#35