harlan-zw/unlighthouse

Build Static Report - Relative Pathing

ErikCarlson-RGA opened this issue · 5 comments

Clear and concise description of the problem

When you run a scan with --build-static, most of the generated folder and file paths are absolute references. For example, the generated root index.html contains:

<script type="module" crossorigin src="/assets/index.12341234.js"></script>

This becomes a problem if you want to create a central repository of multiple site scans in a single hosted environment, for example an S3 website or CloudFront in front of an S3 bucket. Our use case is a hosted site where CI can simply run s3 sync to copy each scan into its own S3 folder, and we can then browse the scans. The site would have the following paths:

domain.com/site1-scan1
domain.com/site2-scan1
domain.com/site2-scan2

Expanding the bucket structure of domain.com/site1-scan1:

s3://bucket/site1-scan1/assets/
s3://bucket/site1-scan1/index.html
s3://bucket/site1-scan1/ci-result.json
s3://bucket/site1-scan1/reports/

Because of the absolute references, the browser will always be requesting domain.com/assets/index.12341234.js when you go to domain.com/site1-scan1, which breaks because the assets folder doesn't exist at root.
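To see which references in a generated file are root-absolute, a quick grep can help. This is a minimal sketch: `list_root_refs` is a hypothetical helper name, and it only looks at `src` and `href` attributes in a single file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print src/href attributes whose value starts with a
# single "/" (root-absolute), ignoring protocol-relative "//host/..." URLs.
list_root_refs() {
  grep -oE '(src|href)="/[^/"][^"]*"' "$1"
}
```

Running `list_root_refs .unlighthouse/index.html` on the build described above would be expected to list the `/assets/...` script reference, which the browser resolves against the domain root rather than the scan folder.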

Is it possible to have the paths generated relatively?

Suggested solution

Generate the paths of static builds as relative instead of absolute.

Alternative

No response

Additional context

No response

I had a similar use case and would be interested in this. For now I'm working around it by manually replacing instances of "/assets" with the proper URL prefix for each folder on S3/CloudFront.

# Rewrite every "/assets" reference in the build output (GNU sed in-place edit).
find .unlighthouse -type f | while IFS= read -r file; do
    sed -i 's|/assets|/<url-prefix-s3-folder>/assets|g' "$file"
done

I tested something similar and found that it solves the issue, but it also creates potential headaches. The fix also rewrites paths inside the captured HTML wherever that content contains /assets or /reports.

For example, our source domain might have origin.com/assets/videos/movie1.mp4. The generated reports capture the HTML of each page; they are found in /reports/page-path/lighthouse.html. That Lighthouse result page now has its captured HTML unintentionally modified with the S3 folder prefix. The same happens if we have a news site with a path such as origin.com/reports/all-the-base-belongs-to-us.

I can see a producer or product manager scanning the results, noticing that the captured page has wrong paths or text in the report pages, and starting to raise issues. Yes, I can modify the script to exclude certain files, but that just creates more debt to deal with later. Simplicity is favored.
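For illustration, here is one way the exclusion mentioned above might look. It is a rough sketch, not verified against a real build: `prefix_assets` is a hypothetical helper, the `reports/` folder name and the `.html`/`.js`/`.css` filter are assumptions, and the prefix must not contain `|` or `&` because it is passed to sed unescaped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite "/assets" references under a build directory,
# skipping anything inside a reports/ folder so captured page HTML is untouched.
# Not idempotent: running it twice would prefix the paths twice.
prefix_assets() {
  local build_dir="$1" prefix="$2" file tmp
  find "$build_dir" -type f ! -path '*/reports/*' \
      \( -name '*.html' -o -name '*.js' -o -name '*.css' \) -print0 |
    while IFS= read -r -d '' file; do
      tmp="$(mktemp)"
      # Write to a temp file first so this works with both GNU and BSD sed.
      sed "s|/assets|/${prefix}/assets|g" "$file" > "$tmp" && cat "$tmp" > "$file"
      rm -f "$tmp"
    done
}
```

Even in this form the script has to know which folders hold captured content, which is the maintenance cost described above.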

Hey, I think you might want the routerPrefix option.

Well golly gee. I honestly didn't look too hard at the options file, since I was trying to make something templateable with as few custom files between CIs as possible.

I didn't see --router-prefix listed on the CLI flag options page, but I just tested calling it and it worked. I also tested setting it in the options file, which obviously worked as well.
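A CI step built around that flag might look roughly like the sketch below. `publish_scan` and its `DRY_RUN` switch are hypothetical. The `.unlighthouse` output directory is taken from the workaround earlier in the thread, the leading slash on the prefix is an assumption, and `https://origin.com`, `site1-scan1`, and `bucket` are placeholder values.

```shell
#!/usr/bin/env bash
# Hypothetical CI helper: build a static report under a path prefix, then
# sync it to the matching S3 folder. DRY_RUN=1 prints the commands instead.
publish_scan() {
  local site="$1" folder="$2" bucket="$3" run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi
  # --router-prefix is the flag confirmed in this thread; the value format is assumed.
  $run unlighthouse-ci --site "$site" --build-static --router-prefix "/$folder"
  # Assumes the static build lands in .unlighthouse (not verified here).
  $run aws s3 sync .unlighthouse "s3://$bucket/$folder"
}
```

For example, `DRY_RUN=1 publish_scan https://origin.com site1-scan1 bucket` prints the two commands without running them, which makes the step easy to template across CI pipelines.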

Thank you for your response. My specific use case is fixed by this flag. Closing.

Great to hear, I'll try and make this easier to find in the documentation.