This PowerShell script retrieves and outputs all of a site's URIs by scraping its sitemaps for links and then parsing the HTML of those links. It can also warm the site, either automatically or manually through generated curl commands.
- Starting with a site's parent sitemap, scrape for child sitemaps and get all links
- Output sitemaps and links as lists and as curl commands, each in individual files
- Optionally scrape links for URIs found in any tag-attribute combination you want, e.g. <a href>, <img src>, <img srcset>, <img data-src>, <img data-srcset>, <link rel>, <script src>
- URIs are domain-specific (i.e. same domain as the sitemaps and links)
- Output each tag-attribute's URIs to files
- Output each tag-attribute's URIs as curls to files
- Optionally warm the site with the above URIs as part of the script run
- A sitemap formatted in the Sitemap protocol format, populated with links
- PowerShell v3
- Windows / *nix environment
- User with read/write/modify permissions on script directory
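
To illustrate what a sitemap in the Sitemap protocol format looks like and how child sitemaps and links can be read from it, here is a minimal sketch. It is not the script's actual implementation: the sample XML, the example.com URLs, and the helper name Get-SitemapLocs are hypothetical and exist only for this example.

```powershell
# Hypothetical parent sitemap (a sitemap index) listing two child sitemaps.
$sitemapIndexXml = @'
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/post-sitemap.xml</loc></sitemap>
  <sitemap><loc>https://example.com/page-sitemap.xml</loc></sitemap>
</sitemapindex>
'@

# Hypothetical child sitemap populated with links.
$childSitemapXml = @'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
</urlset>
'@

# Hypothetical helper: returns the text of every <loc> element in a sitemap.
function Get-SitemapLocs {
    param([Parameter(Mandatory)][string]$Xml)
    $doc = [xml]$Xml
    # local-name() matches <loc> regardless of the sitemap XML namespace.
    $doc.SelectNodes("//*[local-name()='loc']") |
        ForEach-Object { $_.InnerText.Trim() }
}

# A sitemap index yields child sitemap URLs; a urlset yields page links.
$childSitemaps = Get-SitemapLocs -Xml $sitemapIndexXml
$links = Get-SitemapLocs -Xml $childSitemapXml
$links
```

In a real run the XML would come from the site rather than from a string, for example by downloading each sitemap URL with Invoke-WebRequest and passing its content to the helper.
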
- Open config.ps1 in your favourite text editor and configure the script's settings
- WinNT:
  - Right click on the script in Explorer and select Run with PowerShell (should be present on Windows 7 and up)
  - Alternatively, open a command prompt in the script directory, and run Powershell .\Scrape-Warm-Site.ps1
- *nix:
  - Run powershell ./Scrape-Warm-Site.ps1 or pwsh ./Scrape-Warm-Site.ps1, depending on which version of PowerShell you're running.
Q: Help! I am getting an error 'File C:...Scrape-Warm-Site.ps1 cannot be loaded because the execution of scripts is disabled on this system. Please see "get-help about_signing" for more details.'
- You need to allow the execution of unverified scripts. Open PowerShell as administrator, type Set-ExecutionPolicy Unrestricted -Force and press ENTER. Try running the script again. You can easily restore the security setting afterwards by using Set-ExecutionPolicy Undefined -Force.
Q: Help! Upon running the script I am getting an error 'File C:...Scrape-Warm-Site.ps1 cannot be loaded. The file C:...\Scrape-Warm-Site.ps1 is not digitally signed. You cannot run this script on the current system. For more information about running scripts and setting execution policy, see about_Execution_Policies at http://go.microsoft.com/fwlink/?LinkID=135170.'
- You need to allow the execution of unverified scripts. Open PowerShell as administrator, type Set-ExecutionPolicy Unrestricted -Force and press ENTER. Try running the script again. You can easily restore the security setting afterwards by using Set-ExecutionPolicy Undefined -Force.
Q: Help! Upon running the script I am getting a warning 'Execution Policy change. The execution policy helps protect you from scripts that you do not trust. Changing the execution policy might expose you to the security risks described in the about_Execution_Policies help topic at http://go.microsoft.com/?LinkID=135170. Do you want to change the execution policy?'
- You need to allow the execution of unverified scripts. Type Y for yes and press ENTER. You can easily restore the security setting afterwards by opening PowerShell as administrator and running Set-ExecutionPolicy Undefined -Force.
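
If you would rather not change the machine-wide policy at all, Windows PowerShell also lets you relax the execution policy for a single invocation or a single session. The commands below are a sketch of that alternative, assuming the script is in the current directory; neither form persists once the process exits.

```powershell
# Option 1: bypass the execution policy for this one invocation only.
powershell -ExecutionPolicy Bypass -File .\Scrape-Warm-Site.ps1

# Option 2: relax the policy for the current PowerShell session only,
# then run the script from that same session.
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass -Force
.\Scrape-Warm-Site.ps1
```

Running Get-ExecutionPolicy -List afterwards shows which scopes are set, which is a quick way to confirm that the machine-wide setting was left untouched.
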
Nil
Nil
- By default, the script directory (where you run the script) needs read, execute, and write permissions. All created files and folders will reside in the script directory.
- Website owners may want to warm their site (i.e. "preload the cache") from a remote client, especially if they use Content Delivery Networks (CDNs).
- Search Engine Optimization (SEO) typically involves optimizing a website's load times, and one effective way of doing so is to preload or 'warm' the web cache. This script can be configured to do this automatically; alternatively, the site can be warmed manually by running the curl commands that the script writes to separate files for portability.
- Website owners might want a list of links of all their resources (blog posts, media, etc.) if they intend to migrate their site (e.g. changing a domain name). This script can search for all of those and output them as a list.
- Website owners may simply need a list of their sitemaps, or links from those sitemaps.
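
The tag-attribute scraping and curl generation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's own code: the sample HTML, the example.com domain, the helper name Get-TagAttributeUris, and the curl flags are all hypothetical, the regex is a rough match rather than a full HTML parser, and only absolute URIs are considered.

```powershell
# Hypothetical page HTML and site domain, for illustration only.
$html = @'
<a href="https://example.com/about/">About</a>
<img src="https://example.com/logo.png" alt="logo">
<img src="https://cdn.other.net/banner.png">
<script src="https://example.com/app.js"></script>
'@
$domain = 'example.com'

# Hypothetical helper: returns same-domain URIs found in one tag-attribute pair.
function Get-TagAttributeUris {
    param(
        [Parameter(Mandatory)][string]$Html,
        [Parameter(Mandatory)][string]$Tag,
        [Parameter(Mandatory)][string]$Attribute,
        [Parameter(Mandatory)][string]$Domain
    )
    # Matches <tag ... attribute="value"> and captures the quoted value.
    $pattern = '<{0}\b[^>]*?\s{1}\s*=\s*["'']([^"'']+)["'']' -f `
        [regex]::Escape($Tag), [regex]::Escape($Attribute)

    [regex]::Matches($Html, $pattern, 'IgnoreCase') |
        ForEach-Object { $_.Groups[1].Value } |
        Where-Object {
            # Keep only absolute URIs whose host equals the site domain.
            $u = $null
            [uri]::TryCreate($_, [urikind]::Absolute, [ref]$u) -and $u.Host -eq $Domain
        } |
        Select-Object -Unique
}

# Collect <img src> URIs and turn each into a curl command line.
# The curl flags are an assumption: silent, response body discarded (*nix null device).
$uris = Get-TagAttributeUris -Html $html -Tag 'img' -Attribute 'src' -Domain $domain
$curls = $uris | ForEach-Object { 'curl -s -o /dev/null "{0}"' -f $_ }
$curls
```

Piping $uris or $curls to Set-Content would write them to a file, one per line, which mirrors the idea of keeping a separate list and curl file for each tag-attribute combination.
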