Example code for crawling a JavaScript single-page app using Headless-Render-API's scrape endpoint.
Generates the following files:
- output/crawl-results.html with an HTML table of screenshots and meta tags per page
- output/crawl-results.json for the full data
- output/screenshots/*.png for each page screenshot
- output/_whitelist.js, a JavaScript array of the crawled paths, for use with https://github.com/sanfrancesco/prerendercloud-server
- when placed in the build/dist directory of your app, _whitelist.js configures which pages are enabled for lazy-load and crawl-on-boot pre-rendering
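
As a rough illustration of how a list of crawled URLs could become the contents of a _whitelist.js file, here is a minimal sketch. The helper name buildWhitelistSource is hypothetical, and the module.exports wrapper is an assumption about how prerendercloud-server loads the array; check the file this crawler actually generates before relying on it.

```javascript
// Hypothetical helper (not part of this repository): turns crawled URLs
// into the source text of a _whitelist.js file.
// Assumption: the file exposes the array of paths via module.exports.
function buildWhitelistSource(urls) {
  // Keep only the path of each URL, drop duplicates, and sort for stable output.
  const paths = [...new Set(urls.map((u) => new URL(u).pathname))].sort();
  return `module.exports = ${JSON.stringify(paths, null, 2)};\n`;
}

console.log(
  buildWhitelistSource([
    "https://example.com/",
    "https://example.com/docs",
    "https://example.com/docs",
  ])
);
```

Writing the returned string to output/_whitelist.js (for example with fs.writeFileSync) would give a file that can be copied into the build/dist directory.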
To use, clone the repository, install dependencies, and run npm start with your API token and the host to crawl:
git clone git@github.com:sanfrancesco/prerendercloud-crawler.git
cd prerendercloud-crawler
npm install
PRERENDER_TOKEN="" HOST_TO_SCRAPE=example.com npm start
Example console output when run against headless-render-api.com looks something like:
$ PRERENDER_TOKEN="secretToken" HOST_TO_SCRAPE=headless-render-api.com npm start
scraping https://headless-render-api.com/
scraping https://headless-render-api.com/pricing
scraping https://headless-render-api.com/docs
scraping https://headless-render-api.com/support
scraping https://headless-render-api.com/blog
scraping https://headless-render-api.com/users/sign-in
scraping https://headless-render-api.com/docs/api/prerender
scraping https://headless-render-api.com/docs/api/examples
scraping https://headless-render-api.com/docs/api/usage
// skipping https://hub.docker.com/r/prerendercloud/webserver
scraping https://headless-render-api.com/users/sign-up
scraping https://headless-render-api.com/docs/api/screenshot-examples
scraping https://headless-render-api.com/docs/api/screenshot
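
The output above suggests a crawl that follows links on the starting host and skips links to other hosts. The sketch below shows one way such a loop could look; it is not the crawler's actual implementation. The function crawl and the getLinks callback are hypothetical names, and getLinks stands in for whatever call returns the absolute URLs linked from a page (in the real project that data would come from the scrape endpoint).

```javascript
// Breadth-first crawl restricted to the host of the starting URL.
// Assumption: getLinks(url) is an async function that resolves to an
// array of absolute URLs found on that page.
async function crawl(startUrl, getLinks, log = console.log) {
  const host = new URL(startUrl).host;
  const seen = new Set([startUrl]); // URLs already queued or skipped
  const queue = [startUrl];
  const scraped = [];

  while (queue.length > 0) {
    const url = queue.shift();
    log(`scraping ${url}`);
    scraped.push(url);

    for (const link of await getLinks(url)) {
      if (seen.has(link)) continue;
      seen.add(link);
      if (new URL(link).host !== host) {
        // Links to other hosts are reported but never visited.
        log(`// skipping ${link}`);
        continue;
      }
      queue.push(link);
    }
  }
  return scraped;
}

// Usage with a small in-memory link graph instead of real network calls.
const graph = {
  "https://example.com/": ["https://example.com/docs", "https://hub.docker.com/r/x"],
  "https://example.com/docs": ["https://example.com/"],
};
crawl("https://example.com/", async (url) => graph[url] || []);
```

Tracking every link in the seen set, including skipped ones, keeps each URL from being logged or queued more than once.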
Example output/crawl-results.html:
![image](https://private-user-images.githubusercontent.com/16573/244803752-98f2953b-fce2-491f-ac6b-ad1b4c30f340.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTgzMDk5MjAsIm5iZiI6MTcxODMwOTYyMCwicGF0aCI6Ii8xNjU3My8yNDQ4MDM3NTItOThmMjk1M2ItZmNlMi00OTFmLWFjNmItYWQxYjRjMzBmMzQwLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjEzVDIwMTM0MFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTY4NDQ4MzgwOGU3MTRjYTE1ZTljYzc0MmQzOTdhNGMwMzQzMWY5MmQ2MzEzZWYzYzMyZmQ3ZTY4Mjc2ZTQ3NjMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.nfYtBk37wxx_DoefVGBl51gpqQ7FyOzupUNpQbDwG68)